Giving Your LLM Hands: A Deep Dive on Tool Calling

Param Harrison
7 min read


Welcome back to our AI engineering series. In our first projects, we treated LLMs as brilliant "brains in a jar." They can talk, they can reason, they can even write code. But they are isolated. They can't check the weather, they can't search your database, and they can't book a flight.

This post is for you if you're ready to build a true "agent" that can act.

Today, we'll build the "nervous system" that connects the AI's "brain" to its "hands" (your code). This is the fundamental, must-know technique called Tool Calling (or Function Calling).

The problem: The "Closed-Book" AI

Let's start with a simple, common failure.

Use Case: "What's the weather in San Francisco right now?"

graph TD
    A["User: 'What's the weather in SF?'"] --> B(LLM)
    B --> C["Bot: 'I'm sorry, I am an AI and do not have access to real-time information. My knowledge is limited to what I was trained on (up to 2023).'"]
    
    style C fill:#ffebee,stroke:#b71c1c,color:#212121

Why this is bad: The LLM is brilliant, but it's useless for any task that needs live data. To solve this, we must give it a "tool": a function in our code that it can ask us to run.

The solution: The "Tool Calling" loop

The core idea is simple: the LLM doesn't run the tool. It asks our application to run the tool for it.

This creates a "loop" where our application is in charge.

sequenceDiagram
    participant User
    participant Your_App
    participant LLM
    participant Tool_API
    
    User->>Your_App: "What's the weather in London?"
    activate Your_App
    
    Your_App->>LLM: "User asked: '...weather in London?'"
    activate LLM
    LLM-->>Your_App: Call Tool: `get_weather(location="London")`
    deactivate LLM
    
    Your_App->>Tool_API: Run get_weather("London")
    activate Tool_API
    Tool_API-->>Your_App: "15°C and cloudy"
    deactivate Tool_API
    
    Your_App->>LLM: "Tool Result: 15°C and cloudy"
    activate LLM
    LLM-->>Your_App: "The weather in London is 15°C and cloudy."
    deactivate LLM
    
    Your_App-->>User: "The weather in London is 15°C and cloudy."
    deactivate Your_App

Let's build this, step-by-step.

How to build a Tool-Calling agent

We'll build this using Python, the OpenAI SDK, and Pydantic for the data "contract."

Step 1: Define your tool's "Contract" (Pydantic)

First, we need a "contract" that tells the LLM exactly what our tool looks like. What is it called? What arguments does it need? Pydantic is the perfect way to define this.

# main.py

from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    """
    This is our "contract." It defines the inputs
    for our get_weather function.
    """
    city: str = Field(..., description="The city, e.g., 'San Francisco' or 'Tokyo'")
    unit: str = Field("celsius", description="The unit for temperature, 'celsius' or 'fahrenheit'")

Observation: We've defined a clear schema. The description fields are critical—they are the "prompt" that tells the LLM how to use the field.
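
If you're curious what this "contract" looks like on the wire, print the generated schema. The exact keys vary slightly by Pydantic version; with Pydantic v2 it looks roughly like this:

# main.py (optional sanity check)

print(GetWeather.model_json_schema())

# Roughly:
# {
#   "title": "GetWeather",
#   "type": "object",
#   "description": "This is our \"contract.\" ...",
#   "properties": {
#     "city": {"title": "City", "type": "string",
#              "description": "The city, e.g., 'San Francisco' or 'Tokyo'"},
#     "unit": {"title": "Unit", "type": "string", "default": "celsius",
#              "description": "The unit for temperature, 'celsius' or 'fahrenheit'"}
#   },
#   "required": ["city"]
# }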

Step 2: Create the tool definition for the LLM

Next, we convert this Pydantic model into the JSON schema format that the OpenAI API expects.

# main.py (continued)

from openai import OpenAI

client = OpenAI() # Assumes OPENAI_API_KEY is set

# This is the "menu" of tools we will show the LLM
tools_list = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the real-time weather for a location.",
            
            # Pydantic automatically generates the JSON schema for us!
            "parameters": GetWeather.model_json_schema()
        }
    }
]

Observation: GetWeather.model_json_schema() is a huge time-saver. It converts our Python class into the exact JSON format the LLM needs.
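
If you end up registering several tools (as we will in the challenge below), this boilerplate is worth wrapping in a tiny helper. A sketch only; pydantic_to_tool is our own hypothetical convenience function, not part of the OpenAI SDK or Pydantic:

# main.py (optional helper)

from pydantic import BaseModel

def pydantic_to_tool(model: type[BaseModel], name: str, description: str) -> dict:
    """Build an OpenAI-style tool definition from a Pydantic model."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": model.model_json_schema(),
        },
    }

# Produces the same dict as the hand-written entry above:
# pydantic_to_tool(GetWeather, "get_weather",
#                  "Gets the real-time weather for a location.")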

Step 3: The first LLM call (The "Request")

Now, we call the LLM. We pass our user's query and the tools_list we just created. We set tool_choice="auto" to let the LLM decide if it needs a tool.

# main.py (continued)

messages = [{"role": "user", "content": "What's the weather in Paris in fahrenheit?"}]

# --- This is the first API call ---
print("1. Sending request to LLM...")
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools_list,
    tool_choice="auto" 
)

response_message = response.choices[0].message
messages.append(response_message) # Save this message for context

Observation: The LLM does not respond with, "I'm sorry, I can't..." Instead, response_message now contains a tool_calls object.

# This is what 'response_message' looks like:
{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"Paris\", \"unit\": \"fahrenheit\"}"
      }
    }
  ]
}

The LLM has correctly parsed the user's query ("Paris", "fahrenheit") and formatted it to match our Pydantic schema.
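
A nice side effect of defining the contract with Pydantic: you can validate the arguments the LLM produced against the very same model before running anything. A minimal sketch, assuming Pydantic v2:

# main.py (optional validation step)

# Parse *and* validate the LLM's arguments against our contract.
# Raises pydantic.ValidationError if the LLM sent something malformed.
validated = GetWeather.model_validate_json(
    response_message.tool_calls[0].function.arguments
)

print(validated.city)  # "Paris"
print(validated.unit)  # "fahrenheit"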

Step 4: The application's "Logic Loop" (The "Action")

Our application now sees the tool_calls request. It's our job to run the actual Python function.

# main.py (continued)

import json

# This is our "real" function that does the work
def get_weather(city: str, unit: str = "celsius"):
    """A mock function to get weather."""
    print(f"--- TOOL: Running get_weather(city={city}, unit={unit}) ---")
    # A real implementation would call a weather API here; this mock
    # ignores its inputs and always returns the same reading.
    return {"temp": 65, "condition": "cloudy", "unit": "fahrenheit"}

# --- This is our logic loop ---
if response_message.tool_calls:
    tool_call = response_message.tool_calls[0]
    
    if tool_call.function.name == "get_weather":
        # 1. Parse the arguments the LLM provided
        args = json.loads(tool_call.function.arguments)
        
        # 2. Run our *real* Python function
        tool_result = get_weather(
            city=args.get("city"),
            unit=args.get("unit", "celsius")  # fall back to the schema default if the LLM omits it
        )
        
        # 3. Append the tool's result to our message history
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": "get_weather",
                "content": json.dumps(tool_result) # Convert result to a string
            }
        )

Observation: This is the most critical part. The LLM is the "router," but our application is the "executor." This separation is key.
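
One thing to note: the snippet above handles exactly one call to one tool. In practice you'll usually keep a small registry of functions and loop over every entry in tool_calls, because the LLM may request several tools in a single turn (which is exactly what the challenge at the end asks about). A sketch of that pattern:

# main.py (a more general dispatch loop, sketch)

# Map tool names to the real Python functions that implement them
TOOL_REGISTRY = {
    "get_weather": get_weather,
    # "search_product": search_product,  # add more tools here
}

if response_message.tool_calls:
    for tool_call in response_message.tool_calls:
        fn = TOOL_REGISTRY.get(tool_call.function.name)
        if fn is None:
            continue  # unknown tool name: skip (or log/raise in production)

        args = json.loads(tool_call.function.arguments)
        tool_result = fn(**args)

        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": json.dumps(tool_result),
        })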

Step 5: The second LLM call (The "Synthesis")

Now our messages list contains the full conversation:

  1. User: "What's the weather in Paris in fahrenheit?"
  2. Assistant: "I'll call the get_weather tool."
  3. Tool: "The result is {'temp': 65, ...}."

We send this entire conversation back to the LLM to get a final, natural-language answer.

# main.py (continued)

# --- This is the second API call ---
print("4. Sending tool results back to LLM for synthesis...")
final_response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages # Send the whole conversation
)

print("\n--- FINAL ANSWER ---")
print(final_response.choices[0].message.content)

Result:

--- FINAL ANSWER ---
The weather in Paris is currently 65°F and cloudy.

We've successfully built an agent that can access real-time data!

The engineer's choice: When to use tool calling

  • Philosophy: "The LLM is a smart router."
  • Best For: Simple, "one-shot" tasks that can be answered with a single action (or a predictable chain of actions).
  • Strengths: Simple to implement, low latency for simple queries, and built into all major LLM APIs (OpenAI, Anthropic, Gemini).
  • Weakness: It's stateless. It can't handle complex logic, dependencies, or loops. It failed our "Travel Bot" problem ("if flight < $300, then book hotel") because that requires an external, stateful "machine" to control the logic.

For more complex scenarios, see our function calling vs. MCP comparison and our ReAct agents guide.

Challenge for you

  1. Use Case: You want to build a simple "E-commerce Bot."
  2. Your Task: Define a new Pydantic model called SearchProduct with query: str and category: Optional[str] fields (a starter sketch follows this list).
  3. Test It: Add this new tool to your tools_list (so the LLM has two tools: get_weather and search_product).
  4. Observe: Ask the agent, "What's the weather in NYC and do you have any red shoes?" Does the LLM correctly call both tools in a single turn? (Hint: Modern LLMs can!)
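
If you want a starting point for step 2, here's one possible shape for the new contract and tool entry; the search_product function is just a stub for you to replace:

# main.py (challenge starter, one possible shape)

from typing import Optional

class SearchProduct(BaseModel):
    """Contract for a product-search tool."""
    query: str = Field(..., description="What the user is looking for, e.g. 'red shoes'")
    category: Optional[str] = Field(None, description="Optional category filter, e.g. 'footwear'")

def search_product(query: str, category: Optional[str] = None):
    """A stub that returns mock search results."""
    return {"results": [f"Mock result for '{query}'"], "category": category}

tools_list.append({
    "type": "function",
    "function": {
        "name": "search_product",
        "description": "Searches the product catalog for matching items.",
        "parameters": SearchProduct.model_json_schema(),
    },
})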

Key takeaways

  • Tool calling connects LLMs to the real world: By defining tools with Pydantic schemas, we give LLMs the ability to interact with external systems
  • The pattern is a two-step loop: First, the LLM requests a tool call; then, after execution, it synthesizes the result into a natural answer
  • Pydantic simplifies schema definition: Using model_json_schema() automatically generates the JSON schema that LLMs expect
  • Tool calling is stateless and simple: Perfect for straightforward, one-shot interactions, but limited for complex, stateful workflows
  • Separation of concerns is critical: The LLM routes requests; your application executes the actual logic

For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
