The Decoupled Agent: A Guide to MCP
In our last post, we built a "Tool Calling" agent. It's a powerful pattern, but it has a massive, hidden flaw that you'll discover the second you try to scale it.
This post is for you if you've ever looked at your 50-page tools_list JSON and thought, "This is slow, expensive, and impossible to maintain."
Today, we'll solve this "scaling" problem with a different architecture: the Model Context Protocol (MCP). We're going to move from a "static" list of tools to a "dynamic" and decoupled system.
The problem: The "Static" tool menu
Let's look at the "Travel Bot" agent again. To make it work with Tool Calling, we have to tell the LLM every tool we have, every single time we send a message.
The "menu" of tools we send with every single API call looks like this:
```python
# This entire JSON blob is sent WITH EVERY user message
tools_list = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the real-time weather...",
            "parameters": { ... }  # Schema for 'location'
        }
    },
    {
        "type": "function",
        "function": {
            "name": "book_flight",
            "description": "Books a flight.",
            "parameters": { ... }  # Schema for 'destination', 'date'
        }
    },
    {
        "type": "function",
        "function": {
            "name": "book_hotel",
            "description": "Books a hotel.",
            "parameters": { ... }  # Schema for 'city', 'check_in'
        }
    },
    # ... and 50 more tools ...
]
```
Why this is a critical engineering problem:
- It's Expensive: This 50-page tool menu eats up our context window! We are paying thousands of input tokens on every single message, just to remind the LLM what tools it has.
- It's Static: The LLM only knows about the tools we hardcode in this list.
- It's Monolithic: If we want to add a new tool (like rent_car), we have to update and redeploy our entire agent's code.
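A quick back-of-the-envelope sketch makes the token cost concrete. (The schema below is a made-up example, and the ~4-characters-per-token ratio is a common rule of thumb, not an exact tokenizer count.)

```python
import json

# One hypothetical tool schema, like a single entry in tools_list above
tool_schema = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Gets the real-time weather for a given location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

# With ~50 similar tools, every single request carries roughly this much schema text
menu_chars = len(json.dumps(tool_schema)) * 50
approx_tokens = menu_chars // 4  # rough rule of thumb: ~4 characters per token

print(f"~{menu_chars} characters, ~{approx_tokens} tokens of overhead per request")
```

Thousands of tokens of pure overhead, paid on every message, before the user's question even arrives.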
```mermaid
graph TD
    A[User Query] --> B(Agent Code)
    B -- "Here is my 1-line query AND my 50-page tool menu" --> C[LLM API]
    C -- "OK, I'll call one tool from that giant menu" --> B
    style B fill:#ffebee,stroke:#b71c1c,color:#212121
    style C fill:#ffebee,stroke:#b71c1c,color:#212121
```
The solution: "Dynamic" tools (Model Context Protocol)
MCP (Model Context Protocol), developed by Anthropic, solves this by completely changing the architecture.
Instead of stuffing our tools into the prompt, MCP creates a Client-Server model.
- The LLM/Agent is the Client.
- Our Tools live on one or more separate MCP Servers.
Now, the LLM can have a conversation with our tools. It can "ask" the server, "What tools do you have?" and the server can "reply" with a list. This is called Dynamic Tool Discovery.
This is the "USB-C" analogy: instead of 10 different, hardcoded ports, we have one standard port (MCP) that any tool (server) can plug into.
```mermaid
graph TD
    A[AI Agent MCP Client] <-->|JSON-RPC 2.0| B(MCP Server 1 Weather Tool)
    A <--> C(MCP Server 2 Flight Tool)
    A <--> D(MCP Server 3 Internal Wiki Tool)
    style B fill:#e8f5e9,stroke:#388e3c
    style C fill:#e8f5e9,stroke:#388e3c
    style D fill:#e8f5e9,stroke:#388e3c
```
Observation: This is a decoupled architecture. We can add, remove, or update our Flight Tool server (Server 2) without ever touching our agent's code. The agent will simply "discover" the new tool the next time it asks.
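To make "dynamic discovery" concrete, here is a toy sketch of the idea in plain Python. This is not the real MCP SDK, just a simulated JSON-RPC tools/list exchange, with hypothetical server and tool names:

```python
import json

# --- Toy "MCP server": owns its tool registry ---
class ToyToolServer:
    def __init__(self, name):
        self.name = name
        self.tools = {}

    def add_tool(self, tool_name, description):
        self.tools[tool_name] = {"name": tool_name, "description": description}

    def handle(self, request_json):
        """Handle a JSON-RPC 2.0 request string and return a response string."""
        request = json.loads(request_json)
        if request["method"] == "tools/list":
            result = {"tools": list(self.tools.values())}
        else:
            result = {"error": f"unknown method: {request['method']}"}
        return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})

# --- Toy "agent client": discovers tools at runtime instead of hardcoding them ---
def discover_tools(server):
    request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    response = json.loads(server.handle(request))
    return [t["name"] for t in response["result"]["tools"]]

flight_server = ToyToolServer("flight-server")
flight_server.add_tool("book_flight", "Books a flight.")
print(discover_tools(flight_server))  # ['book_flight']

# Add a tool on the server side; the agent discovers it on its next
# query, with zero changes to the agent's own code.
flight_server.add_tool("rent_car", "Rents a car.")
print(discover_tools(flight_server))  # ['book_flight', 'rent_car']
```

The key design point: the tool list lives on the server, so it can change between queries; in the Function Calling pattern it lives in the prompt, so it can only change with a redeploy.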
Let's build an MCP server (The "How")
We're going to build a simple MCP server that provides one tool: get_tech_news. This tool will scrape the front page of a tech website.
Step 1: Project setup (with uv)
The MCP team (and many modern Python projects) recommends using uv, a super-fast new package manager. Let's use it.
```bash
# 1. Install uv (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Create your project
uv init mcp-news-server
cd mcp-news-server

# 3. Create and activate a virtual environment
uv venv
source .venv/bin/activate  # (use .venv\Scripts\activate on Windows)

# 4. Install our libraries
#    mcp[cli]: The MCP server framework
#    httpx:    For making async web requests
#    bs4:      BeautifulSoup, for web scraping
uv add "mcp[cli]" httpx bs4
```
Step 2: The server code (main.py)
Now, let's write our server. We'll use FastMCP (the high-level, FastAPI-style interface from the MCP Python SDK) and create one tool.
```python
# main.py
from mcp.server.fastmcp import FastMCP
import httpx
from bs4 import BeautifulSoup

# 1. Initialize our MCP server
mcp = FastMCP("tech_news_server")

# 2. This is our web scraping logic
async def fetch_news_summary(url: str) -> str:
    """Pulls the first 5 paragraphs from a website."""
    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(url, timeout=10.0)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, "html.parser")
            paragraphs = soup.find_all("p")
            # Get text from the first 5 paragraphs
            return " ".join(p.get_text() for p in paragraphs[:5])
        except httpx.TimeoutException:
            return "Error: The request timed out."
        except httpx.HTTPError as e:
            return f"Error: The request failed ({e})."

# 3. This is the "magic" - we define our tool
@mcp.tool()
async def get_tech_news(source: str) -> str:
    """
    Fetches a summary of the latest news from a tech source.

    Args:
        source (str): The name of the news source. Currently supports: "arstechnica".
    """
    if source.lower() == "arstechnica":
        return await fetch_news_summary("https://arstechnica.com")
    return f"Error: Source '{source}' is not supported."

# 4. This is how we run the server
if __name__ == "__main__":
    # "stdio" means the server will communicate over
    # standard input/output, not over HTTP.
    # This is how local clients like Claude Desktop talk to it.
    mcp.run(transport="stdio")
```
Observation: Look at the @mcp.tool() decorator. That's all it takes to expose our get_tech_news function as a tool. The MCP server automatically uses the function's docstring and arguments (source: str) to create the "schema" that it shows to the LLM.
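As a rough illustration of that observation, here's a simplified sketch of how a tool schema can be derived from a function's signature and docstring using Python's inspect module. This is not FastMCP's actual implementation (which is more thorough), just the general idea:

```python
import inspect

def build_tool_schema(func):
    """Build a minimal JSON-schema-style tool description from a function's
    signature and docstring, similar in spirit to what @mcp.tool() does."""
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties = {}
    for name, param in inspect.signature(func).parameters.items():
        json_type = type_map.get(param.annotation, "string")
        properties[name] = {"type": json_type}
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),  # simplification: treat every param as required
        },
    }

async def get_tech_news(source: str):
    """Fetches a summary of the latest news from a tech source."""
    ...

schema = build_tool_schema(get_tech_news)
print(schema["name"])        # get_tech_news
print(schema["parameters"])  # {'type': 'object', 'properties': {'source': {'type': 'string'}}, 'required': ['source']}
```

This is why good docstrings and type hints matter so much in MCP servers: they are the tool's entire "user manual" as far as the LLM is concerned.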
Step 3: Connecting a client (Claude Desktop)
So, how does an agent use this server? The easiest way to test this is with an MCP-compatible client, like the Claude Desktop app.
We just need to tell Claude how to run our server.
- In Claude Desktop, go to Settings > Developer Settings.
- Click Edit Config. This opens a claude_desktop_config.json file.
- Add our server to the mcpServers list.
The "How":
```json
{
  "mcpServers": {
    "mcp-news-server": {
      "command": "/path/to/your/.local/bin/uv",
      "args": [
        "--directory",
        "/path/to/your/mcp-news-server",
        "run",
        "main.py"
      ]
    }
  }
}
```
(Note: You must replace the paths with the full paths to your uv executable and your project directory.)
Step 4: The final result (Dynamic tool use)
Now, we restart Claude. We don't need to change any prompts. We just start a new chat.
- A small "hammer" icon appears in the chat box, showing that Claude (the Client) has successfully connected to our MCP Server.
- When we click it, Claude shows us our get_tech_news tool, which it dynamically discovered from our server.
```mermaid
graph TD
    A[Claude Desktop starts] --> B(Reads claude_desktop_config.json)
    B --> C(Launches our main.py server)
    C --> D(Claude Client connects to our Server)
    D --> E(Claude asks: What tools do you have?)
    E --> F(Our Server replies: I have get_tech_news)
    F --> G[Hammer icon appears in Claude UI]
```
Now, let's try to use it.
```mermaid
sequenceDiagram
    participant User
    participant Claude_Client
    participant Our_MCP_Server
    User->>Claude_Client: "What's the latest news on Ars Technica?"
    activate Claude_Client
    Claude_Client->>Claude_Client: "My LLM brain sees that this question matches the get_tech_news tool."
    Claude_Client->>User: (Asks for permission to use the tool)
    User->>Claude_Client: "Approve"
    Claude_Client->>Our_MCP_Server: "Call: get_tech_news(source='arstechnica')"
    activate Our_MCP_Server
    Our_MCP_Server->>Our_MCP_Server: (Runs our Python function, scrapes the website)
    Our_MCP_Server-->>Claude_Client: "Return: '...summary of news...'"
    deactivate Our_MCP_Server
    Claude_Client->>Claude_Client: "My LLM brain now has the context. I will formulate a final answer."
    Claude_Client-->>User: "Here is the latest news from Ars Technica: ..."
    deactivate Claude_Client
```
Observation: We have successfully built a decoupled agent. Our agent's "brain" (Claude) is totally separate from its "tool" (our Python server). We can now update our main.py server to add 10 new tools, and the agent will discover them all without any changes to its code.
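The whole round trip in the sequence diagram can be sketched as a toy tools/call exchange. This is a simplified stand-in for the real protocol, not the MCP SDK, and the "scraped" summary is faked:

```python
import json

# Toy server side, mirroring our get_tech_news server
def handle_request(request_json):
    request = json.loads(request_json)
    if request["method"] == "tools/call" and request["params"]["name"] == "get_tech_news":
        source = request["params"]["arguments"]["source"]
        # Stand-in for the real scrape; the summary text is fabricated here
        result = {"content": f"Latest headlines from {source}: ..."}
    else:
        result = {"error": "unknown tool"}
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})

# Toy client side: what Claude does after the user approves the tool call
request = json.dumps({
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {"name": "get_tech_news", "arguments": {"source": "arstechnica"}},
})
response = json.loads(handle_request(request))
print(response["result"]["content"])  # Latest headlines from arstechnica: ...
```

Notice that the client never imports the server's code; everything crosses the boundary as JSON-RPC messages, which is what makes the two sides independently deployable.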
Head-to-head comparison
| Aspect | Function Calling (Tool Calling) | MCP (Model Context Protocol) |
|---|---|---|
| Architecture | Static & Monolithic | Dynamic & Decoupled |
| Tool Discovery | Static. Tools are hardcoded in the prompt. | Dynamic. Client "discovers" tools from servers. |
| Scalability | Poor. Prompt size and cost grow with every tool you add. | Excellent. Can connect to many servers. |
| Flexibility | Low. To change a tool, you must edit the agent's prompt. | High. To change a tool, you just update the tool server. |
| Best For | Simple agents with 1-10 fixed tools. | Enterprise-scale systems with many, evolving tools. |
For a deeper comparison, see our function calling vs. MCP guide.
Challenge for you
- Use Case: You are building an agent for your company, "BigCorp."
- The Goal: The agent needs to talk to 3 different departments:
  - Sales_API (to get customer data)
  - Support_API (to get support tickets)
  - Engineering_API (to check system status)
- Your Task: Using the MCP pattern, what would your architecture look like? How many MCP servers would you build, and why? How would this be better than putting all three tools in a single "Function Calling" prompt?
Key takeaways
- MCP enables dynamic tool discovery: Unlike function calling, MCP allows agents to discover tools at runtime, eliminating the need to hardcode tool schemas in prompts
- Decoupled architecture scales better: By separating tools into independent servers, you can add, update, or remove tools without modifying agent code
- MCP reduces token costs: Instead of sending a 50-page tool menu with every request, agents query servers only when needed
- The client-server model is flexible: Multiple agents can connect to the same MCP servers, enabling tool sharing across your organization
- MCP is ideal for enterprise systems: When you have many tools that change frequently, MCP's dynamic discovery makes maintenance manageable
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.