Building an Agent's Brain with LangGraph
In our last post, we proved that a simple, linear RAG pipeline is "brittle." It fails when a user's question requires information from outside its static knowledge base.
To fix this, we need to build a "smarter" system—an agent that can make decisions. Instead of a simple checklist, we'll build a "state machine" or a "graph" that can:
- Look at a question.
- Decide which tool to use (our vector store OR a web search).
- Check if the tool's output was any good.
- Loop back and try a different tool if it failed.
Today, we'll build the "blueprint" for this agent using a powerful library called LangGraph.
The problem: A linear chain isn't enough
Our old pipeline was a simple chain.
```mermaid
graph TD
    A[Retrieve] --> B[Generate] --> C[Answer]
```
This is inflexible. If the Retrieve step fails, the whole chain fails.
The solution: A "Cyclic" graph
We need a graph that can loop, branch, and make decisions. This is our new blueprint:
```mermaid
graph TD
    A[User Query] --> B(Route to Tool)
    B -- "Internal Query" --> C[Retrieve from Vector Store]
    B -- "External Query" --> D[Search the Web]
    C --> E(Grade Documents)
    D --> E
    E -- "Good Docs" --> F[Generate Answer]
    E -- "Bad Docs" --> D
    F --> G[Final Answer]
    style B fill:#e3f2fd,stroke:#0d47a1
    style E fill:#e3f2fd,stroke:#0d47a1
```
This is a state machine. "Route to Tool" and "Grade Documents" are "conditional edges" (decision points), and the "Bad Docs" edge back to "Search the Web" is a "cycle" (our self-correcting loop).
To build this, we'll use LangGraph.
The "How": Building the agent's components
LangGraph works by defining "nodes" (the steps) and "edges" (the connections). First, let's build our "nodes."
Brick 1: The "Memory" (The GraphState)
Before we build the "nodes," we need to define our agent's "memory." A GraphState is a simple Python object (a TypedDict) that gets passed from node to node. Every node can read from and write to this "memory."
```python
from typing import List, TypedDict

# This is the "memory" of our agent.
# Every node will have access to this.
class GraphState(TypedDict):
    question: str         # The user's query
    documents: List[str]  # The retrieved documents
    generation: str       # The final answer
```
Observation: This is the most important concept. By defining a shared "state," our generate node can see what the retrieve node found.
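To make the shared-state idea concrete, here is a tiny stdlib-only simulation of what happens after each node runs: the node returns only the keys it wants to change, and the framework merges them into the state. The `retrieve_stub` function and the manual dict merge are illustrative stand-ins, not LangGraph API:

```python
from typing import List, TypedDict

class GraphState(TypedDict):
    question: str
    documents: List[str]
    generation: str

def retrieve_stub(state: GraphState) -> dict:
    # A node returns a partial update, not the whole state
    return {"documents": ["doc about " + state["question"]]}

state: GraphState = {"question": "refund policy", "documents": [], "generation": ""}
# This merge is what LangGraph does for us after each node runs
state = {**state, **retrieve_stub(state)}
print(state["documents"])  # → ['doc about refund policy']
```

The key point: nodes never talk to each other directly; they only read from and write to this one shared dictionary.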
Brick 2: The "Tools" (Our nodes)
Now, we define our "tools" as plain Python functions. Each function takes the current state as input and returns a dictionary to update that state.
We need two tools to fetch information:
- `retrieve`: Searches our internal ChromaDB (from Post 1).
- `web_search`: Searches the public internet.
The "How":
```python
from langchain_community.tools import DuckDuckGoSearchRun
import chromadb

# Initialize our tools
search_tool = DuckDuckGoSearchRun()
chroma_client = chromadb.Client()
collection = chroma_client.get_collection(name="product_docs")  # Get our DB from Post 1

# --- Node 1: The Internal Retriever ---
def retrieve(state):
    print("---NODE: RETRIEVE---")
    question = state["question"]
    # Retrieve from our internal vector store
    documents = collection.query(
        query_texts=[question],
        n_results=3
    )["documents"][0]
    return {"documents": documents, "question": question}

# --- Node 2: The Web Searcher ---
def web_search(state):
    print("---NODE: WEB_SEARCH---")
    question = state["question"]
    # Call the DuckDuckGo tool
    search_result = search_tool.run(question)
    # We wrap the single string in a list to match our state's type
    documents = [search_result]
    return {"documents": documents, "question": question}
```
Observation: We've built the "hands" of our agent. It now has two different ways to find information. But how does it know which one to use? And how does it generate the final answer?
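The real router and grader arrive in the next post, but the blueprint's control flow can already be sketched in plain Python. Everything below is a placeholder: the keyword-based router, the "did we find anything" grader, and the stub tools stand in for the LLM-backed nodes we'll build later:

```python
# Plain-Python sketch of the blueprint's control flow (no LangGraph yet).
def retrieve(state):
    return {**state, "documents": ["internal doc on " + state["question"]]}

def web_search(state):
    return {**state, "documents": ["web result for " + state["question"]]}

def route(state):
    # Placeholder router: a keyword check instead of an LLM decision
    return "retrieve" if "product" in state["question"] else "web_search"

def grade(state):
    # Placeholder grader: "good" if we found anything at all
    return "good" if state["documents"] else "bad"

def generate(state):
    return {**state, "generation": "Answer based on: " + state["documents"][0]}

def run(question):
    state = {"question": question, "documents": [], "generation": ""}
    # Conditional edge: Route to Tool
    state = retrieve(state) if route(state) == "retrieve" else web_search(state)
    # Conditional edge: Grade Documents, with the "Bad Docs" cycle
    if grade(state) == "bad":
        state = web_search(state)  # fall back to the web and try again
    return generate(state)

print(run("How do I reset my product password?")["generation"])
```

LangGraph replaces this hand-written `run` function with declared nodes and edges, which is what makes the flow easy to extend and visualize.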
Think About It: Our `web_search` node is a bit "dumb": it just returns one long string from DuckDuckGo. How could we make this node "smarter"? (Hint: What if it retrieved multiple search results and "chunked" them?)
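One way to explore that hint, as a stdlib-only sketch: split the raw search string into overlapping fixed-size chunks, so each entry in `documents` stays small and focused. The chunk size and overlap below are arbitrary placeholders, not tuned values:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split a long string into overlapping fixed-size chunks."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks

# A "smarter" web_search would return chunks instead of one blob:
raw_result = "x" * 900  # stand-in for a long DuckDuckGo result string
documents = chunk_text(raw_result)
print(len(documents), "chunks instead of 1 long string")
```

With results chunked like this, the grader in the next post can score each chunk individually instead of accepting or rejecting the whole search result at once.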
Next step
We've built our agent's "memory" (GraphState) and its "tools" (retrieve, web_search).
In our next post, we'll build the "brain" itself:
- The `Router`: The decision node that chooses which tool to use first.
- The `Grader`: The "self-correction" node that grades the results.
- The `Generator`: The node that writes the final answer.
Key takeaways
- Linear chains are brittle: When one step fails, the entire pipeline fails—we need graphs that can branch and loop
- State management is critical: `GraphState` allows nodes to share information and make decisions based on previous steps
- Tools are just functions: Each tool is a simple Python function that takes state and returns updated state
- LangGraph enables dynamic flow: Unlike linear chains, graphs can have conditional edges and cycles for self-correction
- Foundation before logic: We build the tools first, then add the decision-making nodes that connect them
For more on LangGraph and agent frameworks, see our agent framework comparison.
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.