Agentic RAG: Building a System That Thinks, Acts, and Corrects
Our previous RAG systems were like diligent interns following a rigid checklist:
- Retrieve documents
- Stuff them into a prompt
- Generate an answer
This works well for simple questions. But what happens if the answer isn't in our documents? Or if the question is about a very recent event? The system fails.
Thought Experiment: If you look for a recipe in your cookbook (your static index) but don't find it, you don't just give up. You switch to a different tool—you use Google (web search).
This is the core idea of Agentic RAG. We need to build a system that can assess the situation, choose the right tool for the job, and even check its own work before giving a final answer. It's not a simple assembly line; it's a dynamic workshop.
Part 1: Giving our agent its tools
An agent is only as good as its tools. A tool is a function the LLM can decide to call to get new information. We'll give our agent two tools:
- Vector Store Retriever: Our "internal cookbook." This tool searches our private documents, just like before.
- Web Search Tool: Our "Google" fallback. This tool lets the agent search the live internet for new or external information.
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.vectorstores import Chroma
# 1. Create the Web Search Tool
# This tool can search the live internet
web_search_tool = DuckDuckGoSearchRun()
# 2. Create the Vector Store Retriever Tool
# This tool searches our private documents (from previous lessons)
# (Code to load documents and create a Chroma vector store)
vectorstore = Chroma.from_documents(...)
retriever_tool = vectorstore.as_retriever()
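Before building the graph, it's worth a quick sanity check that both tools respond on their own. Here's a minimal sketch (the query strings are arbitrary examples, and DuckDuckGoSearchRun requires the duckduckgo-search package):
# Quick smoke test: both tools accept a plain string query
print(web_search_tool.invoke("latest LangGraph release"))
print(retriever_tool.invoke("What are the components of an LLM agent?"))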
Part 2: Building the agent's brain (the graph)
Now for the exciting part. We'll build a graph (a flowchart) to represent our agent's decision-making process. This graph will have a state (its memory) and nodes (the steps).
Our graph will implement a self-correcting loop. Here is the logic:
graph TD
A[Start: Retrieve from Vector Store] --> B(Grade Retrieved Documents)
B -- "Relevant" --> C[Generate Answer]
B -- "Not Relevant" --> D[Search the Web]
D --> C
C --> E[End]
This diagram shows our agent's "brain." It has a decision point—the "Grade Documents" step—that allows it to change its plan and correct itself.
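Every node reads from and writes to a shared state object. Here's a minimal sketch of that state, following the LangGraph convention of a TypedDict; the field names (question, documents, generation) match the node code below:
from typing import List, TypedDict
from langchain_core.documents import Document

class GraphState(TypedDict):
    """Shared memory passed between every node in the graph."""
    question: str               # the user's original question
    documents: List[Document]   # retrieved (or web-fetched) context
    generation: str             # the final answer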
Part 3: The agent's logical steps (the nodes)
Let's look at each node in our graph.
Node 1: Retrieve (the first attempt)
This is our standard RAG step. The agent first tries to find the answer in its internal vector store.
# Node 1: A function to retrieve documents
def retrieve(state):
print("---NODE: RETRIEVE DOCUMENTS---")
question = state["question"]
# Call the retriever tool
documents = retriever_tool.invoke(question)
return {"documents": documents, "question": question}
Node 2: Grade documents (the self-correction loop)
This is the most important new step. The agent uses an LLM to grade its own work. It checks if the retrieved documents are actually relevant to the question.
# Node 2: A function to grade the retrieved documents
def grade_documents(state):
    print("---NODE: GRADE DOCUMENTS---")
    question = state["question"]
    documents = state["documents"]
    # Use an LLM to make a binary decision for each document
    prompt = """You are a grader. Grade the relevance of a
retrieved document to a user question.
Respond with a simple 'yes' or 'no'.
Document: {document_content}
Question: {question}"""
    # Loop through the documents and grade each one
    # (assumes a chat model, e.g. llm = ChatOpenAI(), defined earlier)
    any_document_is_relevant = any(
        "yes" in llm.invoke(prompt.format(document_content=doc.page_content,
                                          question=question)).content.lower()
        for doc in documents
    )
    if any_document_is_relevant:
        return "generate"  # Decision: Go to generation
    else:
        return "web_search"  # Decision: Fallback to web search
Node 3: Web search (the fallback)
If the grade_documents node decides all the internal documents are "Not Relevant," the graph routes the flow to this node.
# Node 3: A function to search the web
from langchain_core.documents import Document

def web_search(state):
    print("---NODE: WEB SEARCH---")
    question = state["question"]
    # Call the web search tool (returns a single string of results)
    web_results = web_search_tool.invoke(question)
    # Package the results as a Document so the generate node can treat
    # web results and vector store results the same way
    return {"documents": [Document(page_content=web_results)], "question": question}
Node 4: Generate (the final answer)
This node takes the good documents (either from the initial retrieval or the web search) and generates the final answer.
# Node 4: A function to generate the final answer
def generate(state):
    print("---NODE: GENERATE ANSWER---")
    question = state["question"]
    documents = state["documents"]
    # Standard RAG prompt: stuff the retrieved context into the prompt
    context = "\n\n".join(doc.page_content for doc in documents)
    prompt = f"Use the following context to answer the question.\n\nContext:\n{context}\n\nQuestion: {question}"
    # Call the LLM with the context and question
    # (same chat model assumed in the grading node)
    generated_answer = llm.invoke(prompt).content
    return {"generation": generated_answer}
Part 4: The agentic RAG system in action
Let's test our new agent "brain" with two different questions.
Test 1: The internal question
Query: "What are the components of an LLM agent?" (This answer is in our documents)
- Retrieve: Finds relevant documents about LLM agents.
- Grade Documents: The LLM grader looks at the docs and says, "These are clearly about LLM agents."
- Decision: The grader returns "Relevant".
- Generate: The agent generates an answer using the retrieved documents.
- Path taken: Retrieve → Grade → Generate → End
Test 2: The external/new question
Query: "What was the main announcement from OpenAI's Spring Update in May 2024?" (This answer is not in our documents)
- Retrieve: Searches the vector store and finds irrelevant documents about old agent architectures.
- Grade Documents: The LLM grader looks at these old docs and the "May 2024" question. It says, "These documents do not answer the question."
- Decision: The grader returns "Not Relevant".
- Web Search: The graph routes to the web search tool, which finds articles about the new "GPT-4o" model.
- Generate: The agent generates an answer using the new context from the web search.
- Path taken: Retrieve → Grade → Web Search → Generate → End
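To reproduce this run yourself, invoke the compiled graph from the wiring sketch in Part 3 (assuming app is in scope):
# Ask the question that is NOT in our documents
result = app.invoke({
    "question": "What was the main announcement from OpenAI's Spring Update in May 2024?"
})
print(result["generation"])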
The agent successfully identified its own failure, corrected its course, and delivered an accurate, up-to-date answer.
Key takeaways
- Agentic RAG is not linear: It operates in cycles, making decisions based on the current situation. This is far more robust than a simple Retrieve -> Generate chain.
- Tools are superpowers: Giving an agent "tools" (like web search, calculators, or API access) allows it to overcome the limitations of its static knowledge.
- Self-correction is key: The most powerful pattern is the ability to grade outputs. By checking the relevance of its own work, our agent can decide to take corrective action instead of generating a bad answer.
- Graphs are for agents: While simple chains are fine for simple tasks, libraries like LangGraph are essential for building complex, stateful agents that have loops and conditional logic.
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.