Agentic RAG: Building a System That Thinks, Acts, and Corrects

Param Harrison
6 min read


Our previous RAG systems were like diligent interns following a rigid checklist:

  1. Retrieve documents
  2. Stuff them into a prompt
  3. Generate an answer

This works well for simple questions. But what happens if the answer isn't in our documents? Or if the question is about a very recent event? The system fails.

Thought Experiment: If you look for a recipe in your cookbook (your static index) but don't find it, you don't just give up. You switch to a different tool—you use Google (web search).

This is the core idea of Agentic RAG. We need to build a system that can assess the situation, choose the right tool for the job, and even check its own work before giving a final answer. It's not a simple assembly line; it's a dynamic workshop.

Part 1: Giving our agent tools

An agent is only as good as its tools. A tool is a function the LLM can decide to call to get new information. We'll give our agent two tools:

  1. Vector Store Retriever: Our "internal cookbook." This tool searches our private documents, just like before.
  2. Web Search Tool: Our "Google" fallback. This tool lets the agent search the live internet for new or external information.

from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.vectorstores import Chroma

# 1. Create the Web Search Tool
# This tool can search the live internet
web_search_tool = DuckDuckGoSearchRun()

# 2. Create the Vector Store Retriever Tool
# This tool searches our private documents (from previous lessons)
# (Code to load documents and create a Chroma vector store;
#  from_documents also expects an embedding model)
vectorstore = Chroma.from_documents(...)
retriever_tool = vectorstore.as_retriever()
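
Both tools expose the same invoke interface, so the agent can call either one the same way. A quick sanity check (the query strings here are just illustrations):

# Both tools take a plain-text query
web_results = web_search_tool.invoke("latest OpenAI announcement")
internal_docs = retriever_tool.invoke("components of an LLM agent")

print(web_results[:200])                    # raw text from the web
print(internal_docs[0].page_content[:200])  # top matching document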

Part 2: Building the agent's brain (the graph)

Now for the exciting part. We'll build a graph (a flowchart) to represent our agent's decision-making process. This graph will have a state (its memory) and nodes (the steps).

Our graph will implement a self-correcting loop. Here is the logic:

graph TD
    A[Start: Retrieve from Vector Store] --> B(Grade Retrieved Documents)
    B -- "Relevant" --> C[Generate Answer]
    B -- "Not Relevant" --> D[Search the Web]
    D --> C
    C --> E[End]

This diagram shows our agent's "brain." It has a decision point—the "Grade Documents" step—that allows it to change its plan and correct itself.
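
Before defining the nodes, we need the state they will share. A minimal sketch using a TypedDict (the field names are our choice, matching the node functions below):

from typing import List, TypedDict
from langchain_core.documents import Document

# The shared "memory" that every node reads from and writes to
class GraphState(TypedDict):
    question: str              # the user's original question
    documents: List[Document]  # retrieved (or web-searched) docs
    generation: str            # the final generated answer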

Part 3: The agent's logical steps (the nodes)

Let's look at each node in our graph.

Node 1: Retrieve (the first attempt)

This is our standard RAG step. The agent first tries to find the answer in its internal vector store.

# Node 1: A function to retrieve documents
def retrieve(state):
    print("---NODE: RETRIEVE DOCUMENTS---")
    question = state["question"]
    
    # Call the retriever tool
    documents = retriever_tool.invoke(question)
    
    return {"documents": documents, "question": question}

Node 2: Grade documents (the self-correction loop)

This is the most important new step. The agent uses an LLM to grade its own work. It checks if the retrieved documents are actually relevant to the question.

# Node 2: A function to grade the retrieved documents
def grade_documents(state):
    print("---NODE: GRADE DOCUMENTS---")
    question = state["question"]
    documents = state["documents"]
    
    # Use an LLM to make a binary decision for each document
    # (llm is any chat model, e.g. ChatOpenAI, defined earlier)
    prompt = """You are a grader. Grade the relevance of a
    retrieved document to a user question.
    Respond with a simple 'yes' or 'no'.
    
    Document: {document_content}
    Question: {question}"""
    
    # Grade each document; one relevant hit is enough
    any_document_is_relevant = False
    for doc in documents:
        grade = llm.invoke(prompt.format(
            document_content=doc.page_content, question=question
        )).content
        if grade.strip().lower().startswith("yes"):
            any_document_is_relevant = True
            break
    
    if any_document_is_relevant:
        return "generate"   # Decision: go to generation
    else:
        return "web_search" # Decision: fall back to web search

Node 3: Web search (the fallback)

If the grade_documents node decides all the internal documents are "Not Relevant," the graph routes the flow to this node.

from langchain_core.documents import Document

# Node 3: A function to search the web
def web_search(state):
    print("---NODE: WEB SEARCH---")
    question = state["question"]
    
    # Call the web search tool (returns a single text blob)
    web_results = web_search_tool.invoke(question)
    
    # Wrap the results in a Document so the next step can treat
    # them exactly like retrieved documents
    return {"documents": [Document(page_content=web_results)], "question": question}

Node 4: Generate (the final answer)

This node takes the good documents (either from the initial retrieval or the web search) and generates the final answer.

# Node 4: A function to generate the final answer
def generate(state):
    print("---NODE: GENERATE ANSWER---")
    question = state["question"]
    documents = state["documents"]
    
    # Standard RAG prompt: stuff the context into the instructions
    context = "\n\n".join(doc.page_content for doc in documents)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context: {context}\n\nQuestion: {question}"
    )
    
    # Call the LLM with the context and question
    generated_answer = llm.invoke(prompt).content
    return {"generation": generated_answer}

Part 4: The agentic RAG in action

Let's test our new agent "brain" with two different questions.
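
Assuming the compiled app from the sketch in Part 3, each test is a single invoke call:

# Run the graph end to end with a user question
result = app.invoke({"question": "What are the components of an LLM agent?"})
print(result["generation"])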

Test 1: The internal question

Query: "What are the components of an LLM agent?" (This answer is in our documents)

  1. Retrieve: Finds relevant documents about LLM agents.
  2. Grade Documents: The LLM grader looks at the docs and says, "These are clearly about LLM agents."
  3. Decision: The grader returns "Relevant".
  4. Generate: The agent generates an answer using the retrieved documents.
  5. Path taken: Retrieve → Grade → Generate → End

Test 2: The external/new question

Query: "What was the main announcement from OpenAI's Spring Update in May 2024?" (This answer is not in our documents)

  1. Retrieve: Searches the vector store and finds irrelevant documents about old agent architectures.
  2. Grade Documents: The LLM grader looks at these old docs and the "May 2024" question. It says, "These documents do not answer the question."
  3. Decision: The grader returns "Not Relevant".
  4. Web Search: The graph routes to the web search tool, which finds articles about the new "GPT-4o" model.
  5. Generate: The agent generates an answer using the new context from the web search.
  6. Path taken: Retrieve → Grade → Web Search → Generate → End

The agent successfully identified its own failure, corrected its course, and delivered a correct, up-to-date answer.

Key takeaways

  • Agentic RAG is not linear: It operates in cycles, making decisions based on the current situation. This is far more robust than a simple Retrieve -> Generate chain.
  • Tools are superpowers: Giving an agent "tools" (like web search, calculators, or API access) allows it to overcome the limitations of its static knowledge.
  • Self-correction is key: The most powerful pattern is the ability to grade outputs. By checking the relevance of its own work, our agent can decide to take corrective action instead of generating a bad answer.
  • Graphs are for agents: While simple chains are fine for simple tasks, libraries like LangGraph are essential for building complex, stateful agents that have loops and conditional logic.

For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
