Building a Self-Correcting RAG Agent

Param Harrison
6 min read

In our last post, we built the foundation for our advanced RAG agent. We defined its "memory" (GraphState) and its "tools" (retrieve, web_search).
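
As a quick refresher, that state is simply a shared dictionary that every node reads from and writes to. Here's a minimal sketch of what the GraphState might look like (the exact fields from the last post may differ slightly):

from typing import List, TypedDict

class GraphState(TypedDict):
    question: str         # The original user query
    documents: List[str]  # Whatever retrieve or web_search found
    generation: str       # The final answer from the generator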

But our agent is still just a collection of parts. It has no "brain" to connect them. This post is for you if you're ready to build the logic that makes an agent "smart."

Today, we'll build the three "thinking" nodes of our agent's brain:

  1. The Router: The initial decision-maker.
  2. The Grader: The "self-correction" loop.
  3. The Generator: The final "voice."

The problem: A "Dumb" agent

Without logic, our agent doesn't know what to do. If we ask it the "competitor" question (one our internal docs can't answer), it doesn't know to use web_search instead of retrieve. And if retrieve comes back with junk, the agent has no way to realize it has failed.

graph TD
    A[User Query] --> B[Agent Without Logic]
    B --> C[Doesn't know which tool to use]
    B --> D[Doesn't know if results are good]
    C --> E[Random or wrong tool choice]
    D --> F[Accepts bad results]
    E --> G[Poor Answer]
    F --> G
    
    style B fill:#ffebee,stroke:#b71c1c
    style G fill:#ffebee,stroke:#b71c1c

We need to build nodes that can make decisions.

The "How": Building the "Thinking" nodes

These nodes are also simple Python functions, but instead of just fetching data, they use an LLM to reason.

Brick 3: The Router node (The first decision)

This is our agent's "triage" step. Its only job is to look at the user's question and decide where to send it first: to our internal vectorstore or to the public web_search.

from openai import OpenAI

llm_client = OpenAI() # Assumes OPENAI_API_KEY is set

# --- Node 3: The Router ---
def route_query(state):
    print("---NODE: ROUTE_QUERY---")
    question = state["question"]
    
    # We ask an LLM to act as the router
    prompt = [
        {"role": "system", "content": """You are an expert at routing a user question.
Use 'vectorstore' for specific questions about 'Model-V' (its features, architecture, or training data).
Use 'web_search' for all other questions, especially comparisons, competitors, pricing, or recent events."""},
        {"role": "user", "content": f"Given the user question, which datasource should I use? Question: {question}"}
    ]
    
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=prompt,
        temperature=0
    )
    source = response.choices[0].message.content
    print(f"Routing decision: {source}")
    
    # The return value of this node will be the *name* of the *next* node to run
    if "vectorstore" in source.lower():
        return "retrieve"
    else:
        return "web_search"

Observation: We've built a "smart" router. By using an LLM, we don't need to write complex if/else rules. We just tell the LLM (in plain English) how to make the decision.

graph TD
    A[User Query] --> B(Router Node)
    B --> C{Decision}
    C -- "vectorstore" --> D[Retrieve from Internal Docs]
    C -- "web_search" --> E[Search the Web]
    
    style B fill:#e3f2fd,stroke:#0d47a1
    style C fill:#fff8e1,stroke:#f57f17
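
To sanity-check the router, you can call it directly with a hand-built state dict. The LLM's exact wording may vary, but the substring check handles that:

# Quick sanity check: call the router directly with hand-built states
print(route_query({"question": "What training data was used for Model-V?"}))
# Should route to: "retrieve"

print(route_query({"question": "How does Model-V compare to competitors on price?"}))
# Should route to: "web_search"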

Brick 4: The Grader node (The self-correction loop)

This is the most important node in our graph. The Grader's job is to check the output of our tools. After the retrieve node runs, this node looks at the retrieved documents and grades them: "Are these documents actually relevant?"

This allows our agent to "realize" it failed.

# --- Node 4: The Grader ---
def grade_documents(state):
    print("---NODE: GRADE_DOCUMENTS---")
    question = state["question"]
    documents = state["documents"]
    
    # We ask an LLM to be the grader
    prompt = [
        {"role": "system", "content": """You are a grader. Your task is to determine if the retrieved documents
are relevant to the user question and contain enough information to answer it completely.
Respond with a single word: 'yes' or 'no'."""},
        {"role": "user", "content": f"Retrieved Documents:\n{documents}\n\nUser Question: {question}"}
    ]
    
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=prompt,
        temperature=0
    )
    decision = response.choices[0].message.content
    print(f"Grader decision: {decision}")
    # This 'yes' or 'no' will control the next conditional edge
    if 'yes' in decision.lower():
        return "yes"
    else:
        return "no"

Observation: This node is our "self-correcting" mechanism. If the retrieve node (our internal search) fails, this node will catch it and output "no," which will allow us to trigger the web_search as a fallback.

graph TD
    A[Retrieved Documents] --> B(Grader Node)
    B --> C{Are docs relevant?}
    C -- "yes" --> D[Proceed to Generate]
    C -- "no" --> E[Try Alternative Tool]
    E --> F[Web Search]
    F --> D
    
    style B fill:#e3f2fd,stroke:#0d47a1
    style C fill:#fff8e1,stroke:#f57f17
    style E fill:#ffebee,stroke:#b71c1c
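
We'll do the full wiring in the next post, but as a rough preview of how this "yes"/"no" string can drive the graph, a LangGraph conditional edge might look something like this (a sketch, not the final code):

from langgraph.graph import StateGraph

workflow = StateGraph(GraphState)
# ... add_node calls for retrieve, web_search, and generate go here ...

# After retrieve runs, grade_documents decides where to go next
workflow.add_conditional_edges(
    "retrieve",
    grade_documents,  # returns "yes" or "no"
    {"yes": "generate", "no": "web_search"},
)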

Brick 5: The Generator node (The final answer)

Finally, once we have good documents (either from retrieve or web_search), we pass them to our Generator node. This node's only job is to synthesize the final answer.

# --- Node 5: The Generator ---
def generate(state):
    print("---NODE: GENERATE---")
    question = state["question"]
    documents = state["documents"]
    context = "\n".join(documents)
    
    # This is our familiar RAG prompt
    prompt = [
        {"role": "system", "content": "You are a helpful assistant. Answer the user's question based on the provided context. Be comprehensive and synthesize information from all provided documents."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{question}"}
    ]
    
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=prompt,
        temperature=0
    )
    generation = response.choices[0].message.content
    
    # We update the "generation" field in our state
    return {"generation": generation}

We've built all the pieces! We have:

  • Memory: GraphState
  • Tools: retrieve, web_search
  • Logic: route_query, grade_documents
  • Voice: generate

In our next and final post, we'll "wire up" all these nodes in LangGraph and run our fully autonomous, self-correcting agent.

Challenge for you

  1. Use Case: Our grade_documents node is good, but it's a bit simple.
  2. The Problem: What if the documents are relevant but not sufficient (e.g., they only answer half the question)? A "yes/no" grade isn't enough.
  3. Your Task: How would you re-design the grade_documents node? What should it output instead of just "yes" or "no"? (Hint: What if it outputted "complete" vs. "partial" vs. "irrelevant"?)

Key takeaways

  • Routing is the first decision: The Router node uses an LLM to intelligently choose which tool to use first, saving time and tokens
  • Grading enables self-correction: The Grader node is the most critical—it allows the agent to recognize when it has failed and try alternative approaches
  • LLMs make great decision-makers: Instead of complex if/else rules, we use natural language prompts to guide LLM-based routing and grading
  • Generator synthesizes the final answer: Once we have good documents, the Generator node creates a comprehensive, natural-language response
  • All nodes share state: The GraphState allows each node to see what previous nodes found and make decisions accordingly

For more on self-correcting systems, see our advanced RAG guide.


For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
