Multi-Hop RAG: When One Retrieval Isn't Enough

Param Harrison
6 min read

In our last post, we built a resilient agent that can handle tool failures. It's robust, but it's still "dumb." It can only answer simple, one-step questions.

This post is for you if you've ever built a RAG system and watched it fail this simple query:

"What is the competitor to the product mentioned in our latest press release?"

A simple RAG system will fail this query every time. It's a two-part question: the agent has to identify the product first, then run a second search for its competitor. A single retrieval can't do both.

Today, we'll build a Multi-Hop RAG Agent that can "think" in multiple steps to solve complex research questions.

The problem: The "One-Step" retriever

Let's trace the failure of our simple RAG bot.

User Query: "What is the competitor to the product mentioned in our latest press release?"

sequenceDiagram
    participant User
    participant Agent
    participant VectorDB
    
    User->>Agent: "Competitor to product in press release?"
    activate Agent
    
    Agent->>VectorDB: "Find docs about 'competitor' + 'press release'"
    activate VectorDB
    VectorDB-->>Agent: "Here is the 'Latest Press Release' doc."
    deactivate VectorDB
    
    Agent->>Agent: Reads Doc: "Our new product, 'Model-V', is launching."
    Agent-->>User: ❌ "The press release mentions 'Model-V', but does not mention any competitors."
    deactivate Agent

Why this is bad:

  • The agent found the right first document (the press release).
  • But it stopped. It didn't "understand" that the query was a two-step process (sketched in code below):
    1. Hop 1: Find the press release to identify the product (Answer: "Model-V").
    2. Hop 2: Start a new search for "competitors of Model-V."
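
Written as plain sequential code, the behavior we want looks like this (a toy illustration; retrieve() is a stand-in for any search tool):

# A toy illustration of the two hops, with hard-coded "search results"
def retrieve(query: str) -> str:
    fake_index = {
        "latest press release": "Our new product, 'Model-V', is launching.",
        "competitors of Model-V": "Cognito Inc. is the main competitor.",
    }
    return fake_index.get(query, "")

doc = retrieve("latest press release")          # Hop 1: find the press release
product = "Model-V"                             # ...read it to identify the product
answer = retrieve(f"competitors of {product}")  # Hop 2: a NEW search seeded by Hop 1's output
print(answer)  # Cognito Inc. is the main competitor.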

The solution: A "Query Decomposition" graph

We need our agent to stop trying to answer in one shot. We need it to decompose the query.

We'll build an agent (using LangGraph) that can use an LLM to generate new questions for itself.

graph TD
    A["User Query: 'Competitor to product in press release?'"] --> B(Hop 1: RAG)
    B -- "Context: '...our new product, Model-V...'" --> C(LLM: Decompose & Generate)
    C -- "Sub-Query: 'Who is the competitor to Model-V?'" --> D(Hop 2: RAG / Web Search)
    D -- "Context: 'Cognito Inc. is the main competitor...'" --> C
    C --> E["Final Answer: 'The press release mentions 'Model-V'. Its main competitor is Cognito Inc.'"]
    
    style B fill:#e3f2fd,stroke:#0d47a1
    style C fill:#e8f5e9,stroke:#388e3c
    style D fill:#e3f2fd,stroke:#0d47a1
    style E fill:#e8f5e9,stroke:#388e3c

This is a Multi-Hop agent. It uses the output of one retrieval step as the input for the next retrieval step.

The "How": Building a multi-hop graph

We'll define a LangGraph GraphState that holds a to-do list of sub-questions and the facts we've gathered so far.

Brick 1: The "Memory" (GraphState)

Our "memory" needs to be a list of questions to solve, and a list of facts we've found.

from typing import TypedDict, List
from langgraph.graph import StateGraph, END

class GraphState(TypedDict):
    original_query: str
    questions: List[str]  # The "to-do" list of questions
    answers: List[str]    # The "done" list of facts
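
To make the loop concrete, here is how this state evolves during a run (the values are illustrative):

# Illustrative: the state right after planning, before any hops
state: GraphState = {
    "original_query": "Competitor to product in press release?",
    "questions": [
        "What is the latest press release?",
        "Who is the competitor to the product mentioned?",
    ],
    "answers": [],
}
# After one executor pass:
#   questions -> ["Who is the competitor to the product mentioned?"]
#   answers   -> ["The latest press release mentions 'Model-V'."]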

Brick 2: The "Planner" node

This is our new "brain." Its job is to take the user's query and decompose it into a step-by-step plan.

from openai import OpenAI
import json

llm_client = OpenAI()

# --- Node 1: The Planner (Query Decomposer) ---
def plan_queries(state):
    print("---NODE: PLAN_QUERIES---")
    
    prompt = f"""You are a research assistant.
    Break down the following complex question into a series of
    simple, searchable sub-questions.

    Question: {state['original_query']}

    Return a JSON object with a "questions" key holding a list of strings.
    Example: {{"questions": ["question 1", "question 2"]}}
    """

    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        # JSON mode guarantees a JSON *object*, never a bare list,
        # so we ask for -- and parse -- a keyed object
        response_format={"type": "json_object"}
    )

    sub_questions = json.loads(response.choices[0].message.content)["questions"]

    return {"questions": sub_questions, "answers": []}

Observation: If we send "What is the competitor to the product in the latest press release?", this node will output a list like:

["What is the latest press release?", "Who is the competitor to the product mentioned?"]
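
You can sanity-check the planner on its own before wiring it into the graph (this assumes OPENAI_API_KEY is set in your environment):

test_state = {"original_query": "What is the competitor to the product in the latest press release?"}
print(plan_queries(test_state)["questions"])
# e.g. ['What is the latest press release?', 'Who is the competitor to the product mentioned?']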

Brick 3: The "Executor" node (Our RAG tool)

This node's job is to take one question from the "to-do" list, run our simple RAG pipeline on it, and add the fact to our "done" list.

# This is our simple RAG function from Post 1
def simple_rag_pipeline(query: str) -> str:
    # (Code to retrieve from vector store and generate an answer)
    return "The latest press release mentions 'Model-V'." # Mocked answer

# --- Node 2: The Executor (Runs one sub-question) ---
def execute_search(state):
    print("---NODE: EXECUTE_SEARCH---")
    # Get the *first* question from the "to-do" list
    question = state["questions"][0]
    
    # "Pop" it off the to-do list
    remaining_questions = state["questions"][1:]
    
    # Run RAG on just that one question
    answer = simple_rag_pipeline(question)
    
    # Add the answer to our "done" list
    new_answers = state["answers"] + [answer]
    
    return {"questions": remaining_questions, "answers": new_answers}
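
For reference, a real simple_rag_pipeline would embed the question, rank your documents, and answer from the best match. Here is a minimal sketch using the OpenAI embeddings API (the DOCS list and the prompt are illustrative placeholders, not our production pipeline):

import numpy as np

DOCS = [
    "Press Release: Our new product, 'Model-V', is launching this quarter.",
    "Market analysis: Cognito Inc. is the main competitor to Model-V.",
]

def embed(texts: List[str]) -> np.ndarray:
    resp = llm_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def simple_rag_pipeline(query: str) -> str:
    doc_vecs = embed(DOCS)                # in production, precompute and store these
    query_vec = embed([query])[0]
    scores = doc_vecs @ query_vec         # OpenAI vectors are unit-length, so this is cosine similarity
    context = DOCS[int(scores.argmax())]  # keep only the best-matching doc
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            f"Answer using only this context:\n{context}\n\nQuestion: {query}"}],
    )
    return response.choices[0].message.content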

Brick 4: The "Wires" (The loop)

Now we wire it all together.
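
One piece is still missing: the generate_final_answer node that the loop hands off to. Here's a minimal sketch (the exact synthesis prompt is up to you):

# --- Node 3: The Synthesizer (the final LLM call) ---
def generate_final_answer(state):
    print("---NODE: GENERATE_FINAL_ANSWER---")
    facts = "\n".join(state["answers"])
    prompt = f"""Use these facts to answer the original question.

Facts:
{facts}

Question: {state['original_query']}
"""
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Append the synthesized answer to the "done" list so the caller can read it
    return {"answers": state["answers"] + [response.choices[0].message.content]}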

# --- The "Decision" Edge ---
def check_for_more_questions(state):
    # This is our loop condition
    if len(state["questions"]) > 0:
        return "continue" # Go back to the executor
    else:
        return "end" # All done, go to the final answer

# --- Build the Graph ---
workflow = StateGraph(GraphState)
workflow.add_node("plan_queries", plan_queries)
workflow.add_node("execute_search", execute_search)
workflow.add_node("generate_final_answer", generate_final_answer)  # the final LLM call, defined above
workflow.set_entry_point("plan_queries")
workflow.add_edge("plan_queries", "execute_search")

# This is our Multi-Hop Loop!
workflow.add_conditional_edges(
    "execute_search",
    check_for_more_questions,
    {
        "continue": "execute_search", # Loop back to itself!
        "end": "generate_final_answer"
    }
)

workflow.add_edge("generate_final_answer", END)
app = workflow.compile()
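
With the graph compiled, a single invoke call runs the whole plan-and-execute loop:

final_state = app.invoke({
    "original_query": "What is the competitor to the product mentioned in our latest press release?",
    "questions": [],
    "answers": [],
})
print(final_state["answers"][-1])  # the synthesized final answer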

Result: We've built an agent that can reason. It will "plan" its work, then "execute" that plan one step at a time, looping over the execute_search node until its "to-do" list is empty.

Challenge for you

  1. Use Case: Our execute_search node is simple. It always uses our simple_rag_pipeline.
  2. The Problem: What if a sub-question is "What is the CEO's name?" (internal data), but another is "What is our competitor's stock price?" (external data)?
  3. Your Task: How would you combine this post with our previous post? How could you add a Router inside the execute_search node to decide which tool (internal RAG vs. web search) to use for each sub-question? (A starter sketch follows below.)
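
Here is one possible shape for that router, as a starter sketch (web_search_pipeline is a hypothetical stand-in for the resilient tool we built last time):

def route_question(question: str) -> str:
    """Classify a sub-question as 'internal' or 'external' with a tiny LLM call."""
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            "Does answering this need INTERNAL company data or EXTERNAL public data? "
            f"Reply with exactly one word: internal or external.\n\nQuestion: {question}"}],
    )
    return response.choices[0].message.content.strip().lower()

def execute_search(state):
    question = state["questions"][0]
    # web_search_pipeline is hypothetical -- swap in your tool from the last post
    tool = simple_rag_pipeline if route_question(question) == "internal" else web_search_pipeline
    answer = tool(question)
    return {"questions": state["questions"][1:], "answers": state["answers"] + [answer]}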

Key takeaways

  • Multi-hop queries require decomposition: Complex questions need to be broken into simpler sub-questions that can be answered sequentially
  • State management enables iteration: Using a list of questions and answers in GraphState allows the agent to track progress through multiple retrieval steps
  • The planner generates the roadmap: An LLM-based planner node decomposes complex queries into a series of searchable sub-questions
  • The executor runs one step at a time: Each iteration of the executor answers one sub-question and updates the state
  • Loops enable multi-step reasoning: Conditional edges that loop back to the executor create the multi-hop retrieval pattern

For more on advanced RAG patterns, see our advanced RAG guide and our agent framework comparison.


For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
