Advanced RAG with Tools: Using LlamaIndex

Param Harrison
5 min read

In our last lessons, we built a RAG system from the ground up. We had to manage every single step: loading, chunking (see our chunking guide), embedding, and storing (see our vector databases guide).

This was a great way to learn, but it's like building a car by mining the iron ore yourself. For real-world projects, it's smarter to start with a pre-built engine and chassis.

This is the key insight: frameworks like LlamaIndex act as a powerful toolkit for RAG. They provide pre-built, optimized components for every stage of the pipeline, letting us build sophisticated systems much faster.

Part 1: The easy button — simple PDF ingestion

Remember all the steps we took to manually chunk, embed, and store our text? LlamaIndex handles complex files like PDFs in just two lines.

It bundles a PDF loader, a smart text splitter, an embedding model, and a vector store into one simple command.

# LlamaIndex abstracts away all the complexity.
# (By default it uses OpenAI for embeddings and generation,
# so OPENAI_API_KEY must be set in your environment.)
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. Load all documents from a folder (it handles PDFs, .txt, etc.)
documents = SimpleDirectoryReader("./data").load_data()

# 2. Build the index. This automatically handles:
#    - Chunking
#    - Embedding
#    - Storing in a vector index
book_index = VectorStoreIndex.from_documents(documents)

# 3. Get a query engine and ask a question
query_engine = book_index.as_query_engine()
response = query_engine.query("What is the main character's goal?")
print(response)

What took us dozens of lines before is now done in two. This abstraction is the primary power of using a framework.
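
Every one of those defaults is still swappable. As a minimal sketch (the chunk sizes and the ./storage path here are illustrative choices, not the library's only options), here is how you might override the chunker and persist the index so later runs skip re-chunking and re-embedding:

# A sketch: customizing the chunker and persisting the index
from llama_index.core import Settings, StorageContext, load_index_from_storage
from llama_index.core.node_parser import SentenceSplitter

# Swap in your own chunking strategy (sizes here are arbitrary examples)
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)

# Save the index to disk so you don't re-embed on every run
book_index.storage_context.persist(persist_dir="./storage")

# ...and reload it later without touching the raw documents
storage_context = StorageContext.from_defaults(persist_dir="./storage")
book_index = load_index_from_storage(storage_context)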

Part 2: Reasoning over multiple books

What if the answer isn't in one document? What if it's spread across two different data sources?

For example, you have:

  1. Book Index: The full text of a novel.
  2. Quotes Index: A separate document with just famous quotes.

A user asks: "What is a famous quote by Dumbledore about dreams, and what is the context of dreams in the book?"

A simple RAG system would fail. We need an engine that can:

  1. Break the question into sub-questions.
  2. Route each sub-question to the correct tool (the right "book").
  3. Combine the answers.

LlamaIndex calls this a Sub-Question Query Engine.

graph TD
    A["Complex Query: 'Plot? and Quote?'"] --> B(Sub-Question Engine)
    B --> C["Sub-Question 1: 'Plot?'"]
    B --> D["Sub-Question 2: 'Quote?'"]
    C --> E["Tool 1: Book Index"]
    D --> F["Tool 2: Quotes Index"]
    E & F --> G(Synthesize Final Answer)
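
First, each data source needs its own index and query engine. A minimal sketch, assuming the novel's text lives in ./book and the quotes file in ./quotes (both folder names are illustrative):

# Build one index and query engine per data source
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

book_docs = SimpleDirectoryReader("./book").load_data()
quotes_docs = SimpleDirectoryReader("./quotes").load_data()

book_query_engine = VectorStoreIndex.from_documents(book_docs).as_query_engine()
quotes_query_engine = VectorStoreIndex.from_documents(quotes_docs).as_query_engine()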

We give each of our indexes a "tool" wrapper with a clear description:

from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# 1. Create a tool for each of our "books"
book_tool = QueryEngineTool.from_defaults(
    query_engine=book_query_engine,
    name="Book_Content",
    description="Useful for answering questions about the plot, characters, and events in the book."
)

quotes_tool = QueryEngineTool.from_defaults(
    query_engine=quotes_query_engine,
    name="Famous_Quotes",
    description="Useful for finding specific quotes from Albus Dumbledore."
)

# 2. Build the Sub-Question Engine
# This engine uses the tool descriptions to route questions
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[book_tool, quotes_tool]
)

# 3. Ask the complex question
response = sub_question_engine.query(
    "What is a Dumbledore quote about dreams, and what is the context of dreams in the book?"
)

The engine uses the tool descriptions to figure out where to find each piece of the answer. It asks the Famous_Quotes tool for the quote and the Book_Content tool for the context, then combines them.
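
You can watch this routing happen. A small sketch: enable verbose logging (it may already be on by default, and the exact printout format varies by LlamaIndex version), then inspect the retrieved chunks behind the final answer:

# Rebuild the engine with verbose logging to see each sub-question
# and which tool it was dispatched to
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[book_tool, quotes_tool],
    verbose=True,
)

response = sub_question_engine.query(
    "What is a Dumbledore quote about dreams, and what is the context of dreams in the book?"
)

# The source chunks that fed the final answer are also inspectable
for node in response.source_nodes:
    print(node.node.get_content()[:100])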

Part 3: The ReAct agent — thinking and acting

This is the most advanced step. What if the answer isn't in any of our local documents?

We need to give our system a new tool: Web Search.

But now the system has to choose which tool to use. For this, we build a ReAct Agent. "ReAct" is short for Reason + Act: the agent runs a Reason → Act → Observe loop, which lets the LLM "think" about a plan before each step.

graph TD
    A[User Query] --> B("Reason: 'I need to find X'")
    B --> C("Act: 'I will use Tool Y'")
    C -- "Tool (e.g., Web Search)" --> D("Observe: 'Here's the result'")
    D --> E{Is question answered?}
    E -- No --> B
    E -- Yes --> F[Final Answer]

The agent's "thinking" process looks like this:

  1. Reason: The user is asking for a real-world fact, not in my book.
  2. Act: I will choose the Web_Search tool.
  3. Observe: The web search gives me the fact.
  4. Reason: I now have the answer.
  5. Act: I will generate the final response.

We build this by giving the agent a list of all its available tools:

# 1. Create a new tool for web search.
# DuckDuckGoSearchRun comes from LangChain's community package
# (pip install langchain-community duckduckgo-search)
from langchain_community.tools import DuckDuckGoSearchRun
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

web_search_tool = FunctionTool.from_defaults(
    fn=DuckDuckGoSearchRun().run,
    name="Web_Search",
    description="Useful for searching the web for information not in the local documents."
)

# 2. Create a *new* list of ALL tools
all_tools = [book_tool, quotes_tool, web_search_tool]

# 3. Build the ReAct Agent
agent = ReActAgent.from_tools(all_tools, verbose=True)

# 4. Ask a question that REQUIRES web search
response = agent.chat(
    "What was the real-world historical inspiration for the Philosopher's Stone?"
)

The agent (with verbose=True) will show its "thoughts". It will first check the Book_Content tool, fail to find the answer, and then decide to use the Web_Search tool to find the real-world history of alchemy, all in one "run".
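
One practical safeguard: because the Reason → Act → Observe loop repeats until the agent is satisfied, a confused agent can spin for a long time. A minimal sketch of capping the loop and pinning an explicit LLM (the model name is an assumption; swap in whatever you use):

# A sketch: bounding the ReAct loop and choosing the LLM explicitly
from llama_index.llms.openai import OpenAI

agent = ReActAgent.from_tools(
    all_tools,
    llm=OpenAI(model="gpt-4o-mini"),  # model choice is illustrative
    max_iterations=10,  # stop runaway Reason -> Act -> Observe cycles
    verbose=True,
)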

Key takeaways

  • Frameworks accelerate development: Tools like LlamaIndex handle the "plumbing" of RAG (loading, chunking, embedding), letting you focus on building high-level logic
  • Sub-questioning solves multi-doc RAG: The SubQuestionQueryEngine is a powerful way to break down complex questions and get answers from multiple different data sources
  • Agents need tools to be powerful: A RAG system limited to its own documents is a "closed book". By giving an agent tools (like web search), you create an "open-book" system that can answer a much wider range of questions
  • ReAct is a core thinking loop: The Reason → Act → Observe cycle is how you build agents that can reason, make decisions, and use different tools to solve a problem

For more on building production AI systems, check out our AI Bootcamp for Software Engineers.

