Production RAG: Handling Edge Cases and Failures

Param Harrison
6 min read

In our last post, we learned how to measure our RAG agent's quality. We built a "golden set" and used RAGAs to score our bot.

But this post is for you if you've ever moved a "working" demo to production, only to have it crash 10 minutes later.

In the real world, things break. APIs time out. Third-party services go down. LLMs get rate-limited. If your agent is just a "happy path" script, it's not a production system. It's a liability.

Today, we'll build a Resilient Agent that can handle real-world chaos using fallbacks, retries, and graceful degradation.

The problem: The "Brittle" agent

Our "happy path" agent works great... if the world is perfect. But what happens when our "web search" tool times out?

sequenceDiagram
    participant User
    participant Agent
    participant Web_Search_Tool
    
    User->>Agent: "What's the news on Project-Z?"
    activate Agent
    Agent->>Web_Search_Tool: search("Project-Z")
    activate Web_Search_Tool
    
    note right of Web_Search_Tool: ... (API times out after 30s) ...
    
    Web_Search_Tool-->>Agent: [X ERROR 504: Gateway Timeout]
    deactivate Web_Search_Tool
    
    Agent->>Agent: [CRASH]
    Agent-->>User: {"error": "Internal Server Error"}
    deactivate Agent

Why this is bad:

  • The User gets a broken app. This is the worst possible experience.
  • The Agent is brittle. A single, common network error brought down our entire system.

A production agent must be resilient. It needs a "Plan B."

The solution: A "Graceful Fallback" graph

We can't prevent all errors, but we can handle them. We will stop thinking in "chains" and start thinking in "graphs" with conditional logic.

We will build an agent with this logic:

  1. Try Tool A (e.g., our primary, high-quality paid search API).
  2. Did it work?
    • Yes: Great, go to "Generate Answer."
    • No (Timeout/Error): Don't crash. Go to "Plan B."
  3. Try Tool B (e.g., a free, lower-quality web search tool).
  4. Did that work?
    • Yes: Great, go to "Generate Answer" (with the Plan B data).
    • No: Don't crash. Go to "Plan C."
  5. Plan C: Generate a graceful failure message.

This is called Graceful Degradation.

graph TD
    A[Start] --> B(Try Tool A: Paid Search API)
    B --> C{Success?}
    C -- "Yes" --> D[Generate Answer]
    C -- "No (e.g., Timeout)" --> E(Try Tool B: Free Web Search)
    E --> F{Success?}
    F -- "Yes" --> D
    F -- "No (e.g., Timeout)" --> G[Generate Graceful Error: Sorry, I can't search right now.]
    D --> H[End]
    G --> H
    
    style B fill:#e3f2fd,stroke:#0d47a1
    style E fill:#fff8e1,stroke:#f57f17
    style G fill:#ffebee,stroke:#b71c1c
    style D fill:#e8f5e9,stroke:#388e3c
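Stripped of any framework, the flow above is just a chain of guarded calls. Here's a minimal, framework-free sketch with mock tools (the `answer` helper and the mock tools are illustrative names; the full LangGraph version follows below):

```python
def paid_search_tool(query):
    # Mock Plan A: fails whenever the query contains "fail"
    if "fail" in query:
        raise TimeoutError("API timed out after 30 seconds")
    return ["Fact from Paid API: ..."]

def free_search_tool(query):
    # Mock Plan B: always succeeds in this sketch
    return ["Fact from Free Search: ..."]

def answer(question):
    try:
        context = paid_search_tool(question)           # Plan A
    except Exception:
        try:
            context = free_search_tool(question)       # Plan B
        except Exception:
            return "Sorry, I can't search right now."  # Plan C
    return f"Answer based on: {context[0]}"

print(answer("What is Model-V?"))   # served by Plan A
print(answer("fail this query"))    # Plan A fails, served by Plan B
```

This nested try/except version works, but it doesn't scale: add a third tool or a retry policy and the nesting gets ugly fast. That's exactly why we'll express the same logic as a graph.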

The "How": Building fallbacks with LangGraph

We can build this exact logic using LangGraph. We'll define our "state" and our "nodes," but this time, our nodes will include try/except blocks.

Brick 1: The "Memory" (GraphState)

Our "memory" needs to hold the question and a context that might be filled by either Tool A or Tool B.

from typing import TypedDict, List, Optional

class GraphState(TypedDict):
    question: str
    context: List[str]
    error_message: Optional[str]  # To store what went wrong (None = no error)
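A note on how this state gets updated: each LangGraph node returns a partial dict, and the framework merges it into the running state. Conceptually it behaves like a dict merge (this is an illustration, not LangGraph internals):

```python
# The running state before a node executes
state = {"question": "What is Model-V?", "context": [], "error_message": None}

# A node returns only the keys it wants to update...
node_update = {"context": ["Fact from Paid API: ..."], "error_message": None}

# ...and the framework folds that partial update into the state.
state = {**state, **node_update}  # "question" is preserved; "context" is filled in
```

This is why the nodes below can return just `{"context": ..., "error_message": ...}` without worrying about the question being lost.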

Brick 2: The "Nodes" (with error handling)

Now, we build our nodes. This time, they don't just "run"; they "try to run."

# Our (fictional) tools
def paid_search_tool(query: str) -> List[str]:
    # This tool is great, but it might fail
    if "fail" in query: # A mock failure
        raise TimeoutError("API timed out after 30 seconds")
    return ["Fact from Paid API: ..."]

def free_search_tool(query: str) -> List[str]:
    # Our cheap fallback: lower quality, but (in this demo) it always succeeds
    return ["Fact from Free Search: ..."]

# --- Node 1: Try Tool A ---
def try_tool_a(state):
    print("---NODE: Trying Tool A (Paid Search)---")
    try:
        context = paid_search_tool(state["question"])
        return {"context": context, "error_message": None}
    except Exception as e:
        print(f"Tool A failed: {e}")
        return {"context": [], "error_message": str(e)}

# --- Node 2: Try Tool B (The Fallback) ---
def try_tool_b(state):
    print("---NODE: Trying Tool B (Free Search)---")
    # For this demo, our simple fallback tool always succeeds
    context = free_search_tool(state["question"])
    return {"context": context, "error_message": None}

# --- Node 3: The Final "Safety Net" ---
def generate_error_message(state):
    print("---NODE: All tools failed. Gracefully failing.---")
    return {"context": [f"I'm sorry, my search tools are currently offline. The error was: {state['error_message']}"]}

Observation: Our nodes are now "smart." They catch errors and update the GraphState instead of crashing the program.
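We promised retries as well as fallbacks at the top of this post. Many timeouts are transient, so before giving up on Tool A and routing to Plan B, it's often worth retrying it with exponential backoff. A minimal, framework-free sketch (`with_retries` and `flaky_search` are hypothetical names, not a LangGraph API):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.1):
    """Call fn(); on failure, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller route to Plan B
            time.sleep(base_delay * (2 ** attempt))

# A mock tool that fails twice, then succeeds (a transient timeout)
calls = {"n": 0}
def flaky_search():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("API timed out")
    return ["Fact from Paid API: ..."]

result = with_retries(flaky_search)  # succeeds on the third attempt
```

Inside `try_tool_a`, you could wrap the `paid_search_tool` call in `with_retries` so the graph only falls back to Tool B once the retries are exhausted.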

Brick 3: The "Wires" (The conditional logic)

Now, we wire it all up in our LangGraph workflow.

from langgraph.graph import StateGraph, END

# --- The "Decision" Edges ---
def check_tool_a_success(state):
    # Did the first tool work?
    if state["error_message"] is None:
        return "generate" # Yes, go straight to the answer
    else:
        return "try_tool_b" # No, trigger Plan B

def check_tool_b_success(state):
    # In a real app we'd wrap Tool B in try/except too;
    # here we simply check whether it returned any context.
    if state["context"]:
        return "generate"
    else:
        return "fail_gracefully"

# --- Build the Graph ---
workflow = StateGraph(GraphState)
workflow.add_node("try_tool_a", try_tool_a)
workflow.add_node("try_tool_b", try_tool_b)
workflow.add_node("fail_gracefully", generate_error_message)
workflow.add_node("generate", ...) # Our final LLM generator node

# --- Set the Logic Flow ---
workflow.set_entry_point("try_tool_a")

# The first critical decision
workflow.add_conditional_edges(
    "try_tool_a",
    check_tool_a_success,
    {
        "generate": "generate",
        "try_tool_b": "try_tool_b"
    }
)

# The second critical decision
workflow.add_conditional_edges(
    "try_tool_b",
    check_tool_b_success,
    {
        "generate": "generate",
        "fail_gracefully": "fail_gracefully"
    }
)

# The final paths
workflow.add_edge("generate", END)
workflow.add_edge("fail_gracefully", "generate") # We still go to 'generate' to show the user the error

app = workflow.compile()

Result: We've built a resilient agent!

  • If we send {"question": "What is Model-V?"}, it follows try_tool_a -> generate.
  • If we send {"question": "fail this query"}, it follows try_tool_a -> (Fails) -> try_tool_b -> generate.

Our bot no longer crashes. It degrades gracefully.

Challenge for you

  1. Use Case: Our current logic retries any error.
  2. The Problem: What if try_tool_a fails with a 401 Unauthorized (Bad API Key) error? Retrying with try_tool_b is a waste of time and money; the real problem is our key.
  3. Your Task: How would you modify the check_tool_a_success logic so it doesn't fall back on a 401? (Hint: the function can return more than two route names. What if it returned "fail_fast" and you added a new node for that?)
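Try it yourself before reading on. One possible shape of the answer, assuming the error string carries the HTTP status (a simplification; real code would inspect a typed exception or status code):

```python
def check_tool_a_success(state):
    msg = state["error_message"]
    if msg is None:
        return "generate"      # Tool A worked
    if "401" in msg:
        return "fail_fast"     # config problem: a fallback won't help
    return "try_tool_b"        # transient error: try Plan B
```

You'd then register a `fail_fast` node (e.g., one that alerts on-call about the bad API key) and add it as a third target in the conditional edge mapping.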

Key takeaways

  • Production systems need error handling: Happy path code will fail in production—you must handle timeouts, rate limits, and service outages
  • Fallback strategies prevent crashes: When Tool A fails, gracefully try Tool B instead of crashing
  • Graceful degradation maintains UX: Even when tools fail, provide a helpful error message instead of a generic 500 error
  • Conditional edges enable resilience: LangGraph's conditional edges let you route based on success/failure, creating self-healing agents
  • Error state is part of state: Store error messages in your GraphState so downstream nodes can make informed decisions

For more on building resilient systems, see our concurrency and resilience guide.


For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
