Workflow Orchestration: Building State Machines with LangGraph

Param Harrison
6 min read

In the previous posts, we discussed agents as abstract concepts (see our multi-agent coordination guide). Now, let's talk about the code. How do you actually build a system that loops, pauses, branches, and remembers?

If you are building simple, one-way pipelines (Input -> Prompt -> Output), standard tools (like LangChain chains) are enough.

But Agents are not pipelines. They are loops. They try things, fail, retry, and change direction. To build this, you need a State Machine.

This post is for engineers ready to graduate from simple chains to LangGraph. We will explore how to explicitly define your agent's "brain" using Nodes, Edges, and Shared State.

The shift: From DAGs to cycles

Most data pipelines are DAGs (Directed Acyclic Graphs). Data flows in one direction, like water down a hill.

Agents require Cycles.

  • Refinement Loop: Draft -> Critique -> (Loop back to Draft) -> Finalize.
  • Tool Retry Loop: Call Tool -> Error -> (Loop back to fix arguments) -> Call Tool.

You cannot model a loop in a DAG. You need a graph that supports cycles.
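Before reaching for a framework, note that a cycle is just ordinary control flow. Here is the refinement loop above sketched in plain Python, with a hypothetical length-based "critique" standing in for a real editor:

```python
def refine(draft: str, max_revisions: int = 3) -> str:
    # Draft -> Critique -> (loop back) -> Finalize, with a guardrail
    # so the loop cannot run forever.
    for _ in range(max_revisions):
        critique = "too short" if len(draft) < 20 else None
        if critique is None:
            break  # critique passed: finalize
        draft = draft + " (expanded)"  # stand-in for a real revision step
    return draft
```

A framework like LangGraph gives you the same shape, but with explicit state, inspectable steps, and routing you can persist.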

graph TD
    subgraph STANDARD["Standard Chain (DAG)"]
        A1[Input] --> B1[LLM]
        B1 --> C1[Output]
    end
    
    subgraph AGENTIC["Agentic Graph (Cyclic)"]
        A2[Input] --> B2[Think]
        B2 --> C2[Act]
        C2 --> D2{Success?}
        D2 -- "No" --> B2
        D2 -- "Yes" --> E2[Output]
    end
    
    style B2 fill:#e3f2fd,stroke:#0d47a1
    style D2 fill:#fff9c4,stroke:#fbc02d

The core concept: "Shared State"

In a standard chain, data is passed like a hot potato from function to function: Function A outputs a string, Function B consumes it. This gets brittle as soon as multiple steps need access to the same context.

In LangGraph, we define a Global State—a shared "clipboard" that every node can read from and write to.

Engineering Best Practice: Define your state strictly using TypedDict or Pydantic.

from typing import TypedDict, List, Optional
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    # The user's initial request
    question: str
    # The comprehensive list of research found
    research_notes: List[str]
    # The current draft of the answer
    draft: str
    # A counter to prevent infinite loops
    revision_count: int
    # Error logs
    errors: Optional[str]

Every node in our graph (Researcher, Writer, Editor) will take this AgentState as input and return a dictionary of updates to merge back into the state.
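To make that merge behavior concrete, here is a minimal sketch (the node name and values are hypothetical). Note that by default a returned value replaces the old one; appending to a list requires an `Annotated` reducer, which we skip here for simplicity:

```python
from typing import TypedDict, List, Optional

class AgentState(TypedDict):
    question: str
    research_notes: List[str]
    draft: str
    revision_count: int
    errors: Optional[str]

def researcher_node(state: AgentState) -> dict:
    # A node returns ONLY the keys it changed; LangGraph merges
    # this partial update back into the shared state.
    return {"research_notes": [f"Found a source about: {state['question']}"]}

# Conceptually, the merge LangGraph performs is equivalent to:
state: AgentState = {"question": "What is a monad?", "research_notes": [],
                     "draft": "", "revision_count": 0, "errors": None}
state.update(researcher_node(state))
# Untouched keys (draft, errors, ...) are preserved.
```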

The architecture: Nodes and edges

LangGraph is built on two primitives:

  1. Nodes: Python functions that do work (LLM calls, Tool calls).
  2. Edges: The control flow logic (If/Else/Loop).

Let's build a Self-Correcting Coding Assistant.

1. The nodes (The workers)

def coder_node(state: AgentState):
    """Generates code based on the question.
    Assumes `llm` is a chat model instance (e.g., ChatOpenAI)."""
    print("--- WRITING CODE ---")
    response = llm.invoke(f"Write python code for: {state['question']}")
    return {"draft": response.content, "revision_count": state["revision_count"] + 1}

def tester_node(state: AgentState):
    """Runs the code and checks for errors."""
    print("--- TESTING CODE ---")
    try:
        exec(state["draft"]) # Warning: Don't use exec in prod without sandboxing!
        return {"errors": None}
    except Exception as e:
        return {"errors": str(e)}

2. The conditional edge (The router)

Edges aren't just lines; they are logic. We use Conditional Edges to determine where to go next based on the State.

def router_logic(state: AgentState):
    # 1. Check if we hit the max retries (Safety Guardrail)
    if state["revision_count"] > 3:
        return "end"
    
    # 2. Check if the test passed
    if state["errors"] is None:
        return "end"
    
    # 3. If errors exist, loop back to coding
    return "retry"

3. Wiring the graph

Now we assemble the machine.

# Initialize
workflow = StateGraph(AgentState)

# Add Nodes
workflow.add_node("coder", coder_node)
workflow.add_node("tester", tester_node)

# Define Flow
workflow.set_entry_point("coder")
workflow.add_edge("coder", "tester")

# Define the Loop
workflow.add_conditional_edges(
    "tester",          # Start at the Tester
    router_logic,      # Check the logic
    {                  # Map logic outputs to nodes
        "retry": "coder",  # THE CYCLE: Go back to Coder
        "end": END         # Finish
    }
)

app = workflow.compile()

graph TD
    A[Start] --> B(Coder Node)
    B --> C(Tester Node)
    C --> D{Router Logic}
    
    D -- "Has Errors" --> B
    D -- "No Errors" --> E[End]
    D -- "Max Retries" --> E
    
    style D fill:#ffebee,stroke:#b71c1c

Observation: The Coder node is smarter the second time it runs. Why? Because the State now contains the errors from the Tester. The prompt can be updated to say: "Here is your previous code and the error it caused. Fix it."
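One way to wire that feedback in is a small prompt builder. This is a hypothetical helper, not part of the LangGraph API; the Coder node would call it instead of formatting the prompt inline:

```python
def build_coder_prompt(state: dict) -> str:
    # When the Tester recorded an error, include the failing draft
    # and the error message so the retry is informed, not blind.
    if state.get("errors"):
        return (
            f"Here is your previous code:\n{state['draft']}\n"
            f"It failed with this error:\n{state['errors']}\n"
            f"Fix it. Original task: {state['question']}"
        )
    return f"Write python code for: {state['question']}"
```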

Error handling and "Time Travel"

One of the biggest engineering benefits of explicit state machines is Time Travel (or Checkpointing).

Because the entire reality of the agent is stored in the AgentState, we can:

  1. Pause the graph after a failure.
  2. Inspect the history of the state at every step.
  3. Rewind to the step before the error, manually modify the state (e.g., fix a bad variable), and Resume execution.

This is impossible with standard "chains" of functions.

Summary: Why use graphs?

Feature    | Standard Chain (LangChain)  | State Machine (LangGraph)
-----------|-----------------------------|---------------------------------
Structure  | Linear (A -> B -> C)        | Cyclic (A -> B -> A -> C)
Memory     | Implicit (Message History)  | Explicit (Shared State Schema)
Control    | Hardcoded sequence          | Dynamic routing based on data
Resilience | If step B fails, chain dies | If step B fails, route to "Fallback"

Challenge for you

Scenario: You are building a "Newsletter Generator".

  • State: topic, search_results, draft, critique.
  • Nodes: Researcher, Writer, Editor.
  • The Logic:
    1. Researcher finds links.
    2. Writer creates a draft.
    3. Editor critiques the draft.
    4. Constraint: If the Editor says "Too short," go back to Writer. If the Editor says "Not enough facts," go back to Researcher.

Your Task:

  1. Define the TypedDict State.
  2. Draw the graph. Note that the Editor has two different backward arrows (cycles).
  3. How do you prevent the Editor from looping forever if the Writer refuses to change? (Hint: Add a variable to the State).

Key takeaways

  • State machines enable cycles: Unlike DAGs, graphs can loop back to previous nodes, enabling retry and refinement patterns
  • Shared state is the "clipboard": All nodes read from and write to a single state object, making data flow explicit and traceable
  • Conditional edges enable dynamic routing: The graph can branch and loop based on the current state, not just a fixed sequence
  • TypedDict enforces structure: Strict state schemas prevent bugs and make the system self-documenting
  • Checkpointing enables time travel: By persisting state, you can pause, inspect, and resume workflows at any point
  • Nodes are stateless workers: Each node is a pure function that takes state and returns updates, making them testable and composable
  • Safety guardrails prevent infinite loops: Counters and max retry limits protect against runaway execution

For more on building agentic systems, see our multi-agent coordination guide, our agent architecture guide, and our LangGraph agent brain guide.


For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
