Architecting AI Agents: Multi-Step Reasoning and Execution Loops

Param Harrison
6 min read


The Challenge

You built a chatbot that can call tools. Now the product team asks it to plan multi-step tasks autonomously (book travel, update tickets, triage incidents).

Suddenly:

  • The agent loops forever on ambiguous goals
  • External APIs get spammed by repeated calls
  • One bad plan can burn through budget or leak data

Discussion: How do you make an autonomous agent that is useful, safe, and predictable under load?

1. Agent types and when to use them

| Agent Type | Behavior | Use Case |
| --- | --- | --- |
| Reactive | Single-step decisions, no planning | Chat replies, simple tool calls |
| Deliberative | Plan, then execute multiple steps | Complex workflows, multi-call orchestration |
| Hybrid | Mix planning with reactive fallback | Most production agents |

Rule: Start with reactive or guided agents. Add autonomy only when you can observe and control every action.

2. The agent's execution loop (Core pattern)

An agent is an execution loop: observe → plan → act → observe.

```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant Planner
    participant Executor
    participant Tool

    User->>Agent: Goal
    Agent->>Planner: Create plan
    Planner-->>Agent: Plan steps
    Agent->>Executor: Execute step 1
    Executor->>Tool: Call external API
    Tool-->>Executor: Result
    Executor-->>Agent: Step result
    Agent->>Planner: Feedback for replanning
    Agent-->>User: Final result
```

Key decisions: step granularity, synchronous vs asynchronous execution, and whether to checkpoint state after each step.
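The loop above can be sketched in a few lines. This is a minimal illustration, not a framework; `plan` and `execute_step` are toy stand-ins for an LLM planning call and a tool invocation:

```python
from dataclasses import dataclass, field

@dataclass
class AgentRun:
    """Drives the observe -> plan -> act -> observe loop with a hard step cap."""
    goal: str
    max_steps: int = 5
    history: list = field(default_factory=list)

    def plan(self):
        # Toy planner: one step per word in the goal (stand-in for an LLM call).
        return [f"handle:{word}" for word in self.goal.split()]

    def execute_step(self, step):
        # Stand-in for a tool call; returns an observation.
        return f"done:{step}"

    def run(self):
        steps = self.plan()
        for i, step in enumerate(steps):
            if i >= self.max_steps:            # step budget: abort, don't loop forever
                self.history.append("aborted:step-budget")
                break
            observation = self.execute_step(step)
            self.history.append(observation)   # checkpoint point: persist here in production
        return self.history

run = AgentRun(goal="check flights", max_steps=5)
result = run.run()
```

Even in this toy form, the two decisions from above are visible: the step cap bounds the loop, and the history append marks where a real system would checkpoint.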

3. Planner vs Executor separation

Keep planning and execution separate.

```mermaid
flowchart TD
    A[Input Goal] --> B[Planner]
    B --> C[Plan Store]
    C --> D[Executor]
    D --> E[Tool Calls]
    E --> F[Results Store]
    F --> G[Replanner]
```

Why:

  • Observability: trace planning decisions separately
  • Safety: validate plans before execution
  • Retry and resume: checkpoint plans so restarts are bounded
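A minimal sketch of the separation, assuming a serializable plan format and an allowlist-based validation gate (all names here are illustrative):

```python
# Planner writes a serializable plan; executor reads it back. Because the plan
# is data, it can be stored, validated, and resumed independently of execution.

def planner(goal: str) -> list[dict]:
    # Stand-in for an LLM planning call that emits structured steps.
    return [{"step": 1, "tool": "search", "args": {"q": goal}},
            {"step": 2, "tool": "summarize", "args": {}}]

def validate_plan(plan: list[dict], allowed_tools: set[str]) -> bool:
    # Safety gate: reject plans that reference tools outside the allowlist.
    return all(s["tool"] in allowed_tools for s in plan)

def executor(plan: list[dict]) -> list[str]:
    # Executes steps one by one; in production each result is persisted
    # to the results store for the replanner.
    return [f"ran {s['tool']} (step {s['step']})" for s in plan]

plan = planner("cheap flights to Lisbon")
assert validate_plan(plan, {"search", "summarize"})
results = executor(plan)
```

Because the plan is plain data sitting between the two components, a restart can re-load it from the plan store instead of re-prompting the model.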

4. Tool interface design (Define contracts)

Treat each external tool like a strict API you control.

Tool contract fields:

  • name
  • arguments schema
  • idempotency key requirement
  • required auth scope
  • cost estimate per call

Example tool spec (conceptual):

```
tool sendEmail
args { to string, subject string, body string }
idempotency required true
auth scope email_send
estimated cost small
```

Enforce contracts with runtime validation and dry-run mode from the planner.
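The sendEmail spec above could be enforced at runtime roughly like this. A sketch only; the field names and validation shape are assumptions, not a specific library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    name: str
    args_schema: dict          # arg name -> expected Python type
    idempotency_required: bool
    auth_scope: str

SEND_EMAIL = ToolSpec(
    name="sendEmail",
    args_schema={"to": str, "subject": str, "body": str},
    idempotency_required=True,
    auth_scope="email_send",
)

def validate_call(spec, args, idempotency_key=None):
    """Return a list of contract violations (empty list means the call is allowed)."""
    errors = []
    for name, typ in spec.args_schema.items():
        if name not in args:
            errors.append(f"missing arg: {name}")
        elif not isinstance(args[name], typ):
            errors.append(f"bad type for {name}")
    if spec.idempotency_required and not idempotency_key:
        errors.append("idempotency key required")
    return errors

ok = validate_call(SEND_EMAIL, {"to": "a@b.com", "subject": "hi", "body": "..."}, "key-1")
bad = validate_call(SEND_EMAIL, {"to": "a@b.com"})
```

The same `validate_call` can back a dry-run mode: run it over every step of a plan before any tool executes.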

5. Safety patterns for agents

Dry run / plan approval

Always present a plan summary for human approval on high-risk tasks.

Idempotency keys

Require idempotency for any effectful tool call.

Rate limits and quotas

Enforce per-agent and per-user quotas to stop runaway costs.

Policy engine

Pre-validate plans against guardrails (data exfiltration, PII, forbidden actions).

Sandboxing tools

Run dangerous tools in limited environments and require extra verification.
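A policy engine can start as a simple pre-execution classifier over the plan. A sketch, with illustrative rule sets standing in for real guardrails:

```python
# Pre-validate a plan before any step runs. Rules here are illustrative:
# a real policy engine would also inspect arguments for PII and data egress.

FORBIDDEN_TOOLS = {"deleteAccount", "exportAllData"}
HIGH_RISK_TOOLS = {"chargeCard", "sendEmail"}

def review_plan(plan):
    """Classify a plan as 'reject', 'needs_approval', or 'auto_approve'."""
    tools = {step["tool"] for step in plan}
    if tools & FORBIDDEN_TOOLS:
        return "reject"               # policy violation: never execute
    if tools & HIGH_RISK_TOOLS:
        return "needs_approval"       # route to a human before execution
    return "auto_approve"

decision = review_plan([{"tool": "searchFlights"}, {"tool": "chargeCard"}])
```

Because this runs before the executor, a rejected plan costs one cheap check instead of a chain of tool calls.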

6. State, Checkpointing, and Resume

Design state so agents can resume after failure without redoing external effects.

Patterns:

  • Checkpoint after each step: persist step index, intermediate artifacts, and partial outputs
  • Two-phase commit for multi-step transactions: prepare → commit (use sparingly and only when all tools support rollback)
  • Compensating actions: for non-rollbackable steps, register compensator tasks

```mermaid
flowchart TD
    A[Plan] --> B[Step 1]
    B --> C[Checkpoint 1]
    C --> D[Step 2]
    D --> E[Checkpoint 2]
    E --> F[Complete]
```
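A minimal sketch of checkpoint-and-resume, with an in-memory dict standing in for a durable store (DB, object storage):

```python
# Persist the index of the last completed step so a restart skips steps whose
# external effects already happened, instead of redoing them.

checkpoint_store = {}

def run_with_checkpoints(task_id, steps):
    start = checkpoint_store.get(task_id, 0)   # resume from last checkpoint
    executed = []
    for i in range(start, len(steps)):
        executed.append(steps[i]())            # external effect happens here
        checkpoint_store[task_id] = i + 1      # checkpoint AFTER the effect
    return executed

calls = []  # records real side effects, to show nothing runs twice
steps = [lambda: calls.append("reserve") or "reserve",
         lambda: calls.append("charge") or "charge"]

first = run_with_checkpoints("t1", steps)      # runs both steps
again = run_with_checkpoints("t1", steps)      # resumes past both: no re-execution
```

Note the ordering: the checkpoint is written after the effect, so a crash between the two re-runs the step; this is exactly where idempotency keys earn their keep.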

7. Observability and explainability for agents

Trace and expose:

  • Plan version and rationale for each step
  • Tool call arguments and responses (masked for PII)
  • Decision provenance (which prompt and context produced each plan)
  • Resource usage per step (tokens, time, cost)

```mermaid
flowchart LR
    Agent --> Trace
    Trace --> Dashboard
    Trace --> Alerting
```

Explainability reduces debugging time and supports audit requirements.

8. Testing agents: Unit, Integration, Chaos

Test tiers:

  • Unit: planner heuristics and prompt outputs using deterministic model settings
  • Integration: execution with stubbed tools
  • Contract tests: tool schemas and idempotency keys
  • Chaos tests: simulate tool failures, network partitions, and partial responses
  • Cost tests: estimate tokens and calls for representative workloads

Tip: Use deterministic model settings for repeatable tests (temperature 0 and fixed seeds).
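An integration-tier test might look like this sketch, with a deterministic fake model and recorded stub tools (`fake_model` and `stub_tool` are illustrative names, not a real framework):

```python
# The model and tools are replaced by deterministic stubs so planner behavior
# can be asserted exactly, run after run.

def fake_model(prompt):
    # Deterministic stand-in for an LLM call (the temperature-0 equivalent).
    return "search;summarize" if "flights" in prompt else "clarify"

def plan_from_model(goal):
    return fake_model(f"plan for: {goal}").split(";")

def stub_tool(name):
    calls = []
    def call(**kwargs):
        calls.append(kwargs)       # record arguments for contract assertions
        return {"tool": name, "ok": True}
    return call, calls

search, search_calls = stub_tool("search")
plan = plan_from_model("book flights")
results = [search(step=s) for s in plan]
```

The recorded `search_calls` list is what contract tests inspect: did the agent pass schema-valid arguments on every call?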

9. Cost and throughput controls

Agents can blow budgets quickly. Control knobs:

  • Max steps per plan (hard cap)
  • Cost budget per request (reject or degrade when exceeded)
  • Dynamic model selection per step (large model only for planning; smaller for templated text)
  • Caching of tool results and intermediate artifacts

```mermaid
flowchart LR
    A[Request] --> B[Budget Check]
    B -->|ok| C[Planner]
    B -->|exceed| D[Reject]
```
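The budget gate above can be sketched as follows; the per-call costs and the degrade-before-reject policy are illustrative assumptions:

```python
# Estimate cost before planning; degrade to smaller models when the budget is
# tight, and reject only when even the degraded plan would exceed it.

MODEL_COST = {"large": 0.03, "small": 0.002}   # assumed $ per call

def budget_check(planned_calls, budget_usd):
    estimate = sum(MODEL_COST[model] for model in planned_calls)
    if estimate <= budget_usd:
        return ("ok", estimate)
    # Degrade: swap every call to the small model before rejecting outright.
    degraded = sum(MODEL_COST["small"] for _ in planned_calls)
    if degraded <= budget_usd:
        return ("degraded", degraded)
    return ("reject", estimate)

decision, cost = budget_check(["large", "small", "small"], budget_usd=0.05)
```

Combined with the max-steps cap, this bounds worst-case spend per request before a single token is generated.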

10. Example: Task automation agent (Booking workflow)

Scenario: The agent must book travel: check flights, reserve a seat, charge the card, and send a confirmation.

```mermaid
flowchart TD
    UserGoal[User Goal] --> Planner
    Planner --> PlanStore
    PlanStore --> Executor
    Executor --> CheckFlights
    Executor --> ReserveSeat
    Executor --> ChargeCard
    Executor --> SendConfirmation
```

Safety enforcement:

  • Require plan approval before ChargeCard
  • Use idempotency for ReserveSeat and ChargeCard
  • Log decisions and attach trace id to payment call

11. Dealing with ambiguity and infinite loops

Agents can loop when goals are underspecified.

Defenses:

  • Clarification step: ask the user to confirm when plan confidence falls below a threshold
  • Step budget: cap the maximum number of steps and abort with an explainable reason
  • Divergence detection: monitor for repeated states and abort when the same state is seen X times
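Divergence detection can be a simple repeated-state counter. A sketch, assuming agent states are hashable:

```python
from collections import Counter

def run_until_done(next_state, start, max_repeats=3, max_steps=50):
    """Step a transition function until 'done', aborting on repeated states
    (divergence) or when the step budget runs out."""
    seen = Counter()
    state = start
    for _ in range(max_steps):
        seen[state] += 1
        if seen[state] > max_repeats:
            return ("aborted:divergence", state)   # same state seen too often
        if state == "done":
            return ("completed", state)
        state = next_state(state)
    return ("aborted:step-budget", state)

# A transition that ping-pongs between two states never reaches "done".
ping_pong = {"a": "b", "b": "a"}.get
outcome, final = run_until_done(ping_pong, "a")
```

In a real agent the "state" would be a hash of the plan step plus key context, so semantically identical retries collide in the counter.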

12. Metrics that matter for agents

  • Plan success rate
  • Average steps per task
  • Tool failure rate per tool
  • Cost per successful task
  • Human approval rate (if used)
  • Time to completion and first token latency

Capture these per agent version to measure regressions.

Discussion prompts for engineers

  • Where do you draw the line between autonomous action and human approval?
  • How do you design idempotency for tools that cannot be rolled back?
  • What's your preferred checkpoint frequency balancing cost and resume granularity?
  • How would you simulate a malicious planner that tries to exfiltrate data?

TL;DR — Agent Design Cheat Sheet

  • Separate planner from executor
  • Treat tools as strict contracted services with idempotency
  • Checkpoint after steps and make resume explicit
  • Enforce safety via dry-run, policy checks, and human approval for risky actions
  • Instrument every plan and step for observability and cost tracing
  • Test with deterministic settings, integrate chaos testing, and cap budgets

Takeaway

  • Autonomy is powerful — and dangerous — when unchecked
  • Production agents are safe only when they are transparent, auditable, and bounded
  • Build slow, observe fast, and never deploy a fully autonomous agent without cost controls, human-in-the-loop options, and a thorough testing and monitoring pipeline

For more on building production AI systems, check out our AI Bootcamp for Software Engineers.


Weekly Bytes of AI — Newsletter by Param

Technical deep-dives for engineers building production AI systems.

Architecture patterns, system design, cost optimization, and real-world case studies. No fluff, just engineering insights.
