Prompt Engineering: The Refinement Loop (How to Make an LLM Critique Itself)

Param Harrison
4 min read


We've built bots that are specific (explicit instructions), formatted (structured JSON), and logical (chain-of-thought). But what about quality?

This post is for you if you've ever asked an LLM for a creative idea (a slogan, an email, a blog title) and gotten a "first-draft" answer that's bland, uninspired, or just okay.

LLMs are like an eager intern: their first idea is rarely their best one. Today, we'll build a Creative Assistant and use the Self-Critique & Refinement technique to force the LLM to review and improve its own work.

The problem: the First-Draft Bot

Let's ask our bot for a creative task: writing a slogan.

Use Case: "Write a slogan for our new, eco-friendly water bottle."

graph TD
    A["User: 'Write a slogan for my new water bottle.'"] --> B(LLM)
    B --> C["Bot: 'Here's a slogan: 'Drink Water, Save the Planet.''"]
    
    style C fill:#fff8e1,stroke:#f57f17,color:#212121

Why this is bad:

  • It's bland.
  • It's generic.
  • It's not catchy.

The bot did the task, but the quality is low. It gave us its first, uninspired thought.
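
For reference, the baseline is just one prompt in, one answer out. Here's a minimal sketch, assuming the OpenAI Python SDK and an example model name; swap in whichever client you actually use. We'll reuse this call_llm helper in the sketches below.

# Minimal baseline: one prompt in, one first-draft answer out.
# Assumes the OpenAI Python SDK; the model name is just an example.
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    """Send a single user prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(call_llm("Write a slogan for our new, eco-friendly water bottle."))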

The solution: build a refinement loop (Self-Critique)

We will force the LLM to have a "second thought". Instead of asking for just the slogan, we will ask for a process:

  1. Draft: Generate the first idea.
  2. Critique: Analyze that idea against our criteria.
  3. Finalize: Create a new idea based on the critique.

This Self-Critique loop forces the LLM to be more deliberative.

The "How": We'll use Goal Decomposition (from our last post) to define this new, multi-step process.

prompt = """
I need a slogan for a new eco-friendly water bottle.

Please follow this exact 3-step process:

1.  **DRAFT:** First, generate one initial slogan.

2.  **CRITIQUE:** Second, critically evaluate your own DRAFT.
    - Is it short and catchy?
    - Does it clearly communicate "eco-friendly"?
    - What is its single biggest weakness?

3.  **FINALIZE:** Third, based on your critique, write a
    final, improved slogan.
"""

Now let's watch the agent "think."

graph TD
    A["User: Critique-Loop Slogan Prompt"] --> B(LLM)
    
    subgraph PROCESS["LLM's Internal Process"]
        direction TB
        B1["1. DRAFT: 'Drink Water, Save the Planet.'"]
        B1 --> B2["2. CRITIQUE: 'The draft is okay, but not catchy. It's too generic. Its weakness is that it's not memorable.'"]
        B2 --> B3["3. FINALIZE: 'Sip Sustainably. Drink Forever.'"]
    end
    
    B --> C["Bot: Provides DRAFT, CRITIQUE identifying weaknesses, and improved FINALIZE slogan"]
    
    style C fill:#e8f5e9,stroke:#388e3c,color:#212121

Observation: This is a night-and-day difference. The Final slogan is dramatically better than the Draft because we forced the LLM to find the flaws in its own first idea.

We've created a system that iterates on its own work.

Think About It: This technique is fantastic for creative tasks. What other "Critique" criteria could you add to the prompt to get even better slogans? (Hint: Think about "Target Audience", "Brand Voice", etc.)
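
If you want those extra criteria to be easy to swap (audience, voice, tone), you can also run the loop as three explicit calls instead of one long prompt. Here's a rough sketch, again reusing the call_llm helper from earlier; the third criterion is just an example of what you might add.

# The same refinement loop as three explicit calls, so each step is
# inspectable and the critique criteria are easy to swap.
# call_llm(prompt) is the provider wrapper sketched earlier in this post.
CRITERIA = [
    "Is it short and catchy?",
    "Does it clearly communicate 'eco-friendly'?",
    "Does it match the brand voice and target audience?",  # example extra criterion
]

task = "Write one slogan for a new eco-friendly water bottle."

draft = call_llm(task)

critique = call_llm(
    f"Critically evaluate this slogan: {draft}\n"
    "Answer each question, then name the single biggest weakness:\n"
    + "\n".join(f"- {c}" for c in CRITERIA)
)

final = call_llm(
    f"Original task: {task}\n"
    f"Draft slogan: {draft}\n"
    f"Critique: {critique}\n"
    "Based on the critique, write one final, improved slogan."
)

print(final)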

A simpler hybrid: Rephrase and Respond (RaR)

A full "Draft-Critique-Finalize" loop is powerful, but also very long (and thus, more expensive). A simpler, faster version is the Rephrase and Respond (RaR) technique.

You just ask the LLM to rephrase its understanding of the task first. This forces it to "commit" to a plan, which often improves the result.

The "How":

prompt = """
I need a short, exciting slogan for a new video game
about space pirates.

First, rephrase your understanding of this task.

Then, generate 3 slogans.
"""

graph TD
    A["User: RaR Slogan Prompt"] --> B(LLM)
    B --> C["Bot: Rephrases task understanding, then generates 3 high-energy space pirate slogans"]
    
    style C fill:#e8f5e9,stroke:#388e3c,color:#212121

Observation: The "Rephrasing" step forces the LLM to activate the correct concepts ("high-energy", "science fiction", "pirate theme") before it starts writing. This simple "warm-up" leads to much more targeted and creative results.
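
One way to package RaR is a tiny wrapper that prepends the rephrase instruction to any task. A minimal sketch, again reusing the call_llm helper from earlier:

# Wraps any task in a Rephrase-and-Respond prompt.
# call_llm(prompt) is the provider wrapper sketched earlier in this post.
def rephrase_and_respond(task: str) -> str:
    prompt = (
        "First, rephrase your understanding of this task in one sentence.\n"
        "Then, complete the task.\n\n"
        f"Task: {task}"
    )
    return call_llm(prompt)

print(rephrase_and_respond(
    "Write 3 short, exciting slogans for a new video game about space pirates."
))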

Challenge for you

  1. Use Case: You are building an "Email Writer" bot.

  2. The Problem: You ask, "Write an email to my boss asking for Friday off." The bot writes a very basic, bland email.

  3. Your Task: Design a Self-Critique prompt to fix this. What "Critique" criteria would you add to make the email more professional, persuasive, and polite? (A starter skeleton follows just after this list.)
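
To get you started, here's one possible skeleton; the critique questions are placeholders for you to fill in, not the answer.

# Starter skeleton only: the critique questions are placeholders
# for the criteria you come up with.
email_prompt = """
Write an email to my boss asking for Friday off.

Please follow this exact 3-step process:

1.  **DRAFT:** First, write an initial version of the email.

2.  **CRITIQUE:** Second, critically evaluate your own DRAFT.
    - <your criterion #1>
    - <your criterion #2>
    - What is its single biggest weakness?

3.  **FINALIZE:** Third, based on your critique, write a
    final, improved email.
"""
# Then run it, e.g. print(call_llm(email_prompt))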

Key takeaways

  • First drafts are rarely best: LLMs benefit from iterative refinement, just like human writers
  • Self-critique improves quality: Forcing the LLM to evaluate its own work leads to dramatically better outputs
  • Refinement loops create iteration: Breaking tasks into Draft-Critique-Finalize steps mimics professional creative processes
  • RaR is a lightweight alternative: Rephrase and Respond provides quick quality improvements without full critique loops

For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
