Prompt Engineering: The Refinement Loop (How to Make an LLM Critique Itself)
We've built bots that are specific (explicit instructions), formatted (structured JSON), and logical (chain-of-thought). But what about quality?
This post is for you if you've ever asked an LLM for a creative idea (a slogan, an email, a blog title) and gotten a "first-draft" answer that's bland, uninspired, or just okay.
An LLM is like an eager intern: its first idea is rarely its best one. Today, we'll build a Creative Assistant and use the Self-Critique & Refinement technique to make the LLM review and improve its own work.
The problem: the First Draft Bot
Let's ask our bot for a creative task: writing a slogan.
Use Case: "Write a slogan for our new, eco-friendly water bottle."
graph TD
A["User: 'Write a slogan for my new water bottle.'"] --> B(LLM)
B --> C["Bot: Here's a slogan: 'Drink Water, Save the Planet.'"]
style C fill:#fff8e1,stroke:#f57f17,color:#212121
Why this is bad:
- It's bland.
- It's generic.
- It's not catchy.
The bot did the task, but the quality is low. It gave us its first, uninspired thought.
The solution: build a refinement loop (Self-Critique)
We will force the LLM to have a "second thought". Instead of asking for just the slogan, we will ask for a process:
- Draft: Generate the first idea.
- Critique: Analyze that idea against our criteria.
- Finalize: Create a new idea based on the critique.
This Self-Critique loop forces the LLM to be more deliberative.
The "How": We'll use Goal Decomposition (from our last post) to define this new, multi-step process.
prompt = """
I need a slogan for a new eco-friendly water bottle.
Please follow this exact 3-step process:
1. **DRAFT:** First, generate one initial slogan.
2. **CRITIQUE:** Second, critically evaluate your own DRAFT.
   - Is it short and catchy?
   - Does it clearly communicate "eco-friendly"?
   - What is its single biggest weakness?
3. **FINALIZE:** Third, based on your critique, write a
   final, improved slogan.
"""
Now let's watch the agent "think."
graph TD
A["User: Critique-Loop Slogan Prompt"] --> B(LLM)
subgraph PROCESS["LLM's Internal Process"]
direction TB
B1["1. DRAFT: 'Drink Water, Save the Planet.'"]
B1 --> B2["2. CRITIQUE: 'The draft is okay, but not catchy. It's too generic. Its weakness is that it's not memorable.'"]
B2 --> B3["3. FINALIZE: 'Sip Sustainably. Drink Forever.'"]
end
B --> C["Bot: Provides DRAFT, CRITIQUE identifying weaknesses, and improved FINALIZE slogan"]
style C fill:#e8f5e9,stroke:#388e3c,color:#212121
Observation: The difference is clear. The FINALIZE slogan is punchier and more memorable than the DRAFT because the prompt made the LLM name the flaws in its own first idea (too generic, not memorable) before rewriting it.
We've created a system that iterates on its own work.
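In an application you usually want only the final slogan, not the whole three-part reply. Here is a minimal sketch of how that could look, assuming the model labels its sections the way the prompt asks (DRAFT, CRITIQUE, FINALIZE). The names `split_stages`, `refine`, and `fake_llm` are hypothetical helpers for illustration; `fake_llm` is a stand-in that returns the reply from the diagram above, so swap in your own model call.

```python
import re
from typing import Callable

# Step labels assumed to match the ones requested in the critique-loop prompt.
LABEL = re.compile(
    r"^[ \t]*(?:\d+\.\s*)?\*{0,2}(DRAFT|CRITIQUE|FINALIZE)\*{0,2}:\*{0,2}[ \t]*",
    re.MULTILINE,
)


def split_stages(reply: str) -> dict[str, str]:
    """Split a reply into sections keyed by DRAFT / CRITIQUE / FINALIZE."""
    matches = list(LABEL.finditer(reply))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(reply)
        sections[match.group(1)] = reply[match.end():end].strip()
    return sections


def refine(prompt: str, llm: Callable[[str], str]) -> str:
    """Send the critique-loop prompt and return only the final slogan."""
    sections = split_stages(llm(prompt))
    if "FINALIZE" not in sections:
        raise ValueError("Reply did not contain a FINALIZE section.")
    return sections["FINALIZE"]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (an assumption for this sketch)."""
    return (
        "1. **DRAFT:** Drink Water, Save the Planet.\n"
        "2. **CRITIQUE:** The draft is okay, but not catchy. It's too generic.\n"
        "3. **FINALIZE:** Sip Sustainably. Drink Forever."
    )


print(refine("Critique-loop slogan prompt goes here", fake_llm))
# prints: Sip Sustainably. Drink Forever.
```

Raising an error when the FINALIZE section is missing is a deliberate choice here: a model can ignore the requested format, and failing loudly is easier to debug than silently returning the draft.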
Think About It: This technique is fantastic for creative tasks. What other "Critique" criteria could you add to the prompt to get even better slogans? (Hint: Think about "Target Audience", "Brand Voice", etc.)
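One way to experiment with extra criteria is to generate the prompt from a list of critique questions instead of editing the string by hand. The sketch below rebuilds the water bottle prompt that way. `build_critique_prompt` is a hypothetical helper, and the audience and brand-voice questions are illustrative assumptions, not recommendations.

```python
def build_critique_prompt(task: str, criteria: list[str]) -> str:
    """Build a Draft-Critique-Finalize prompt from a task and critique questions."""
    if not criteria:
        raise ValueError("Provide at least one critique question.")
    # Each question becomes an indented bullet under the CRITIQUE step.
    bullets = "\n".join(f"   - {question}" for question in criteria)
    return (
        f"{task}\n"
        "Please follow this exact 3-step process:\n"
        "1. **DRAFT:** First, generate one initial slogan.\n"
        "2. **CRITIQUE:** Second, critically evaluate your own DRAFT.\n"
        f"{bullets}\n"
        "3. **FINALIZE:** Third, based on your critique, write a final, improved slogan.\n"
    )


prompt = build_critique_prompt(
    "I need a slogan for a new eco-friendly water bottle.",
    [
        "Is it short and catchy?",
        'Does it clearly communicate "eco-friendly"?',
        # The next two questions are made-up examples of extra criteria.
        "Does it speak to the target audience (for example, hikers and commuters)?",
        "Does it match a playful brand voice?",
        "What is its single biggest weakness?",
    ],
)
print(prompt)
```

Keeping the criteria in a list makes it easy to try one new question at a time and compare the final slogans.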
A simpler hybrid: Rephrase and Respond (RaR)
A full "Draft-Critique-Finalize" loop is powerful, but it also produces a much longer response, and more output tokens mean higher cost and more waiting. A simpler, faster version is the Rephrase and Respond (RaR) technique.
You just ask the LLM to rephrase its understanding of the task first. This forces it to "commit" to a plan, which often improves the result.
The "How":
prompt = """
I need a short, exciting slogan for a new video game
about space pirates.
First, rephrase your understanding of this task.
Then, generate 3 slogans.
"""
graph TD
A["User: RaR Slogan Prompt"] --> B(LLM)
B --> C["Bot: Rephrases task understanding, then generates 3 high-energy space pirate slogans"]
style C fill:#e8f5e9,stroke:#388e3c,color:#212121
Observation: The "Rephrasing" step pushes the LLM to spell out the relevant concepts ("high-energy", "science fiction", "pirate theme") before it starts writing. This simple "warm-up" often leads to more targeted and creative results.
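Because the RaR instruction is the same for every task, it can be wrapped in a small helper. This is a minimal sketch, assuming you want the exact wording from the prompt above; `build_rar_prompt` and its parameters are hypothetical names.

```python
def build_rar_prompt(task: str, request: str) -> str:
    """Wrap a task in a Rephrase and Respond (RaR) instruction."""
    return (
        f"{task.strip()}\n"
        # The rephrase step comes first so the model restates the task before answering.
        "First, rephrase your understanding of this task.\n"
        f"Then, {request.strip()}\n"
    )


prompt = build_rar_prompt(
    "I need a short, exciting slogan for a new video game about space pirates.",
    "generate 3 slogans.",
)
print(prompt)
```

The helper only adds one extra line to the prompt, which is what keeps RaR cheaper than the full critique loop.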
Challenge for you
- Use Case: You are building an "Email Writer" bot.
- The Problem: You ask, "Write an email to my boss asking for Friday off." The bot writes a very basic, bland email.
- Your Task: Design a Self-Critique prompt to fix this. What "Critique" criteria would you add to make the email more professional, persuasive, and polite?
Key takeaways
- First drafts are rarely best: LLMs benefit from iterative refinement, just like human writers
- Self-critique improves quality: Forcing the LLM to evaluate its own work leads to dramatically better outputs
- Refinement loops create iteration: Breaking tasks into Draft-Critique-Finalize steps mimics professional creative processes
- RaR is a lightweight alternative: Rephrase and Respond provides quick quality improvements without full critique loops