Self-critique and refinement prompting
We've built bots that are specific (explicit instructions), formatted (structured JSON), and logical (chain-of-thought). But what about quality?
This post is for you if you've ever asked an LLM for a creative idea (a slogan, an email, a blog title) and gotten a "first-draft" answer that's bland, uninspired, or just okay.
LLMs are like an eager intern: their first idea is rarely their best one. Today, we'll build a Creative Assistant and use the Self-Critique & Refinement technique to force the LLM to review and improve its own work.
What's the problem with the first draft bot?
Let's ask our bot for a creative task: writing a slogan.
Use Case: "Write a slogan for our new, eco-friendly water bottle."
graph TD
A["User: 'Write a slogan for my new water bottle.'"] --> B(LLM)
B --> C["Bot: 'Here's a slogan: 'Drink Water, Save the Planet.''"]
style C fill:#fff8e1,stroke:#f57f17,color:#212121
Why this is bad:
- It's bland.
- It's generic.
- It's not catchy.
The bot did the task, but the quality is low. It gave us its first, uninspired thought.
How do you build a refinement loop for self-critique?
We will force the LLM to have a "second thought". Instead of asking for just the slogan, we will ask for a process:
- Draft: Generate the first idea.
- Critique: Analyze that idea against our criteria.
- Finalize: Create a new idea based on the critique.
This Self-Critique loop forces the LLM to be more deliberative.
The "How": We'll use Goal Decomposition (from our last post) to define this new, multi-step process.
# filename: example.py
# description: Self-critique refinement prompt for slogan writing.
prompt = """
I need a slogan for a new eco-friendly water bottle.
Please follow this exact 3-step process:
1. DRAFT: First, generate one initial slogan.
2. CRITIQUE: Second, critically evaluate your own DRAFT.
- Is it short and catchy?
- Does it clearly communicate "eco-friendly"?
- What is its single biggest weakness?
3. FINALIZE: Third, based on your critique, write a
final, improved slogan.
"""
Now let's watch the agent "think."
graph TD
A["User: Critique-Loop Slogan Prompt"] --> B(LLM)
subgraph PROCESS["LLM's Internal Process"]
direction TB
B1["1. DRAFT: 'Drink Water, Save the Planet.'"]
B1 --> B2["2. CRITIQUE: 'The draft is okay, but not catchy. It's too generic. Its weakness is that it's not memorable.'"]
B2 --> B3["3. FINALIZE: 'Sip Sustainably. Drink Forever.'"]
end
B --> C["Bot: Provides DRAFT, CRITIQUE identifying weaknesses, and improved FINALIZE slogan"]
style C fill:#e8f5e9,stroke:#388e3c,color:#212121
Observation: This is a night-and-day difference. The Final slogan is dramatically better than the Draft because we forced the LLM to find the flaws in its own first idea.
We've created a system that iterates on its own work.
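You can also run the critique step more than once. Here's a minimal sketch of an automated refinement loop; `call_llm` is a placeholder for whatever client you use (OpenAI, Anthropic, a local model), not a real API:

```python
# A sketch of a repeated draft -> critique -> rewrite loop.
# `call_llm` is a placeholder: any function that takes a prompt
# string and returns the model's text response.
def refine(task, call_llm, rounds=1):
    """Draft once, then critique and rewrite `rounds` times."""
    draft = call_llm(f"Task: {task}\nWrite one initial attempt.")
    for _ in range(rounds):
        critique = call_llm(
            f"Task: {task}\nAttempt: {draft}\n"
            "Critique this attempt: is it short, catchy, and clear? "
            "Name its single biggest weakness."
        )
        draft = call_llm(
            f"Task: {task}\nAttempt: {draft}\nCritique: {critique}\n"
            "Write an improved attempt that fixes that weakness."
        )
    return draft
```

Each extra round adds cost, and quality gains usually flatten quickly, so one or two rounds is typically enough.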
Think About It: This technique is fantastic for creative tasks. What other "Critique" criteria could you add to the prompt to get even better slogans? (Hint: Think about "Target Audience", "Brand Voice", etc.)
What's a simpler hybrid: Rephrase and Respond (RaR)?
A full "Draft-Critique-Finalize" loop is powerful, but also very long (and thus, more expensive). A simpler, faster version is the Rephrase and Respond (RaR) technique.
You just ask the LLM to rephrase its understanding of the task first. This forces it to "commit" to a plan, which often improves the result.
The "How":
prompt = """
I need a short, exciting slogan for a new video game
about space pirates.
First, rephrase your understanding of this task.
Then, generate 3 slogans.
"""
graph TD
A["User: RaR Slogan Prompt"] --> B(LLM)
B --> C["Bot: Rephrases task understanding, then generates 3 high-energy space pirate slogans"]
style C fill:#e8f5e9,stroke:#388e3c,color:#212121
Observation: The "Rephrasing" step forces the LLM to activate the correct concepts ("high-energy", "science fiction", "pirate theme") before it starts writing. This simple "warm-up" leads to much more targeted and creative results.
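Since RaR is just a template wrapped around any task, it's easy to make reusable. A tiny sketch (the `rar_prompt` helper name is ours, not a standard API):

```python
def rar_prompt(task, n_answers=3):
    """Wrap any task in a Rephrase-and-Respond template."""
    return (
        f"{task}\n\n"
        "First, rephrase your understanding of this task in one sentence.\n"
        f"Then, provide {n_answers} answers."
    )

# Reuse the same template for any creative task.
prompt = rar_prompt(
    "I need a short, exciting slogan for a new video game about space pirates."
)
```

Because the template is task-agnostic, you can drop it in front of any creative request without rewriting the prompt each time.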
Challenge for you
- Use Case: You are building an "Email Writer" bot.
- The Problem: You ask, "Write an email to my boss asking for Friday off." The bot writes a very basic, bland email.
- Your Task: Design a Self-Critique prompt to fix this. What "Critique" criteria would you add to make the email more professional, persuasive, and polite?
Frequently asked questions
How do I make an LLM critique its own work?
Use a three-step Self-Critique loop: Draft (generate first idea), Critique (analyze against criteria), Finalize (generate improved version based on critique). This forces the LLM to find flaws in its thinking before committing to an answer. The key is making critique criteria explicit: target audience, brand voice, tone. It costs more than a single pass but produces dramatically better creative output. For a cheaper alternative, try Rephrase-and-Respond instead.
When should I use rephrase-and-respond instead of a full refinement loop?
Use RaR when cost or latency matters more than quality. It's typically cheaper than Draft-Critique-Finalize because the model generates only two sections (rephrase and answer) instead of three. RaR works by having the LLM rephrase its understanding of the task first, which activates the right concepts before answering. It's better for straightforward tasks; use the full loop for creative or high-stakes outputs where quality justifies the extra cost.
Why are LLM first drafts always bland and uninspired?
LLMs generate responses in a single forward pass without reflection, outputting their first coherent idea instead of reconsidering it. This works fine for factual questions but fails for creative work where iteration matters. Adding a critique step forces the model to evaluate its own output and generate alternatives. The second-pass answer is almost always better because the LLM now understands what makes a good solution.
For the full reference, see the Anthropic prompt engineering guide.
Key takeaways
- First drafts are rarely best: LLMs benefit from iterative refinement, just like human writers
- Self-critique improves quality: Forcing the LLM to evaluate its own work leads to dramatically better outputs
- Refinement loops create iteration: Breaking tasks into Draft-Critique-Finalize steps mimics professional creative processes
- RaR is a lightweight alternative: Rephrase and Respond provides quick quality improvements without full critique loops
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
What does the next step look like?
- Prompt Engineering Crash Course: Master self-critique, refinement, and advanced prompting techniques
Continue Reading
Ready to go deeper?
Go beyond articles. Build production AI systems with hands-on workshops and our intensive AI Bootcamp.