Engineer the behavior of your LLMs in production

Param Harrison
3 min read

LLMs aren’t just “APIs you hit”; they’re probabilistic interfaces you design.

This guide shows how to engineer model behavior reliably using:

  • Prompt contracts (not wishes)
  • Sampling controls (temperature, top_p)
  • Hallucination mitigation (RAG, verification, strict schemas)
  • Function calling (tools as the backbone of agents)

If you build production AI, treat the LLM like a probabilistic interface: define the interface, parameters, and contracts, then test them.

1. Prompting: Interface Design

Prompts are contracts, not vibes. Specify the role, constraints, examples, and output format to control the model’s behavior.

Bad:

Write a function that does stuff.

Better (contract):

You are a Python expert.
Write a typed function merge_sorted_lists(a, b) that merges two sorted lists in O(n+m) time.
Constraints:
- Return a new sorted list
- Do not mutate inputs
- Include a docstring and a minimal unit test using pytest
Output: Only Python code, no prose
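A conforming response to this contract might look like the sketch below. It follows the constraints stated in the prompt (new list, no mutation, docstring, pytest-style test); it is an illustration, not a captured model output:

```python
def merge_sorted_lists(a: list[int], b: list[int]) -> list[int]:
    """Merge two sorted lists into a new sorted list in O(n+m) time.

    Neither input list is mutated.
    """
    result: list[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # Append whichever tail remains; at most one of these is non-empty
    result.extend(a[i:])
    result.extend(b[j:])
    return result


def test_merge_sorted_lists():
    assert merge_sorted_lists([1, 3], [2, 4]) == [1, 2, 3, 4]
```

Because the contract pinned down types, complexity, and mutation behavior, you can diff any model output against a spec like this instead of eyeballing it.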

Treat every prompt like a function spec:

  • Role: What persona and domain context does the model adopt?
  • Constraints: Time/space complexity, guardrails, style rules
  • Examples: Positive and negative examples reduce ambiguity
  • Fenced outputs: Ask for JSON or code-only output to simplify parsing

Example with fenced JSON to ensure strict structure:

You are a senior QA engineer. Validate the response for factual accuracy.
Return ONLY valid JSON:
{
  "isAccurate": true,
  "issues": ["..."],
  "confidence": 0.82
}
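On the consuming side, a strict format lets you validate before you trust the output. A minimal sketch, assuming the field names from the example above (the validation rules are my choices, not a library API):

```python
import json


def parse_qa_verdict(raw: str) -> dict:
    """Parse and validate the model's JSON verdict; raise on any deviation."""
    verdict = json.loads(raw)
    if not isinstance(verdict.get("isAccurate"), bool):
        raise ValueError("isAccurate must be a boolean")
    if not isinstance(verdict.get("issues"), list):
        raise ValueError("issues must be a list")
    confidence = verdict.get("confidence")
    # Exclude bool explicitly: in Python, bool is a subclass of int
    if (
        not isinstance(confidence, (int, float))
        or isinstance(confidence, bool)
        or not 0.0 <= confidence <= 1.0
    ):
        raise ValueError("confidence must be a number in [0, 1]")
    return verdict
```

Rejecting malformed output loudly, rather than best-effort parsing it, is what turns the prompt into an enforceable contract.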

When you control the interface, you control the behavior.

2. Sampling: Control Randomness

Two core knobs influence variability and creativity:

  • temperature: controls randomness (0 → deterministic, 1 → creative)
  • top_p: nucleus sampling; keeps only the highest-probability tokens up to a cumulative probability of p

Recommended starting points:

  • Code generation: temperature=0.2, top_p=0.95
  • Safety/QA checks: temperature=0.0–0.2
  • Ideation/brainstorming: temperature=0.7–0.9

Quick heuristics:

  • High temperature = more creative, less stable
  • Low temperature = predictable, may get repetitive

In production, keep sampling settings explicit per task and test them like any other config.
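One way to keep settings explicit is a per-task config that is versioned and tested like any other code. A sketch using the starting points above (the task names are illustrative, not tied to a specific SDK):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingConfig:
    temperature: float
    top_p: float = 1.0


# Explicit per-task settings, reviewable in code instead of scattered call sites.
SAMPLING = {
    "codegen": SamplingConfig(temperature=0.2, top_p=0.95),
    "qa_check": SamplingConfig(temperature=0.0),
    "brainstorm": SamplingConfig(temperature=0.8),
}


def sampling_for(task: str) -> SamplingConfig:
    """Fail loudly on unknown tasks instead of silently using defaults."""
    return SAMPLING[task]
```

A lookup like this makes sampling a reviewable diff: changing creativity for one task is a one-line change with a clear blast radius.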

3. Hallucinations: Why they happen (and how to reduce them)

LLMs don’t “know” facts; they predict what text usually follows.

User: Who won the 2026 World Cup?

Model: Brazil defeated France 3–2.

It’s not lying; it’s pattern completion. This is why hallucinations are expected.

Mitigation patterns:

  • Retrieval (RAG) for facts: ground responses in authoritative sources
  • Verification loops: have the model check or re-derive answers
  • Strict output formats: force structured answers and validate them

Bonus: maintain an allowlist of domains and systematically reject/flag unsupported sources.
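The allowlist idea can be as simple as checking every cited source before accepting an answer. A minimal sketch, where the allowed domains are hypothetical placeholders:

```python
from urllib.parse import urlparse

# Hypothetical allowlist; in production this would come from config
ALLOWED_DOMAINS = {"docs.python.org", "en.wikipedia.org"}


def filter_sources(urls: list[str]) -> tuple[list[str], list[str]]:
    """Split cited URLs into accepted and flagged based on the allowlist."""
    accepted, flagged = [], []
    for url in urls:
        host = urlparse(url).hostname or ""
        # Accept exact matches and subdomains of allowed hosts
        if host in ALLOWED_DOMAINS or any(
            host.endswith("." + d) for d in ALLOWED_DOMAINS
        ):
            accepted.append(url)
        else:
            flagged.append(url)
    return accepted, flagged
```

Flagged sources can then be dropped from the context or surfaced to a reviewer, depending on how strict the pipeline needs to be.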

4. Function Calling: The modern, reliable pattern

Instead of hallucinating, the model can call your tools to get the information it needs.

{
  "name": "get_weather",
  "arguments": {"city": "Tokyo"}
}

Execution flow:

  1. You define a tool contract (name, arguments, schema)
  2. The model selects the tool call and arguments
  3. Your code executes the tool
  4. Results are returned to the model to produce the final answer
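The four steps above can be sketched as a dispatch loop. The `get_weather` implementation and the dispatch helper are hypothetical stand-ins for a real SDK integration:

```python
import json


def get_weather(city: str) -> dict:
    # Placeholder: a real implementation would call a weather API
    return {"city": city, "temp_c": 21}


# Step 1: the tool contract, mapping names to typed callables
TOOLS = {"get_weather": get_weather}


def execute_tool_call(call_json: str) -> str:
    """Run the tool the model selected and return its result as a string."""
    call = json.loads(call_json)
    tool = TOOLS[call["name"]]          # step 2: model chose name + arguments
    result = tool(**call["arguments"])  # step 3: your code executes the tool
    return json.dumps(result)           # step 4: fed back for the final answer
```

Keeping execution on your side of the boundary means the model only proposes calls; your code decides what actually runs.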

This is the backbone of agent architectures—models orchestrate tools instead of fabricating answers.

Design tips:

  • Keep tools small and composable; prefer clear, typed schemas
  • Validate arguments and handle timeouts/retries
  • Log tool I/O for observability and debugging

Key learnings

  • Prompting is programming: write specs, not wishes
  • Sampling is configuration: tune for task stability vs. creativity
  • Hallucinations are expected: reduce with RAG, verification, and schemas
  • Function calling makes agents reliable: tools > guesses

Ship with explicit contracts and measurable behavior. That’s how you engineer LLMs for real users.
