Choosing the Right LLM for Each Task: From Nano to MoE

Param Harrison

Welcome to the next post in our AI Engineering in Practice series!

In our last few posts, we built a complete, streaming RAG agent. We focused on the plumbing: the API, the streaming, the self-correction loops. And we picked one LLM (like gpt-4o-mini) and used it for everything.

This is like building a high-performance race car... and then using its 800-horsepower engine to also power the windshield wipers and the radio. It's powerful, but it's an incredible waste of energy, money, and time.

To level up from an enthusiast to a professional AI engineer, you must master the most important decision: choosing the right "engine" (LLM) for the right job.

Today, we'll explore the spectrum of models, from tiny "Nano" LLMs to massive "MoE" models, and learn how to build smarter, faster, and cheaper products.

The problem: the one-size-fits-all fallacy

Let's look at the agent we built. It has at least three different "thinking" steps:

  1. Routing: Deciding where to get information (Vector Store vs. Web Search).
  2. Grading: Deciding if the retrieved documents are relevant ("yes" or "no").
  3. Generating: Synthesizing the final, creative answer.

Using a single, powerful model for all three is a classic beginner's mistake.

  • It's Expensive: Why use a massive, $10-per-million-token model for a simple "yes/no" grading task? The quick cost sketch below shows how fast that adds up.
  • It's Slow: Large models have higher latency. A simple routing decision that should take 100 milliseconds can take 2-3 seconds, adding a painful delay before your app even starts working.
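How big is the gap? Here's a quick back-of-the-envelope calculation. The prices and traffic numbers below are illustrative assumptions, not anyone's real rate card:

```python
# Rough daily cost of a "yes/no" grading step at two price points.
# Prices are illustrative assumptions (USD per million tokens), not real rate cards.
FRONTIER_PRICE = 10.00  # big frontier model
NANO_PRICE = 0.10       # small nano model

tokens_per_call = 505    # ~500 prompt tokens in, ~5 tokens ("yes"/"no") out
calls_per_day = 100_000  # hypothetical traffic

def daily_cost(price_per_million: float) -> float:
    return price_per_million * tokens_per_call * calls_per_day / 1_000_000

print(f"Frontier model: ${daily_cost(FRONTIER_PRICE):,.2f}/day")  # $505.00/day
print(f"Nano model:     ${daily_cost(NANO_PRICE):,.2f}/day")      # $5.05/day
```

Same task, same one-word output, a 100x difference in cost.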

A senior engineer knows that Task 1 (Routing) and Task 3 (Generating) have completely different needs.

The spectrum of models: Nano, standard, and MoE

Not all LLMs are created equal. They exist on a spectrum of size, speed, cost, and "intelligence."

```mermaid
graph LR
    subgraph SPEED["Speed & Cost"]
        direction LR
        Nano["Nano LLMs <br/> (e.g., Phi-3 Mini, Gemma 2B) <br/> FAST & CHEAP"]
        Standard["Standard LLMs <br/> (e.g., Llama 3 8B, GPT-4o-mini) <br/> BALANCED"]
        MoE["Frontier / MoE LLMs <br/> (e.g., GPT-4o, Mixtral, Claude 3.5) <br/> POWERFUL & SLOW"]
    end

    subgraph IQ["Intelligence (IQ)"]
        direction LR
        Low[Low] --> Mid[Medium] --> High[High]
    end

    Nano --> Standard --> MoE
    Low --> Mid --> High

    style Nano fill:#e6ffed,stroke:#006d2c,stroke-width:2px
    style MoE fill:#eef,stroke:#303f9f,stroke-width:2px
    style Standard fill:#fff8e1,stroke:#f57f17,stroke-width:2px
```

1. Nano models (the specialists)

  • What they are: Tiny, fast models (often under 7 billion parameters) designed for specific, simple tasks.
  • Examples: Microsoft Phi-3 Mini, Google Gemma 2B.
  • Best for:
    • Classification: Is this email "Spam" or "Not Spam"?
    • Routing: Is this question "Internal" or "External"?
    • Grading: Is this document "Relevant" or "Not Relevant"?
    • Data Extraction: Pulling {"name": "...", "age": ...} from a block of text.
  • Strength: Extremely fast (low latency) and incredibly cheap (or free to run locally).
  • Weakness: Low "IQ." They are terrible at creative writing or complex, multi-step reasoning.
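What does "using a nano model for routing" actually look like? Here's a minimal sketch using the Hugging Face transformers pipeline with Phi-3 Mini. The prompt and the output labels are our own choices for this example, not a fixed API:

```python
# Routing with a nano model: classify a query as "internal" or "web_search".
# Assumes the `transformers` package and the microsoft/Phi-3-mini-4k-instruct
# checkpoint; this is a sketch, not a production router.
from transformers import pipeline

router = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")

def route(query: str) -> str:
    prompt = (
        "Reply with exactly one word: 'internal' (answerable from our docs) "
        "or 'web_search' (needs fresh data from the web).\n"
        f"Question: {query}\nAnswer:"
    )
    out = router(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
    answer = out[len(prompt):].strip().lower()  # the pipeline echoes the prompt back
    return "web_search" if "web" in answer else "internal"

print(route("How does Model-V compare to Model-Z?"))  # likely "web_search"
```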

2. Standard models (the workhorses)

  • What they are: Mid-size models that offer the best balance of speed, cost, and intelligence.
  • Examples: Meta Llama 3 8B, gpt-4o-mini.
  • Best for:
    • General-purpose chatbots.
    • Summarizing medium-sized articles.
    • Prototyping new features quickly.
  • Strength: The "good enough" default.
  • Weakness: Master of none. Not as smart as the big models, not as fast as the nano models.
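For contrast, here's a typical "workhorse" task: a one-shot summarization call with the OpenAI Python client. The prompt is our own; only the model name comes from this post:

```python
# Summarization with a balanced standard model.
# Assumes the official `openai` package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def summarize(article: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the article in three bullet points."},
            {"role": "user", "content": article},
        ],
    )
    return response.choices[0].message.content

print(summarize("...paste an article here..."))
```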

3. Frontier & MoE models (the powerhouses)

  • What they are: The largest, most powerful models available.
  • Examples: OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, Google Gemini 1.5 Pro.
  • Key Concept: Mixture of Experts (MoE):
    • You'll see openly MoE models like Mixtral in this category; several frontier models (reportedly including GPT-4o) are widely believed to use MoE architectures internally.
    • Analogy: Instead of one giant, 1-trillion parameter brain, an MoE model is like a team of 8 specialist brains (the "Experts").
    • When you ask a question, a tiny, fast "router" inside the model instantly picks the best 2-3 experts to handle it.
    • This makes MoE models dramatically faster and cheaper to run than a dense model with the same total parameter count. It's the architecture that makes "Frontier" performance affordable.
  • Best for:
    • Final Generation: Writing the beautiful, creative, nuanced bedtime story.
    • Complex Reasoning: Answering a multi-part question that requires synthesizing information.
  • Strength: Highest "IQ" on the market.
  • Weakness: Slowest and most expensive per-token.
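To make the "router picks a few experts" idea concrete, here's a toy top-k gating layer in PyTorch. This is a teaching sketch of the mechanism, not any real model's architecture:

```python
# Toy Mixture-of-Experts layer: a tiny gate scores all experts,
# but only the top-k experts actually run for each input.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)  # the fast internal "router"
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.gate(x)                           # (batch, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the best k
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for k in range(self.top_k):
                # Only top_k of n_experts ever compute: that's the cost savings.
                out[b] += weights[b, k] * self.experts[idx[b, k]](x[b])
        return out

moe = ToyMoE(dim=16)
print(moe(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```

Each input pays for two experts' worth of compute, while the model as a whole holds eight experts' worth of knowledge.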

The how: building an asymmetric agent

This is the "level up" for an agent engineer.

A "junior" agent uses one LLM for all steps.

A "senior" agent builds an asymmetric system, using different LLMs for different steps.

Let's redesign the agent from our last post.

```mermaid
graph TD
    A[User Query] --> B["Step 1: Route Query <br/> [Nano LLM: Phi-3 Mini]"]
    B -- "Internal Question" --> C[Retrieve from Vector Store]
    B -- "External Question" --> D[Search the Web]
    C --> E["Step 2: Grade Docs <br/> [Nano LLM: Phi-3 Mini]"]
    E -- "Good Docs" --> F["Step 3: Generate Answer <br/> [MoE LLM: GPT-4o]"]
    E -- "Bad Docs" --> D
    D --> F
    F --> G[Final Answer]

    style B fill:#e6ffed,stroke:#006d2c,stroke-width:2px
    style E fill:#e6ffed,stroke:#006d2c,stroke-width:2px
    style F fill:#eef,stroke:#303f9f,stroke-width:2px
```

Our New, Smarter System:

  1. Step 1 (Router): The user's query ("How does Model-V compare to Model-Z?") comes in. We send this to a Nano LLM (Phi-3 Mini). This is a simple classification task. The model's only job is to output the word "web_search". This is incredibly fast and cheap.

  2. Step 2 (Grader): After retrieval, the documents are sent to another Nano LLM. Its only job is to output "yes" or "no". Again, fast and cheap.

  3. Step 3 (Generate): The high-quality context is finally sent to a Powerhouse MoE LLM (GPT-4o). This model's job is to do what it does best: synthesize a complex, nuanced answer.
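Putting it together, here's a minimal sketch of the asymmetric wiring. The helpers (call_nano, call_frontier, retrieve, web_search) are hypothetical stand-ins for whichever model clients and tools you actually use:

```python
# Asymmetric agent: nano models for the plumbing, a frontier model for the answer.
# call_nano, call_frontier, retrieve, and web_search are hypothetical helpers.

def answer(query: str) -> str:
    # Step 1 (Router): a nano model emits a single routing label.
    route = call_nano(
        f"Reply 'vectorstore' or 'web_search' for this question:\n{query}"
    ).strip()

    if route == "vectorstore":
        docs = retrieve(query)
        # Step 2 (Grader): a nano model answers "yes"/"no" on relevance.
        grade = call_nano(
            f"Are these docs relevant to the question? Reply yes or no.\n"
            f"Question: {query}\nDocs: {docs}"
        )
        if grade.strip().lower() != "yes":
            docs = web_search(query)  # bad docs: fall back to the web
    else:
        docs = web_search(query)

    # Step 3 (Generate): the expensive model runs exactly once, on good context.
    return call_frontier(f"Answer using this context:\n{docs}\n\nQuestion: {query}")
```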

Observation: We've replaced two of our three "thinking" steps with tiny, specialized models, cutting most of the per-request "thinking" cost and drastically speeding up the app's responsiveness. We save the expensive, powerful model for the one step that actually needs it: the final answer.

A simple framework for choosing

So how do you choose? Here is a simple 2x2 matrix to guide your decision.

|  | Task is Simple / Structured (Classify, Extract JSON, Route) | Task is Complex / Creative (Write, Reason, Synthesize) |
| --- | --- | --- |
| Speed/Cost is CRITICAL (e.g., real-time agent steps) | Nano Models (Phi-3 Mini, Gemma 2B) | Standard Models (Llama 3 8B, GPT-4o-mini) |
| Quality is CRITICAL (e.g., final answer to user) | Standard Models (overkill, but reliable) | Frontier / MoE Models (GPT-4o, Claude 3.5) |
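If you like the matrix in code form, it collapses to a four-line selector. The model IDs are just examples pulled from the table above:

```python
# The 2x2 matrix as a function: pick a model tier for each task.
def choose_model(task_is_simple: bool, speed_is_critical: bool) -> str:
    if task_is_simple:
        # Nano when latency-bound; a standard model is overkill-but-reliable otherwise.
        return "phi-3-mini" if speed_is_critical else "gpt-4o-mini"
    # Complex / creative work: standard if latency-bound, frontier if quality-bound.
    return "gpt-4o-mini" if speed_is_critical else "gpt-4o"

print(choose_model(task_is_simple=True, speed_is_critical=True))    # phi-3-mini
print(choose_model(task_is_simple=False, speed_is_critical=False))  # gpt-4o
```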

Observation Question: Look at our Bedtime Story Generator.

  1. The StoryRequest model had fields like character_name and story_theme. If a user just typed "Make a story about Leo the lion who learns to be brave", what kind of model would be best for extracting the JSON {"character_name": "Leo", "story_theme": "bravery"}?

  2. What kind of model would be best for writing the final story?

Key takeaways

  • Stop the "one-size-fits-all" approach: Using one giant LLM for every step is slow, expensive, and inefficient
  • Use nano models for agentic plumbing: Nano LLMs (like Phi-3 Mini) are perfect for the internal steps of an agent: routing, grading, classifying, and extracting structured data. They are fast, cheap, and reliable for simple tasks
  • Use powerhouse models for the final performance: Use your expensive, powerful frontier models (like GPT-4o or Claude 3.5) for the one step the user actually sees: the final, complex, creative answer
  • MoE = power + efficiency: "Mixture of Experts" is an architecture that makes massive models affordable and fast by using small, specialized "experts" internally
  • Leveling up as an agent engineer: Build asymmetric systems, using the smallest, fastest, cheapest tool that works for each step in your agent's logical chain

For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
