LLM Basics: How Machines Think (and Don't)

Param Harrison
7 min read

The big idea: it's all about prediction

If you saw the sentence: The dog chased the ___ — what word comes next?

You probably thought ball, cat, or squirrel. You didn't know the answer for sure; you predicted it based on common patterns you've seen in language before.

That's exactly what an LLM does. It's a prediction machine. It predicts the most likely next word (or "token") based on all the text it has learned from.

But if it's just guessing, how does it seem so smart? The answer lies in how it guesses, and in the settings we can control.

How we talk to an LLM

In the coding world, you don't just "talk" to an LLM. You send it a structured request, often called an API call. Think of it as filling out a form for the LLM.

# 1. Import the necessary library
from openai import OpenAI

# 2. Initialize the connection client
#    (In a real app, the API key is loaded securely)
client = OpenAI(api_key="...") 

# 3. Create the chat completion request
response = client.chat.completions.create(
    model="gpt-4o-mini",  # Specify the model
    messages=[            # Define the message history
        {"role": "user", "content": "Explain what an LLM is in one sentence."}
    ],
    max_tokens=150        # Set a maximum token limit for the reply
)

# 4. Extract and print the text content of the reply
answer = response.choices[0].message.content
print(answer)

The magic isn't just in the prompt. It's in the settings you can add to that request. The most important one is "temperature."

Temperature: the creativity dial

"Temperature" is a setting that controls how creative or random the LLM's predictions are.

graph TD
    A[Temperature Dial] --> B{LLM's Behavior}
    B --> C[0.0: Predictable, Factual]
    B --> D[0.7: Balanced, General Chat]
    B --> E[1.5+: Creative, Unpredictable]

  • Low Temperature (e.g., 0.0): The LLM will always pick the most obvious, safest next word. It's boring, predictable, and great for facts or code.
  • High Temperature (e.g., 1.5): The LLM takes more risks, picking less common words. This makes it highly creative and imaginative, but also more likely to go off-topic.

Here's how we'd add that setting to our code:

# This request asks for a creative slogan
# by turning the temperature up.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Write a catchy slogan for a coffee shop."}
    ],
    temperature=1.2,  # <-- Set the "creativity dial" high
    max_tokens=20
)

If you ran this, you'd get a different slogan almost every time. If you set temperature=0.0, you'd probably get the same slogan every single time.
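To see the dial in action, here's a minimal sketch that reuses the client and model from the first example and runs the same prompt three times at a low and a high temperature. At 0.0 the three slogans should come back nearly identical; at 1.2 they should all differ.

# Compare a low and a high temperature on the same prompt.
# Assumes the `client` object created earlier in this post.
for temp in [0.0, 1.2]:
    print(f"--- temperature={temp} ---")
    for _ in range(3):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "user", "content": "Write a catchy slogan for a coffee shop."}
            ],
            temperature=temp,
            max_tokens=20,
        )
        print(response.choices[0].message.content)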

Tokens: the LLM's word pieces

LLMs don't actually see "words." They see tokens.

Think of tokens as "word pieces." For English, 1 token is about 0.75 words.
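To make that concrete: a 1,500-word English article works out to roughly 2,000 tokens (1,500 ÷ 0.75).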

graph TD
    A["Human Language: 'Hello world!'"] --> B[Tokenization]
    B --> C["'Hello' (1 token)"]
    B --> D["' world' (1 token)"]
    B --> E["'!' (1 token)"]

This matters for two big reasons: cost and limits.

  1. Cost: You pay for every token. Both the tokens you send in (your prompt) and the tokens you get back (the answer).

  2. Limits: Every LLM has a "context window," or a maximum number of tokens it can remember at one time.

Different strings "cost" different amounts of tokens. Long words and other languages are often "more expensive." A library like tiktoken lets you count tokens before you send them.

import tiktoken

# Get the encoder for a specific model
encoding = tiktoken.encoding_for_model("gpt-4o")

# 'encoding.encode' turns text into a list of token numbers
tokens_hello = encoding.encode("Hello")
tokens_long_word = encoding.encode("Antidisestablishmentarianism")
tokens_chinese = encoding.encode("人工智能") # "Artificial Intelligence"

print(f"'Hello': {len(tokens_hello)} tokens")
print(f"'Antidisestablishmentarianism': {len(tokens_long_word)} tokens")
print(f"'人工智能': {len(tokens_chinese)} tokens")

# Example output (exact counts depend on the model's tokenizer):
# 'Hello': 1 tokens
# 'Antidisestablishmentarianism': 5 tokens
# '人工智能': 6 tokens

This is why an LLM can feel "smarter" or "cheaper" in English: its tokenizer was built mostly from English text, so English compresses into fewer tokens, and the model itself saw far more English during training.

Context windows: the LLM's short-term memory

The context window is the LLM's entire short-term memory. It's the maximum number of tokens (your prompt + its answer) it can handle at once.

graph TD
    A["Your Input (e.g., 2,000 Tokens)"] --- B["LLM's Brain"]
    C["LLM's Reply (e.g., 1,000 Tokens)"] --- B
    
    subgraph TOTAL["Total Memory Used: 3,000 Tokens"]
        A
        C
    end

    D{"Context Window Limit (e.g., 4,000 Tokens)"}
    TOTAL -- "Must Be Less Than" --> D

If your conversation (all your prompts and all its answers) gets longer than this limit, the LLM starts to "forget" the beginning of the conversation.

This is the single biggest challenge in using LLMs. You can't just ask it to "summarize this 500-page book" by pasting the whole book, because the book is probably 200,000 tokens, but the LLM's memory (context window) might only be 8,000 or 128,000 tokens.
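A practical habit is to count tokens before you send a request and check that they fit. Here's a rough sketch using tiktoken; the 128,000-token limit and the fits_in_context helper are illustrative assumptions, so check your model's documented context window.

import tiktoken

# Example numbers only -- check your model's documented limits.
CONTEXT_WINDOW = 128_000
RESERVED_FOR_REPLY = 1_000  # leave room for the answer

encoding = tiktoken.encoding_for_model("gpt-4o")

def fits_in_context(prompt):
    # True if the prompt plus the reserved reply budget fits the window
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + RESERVED_FOR_REPLY <= CONTEXT_WINDOW

print(fits_in_context("Summarize this book: ..."))  # True for short text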

The cost equation

Using LLMs isn't free. The cost is calculated very simply:

Total Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)

A key insight: Output tokens (the LLM's answer) are almost always more expensive than input tokens (your prompt). It "costs" more for the LLM to think than to listen.

Here's the logic for a cost-calculating function:

import tiktoken

# Count tokens with the same tiktoken encoder we used earlier
def count_tokens(text, model="gpt-4o"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# A simple function to estimate the cost of one LLM call.
def estimate_cost(input_text, output_text):
    # 1. Define example prices (per 1 MILLION tokens)
    INPUT_PRICE_PER_1M_TOKENS = 0.15  # $0.15
    OUTPUT_PRICE_PER_1M_TOKENS = 0.60 # $0.60 (4x more expensive!)
    
    # 2. Count the tokens in the prompt and in the reply
    input_tokens = count_tokens(input_text)
    output_tokens = count_tokens(output_text)
    
    # 3. Calculate the cost for each part
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_1M_TOKENS
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_1M_TOKENS
    
    # 4. Add them up
    total_cost = input_cost + output_cost
    return total_cost
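Calling it looks like this; the reply string is made up for illustration, and the dollar figure depends entirely on the example prices above.

prompt = "Explain what an LLM is in one sentence."
reply = "An LLM is a model that predicts the next token based on patterns in its training data."

print(f"Estimated cost: ${estimate_cost(prompt, reply):.8f}")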

When LLMs fail (common glitches)

LLMs are amazing, but they are not perfect. They have very predictable failure modes.

1. Hallucinations (making stuff up)

An LLM's job is to predict the next word. It does not know what is true or false. A hallucination is when the LLM confidently generates a plausible-sounding but completely false statement.

If you ask it:

"Tell me about the 2023 Nobel Prize winner in Astrobotany."

It won't say, "That's not a real prize." It will invent a person, a university, and their "groundbreaking research" because those words statistically follow the pattern of your question.

2. Bad at math

LLMs are text-prediction machines, not calculators. They can reproduce simple math (like 2 + 2 = 4) because they've seen that text in their training data. But they can't reliably do math.

If you ask it:

"What is 234 * 567?"

It is very likely to give you the wrong answer. It's just predicting what a number looks like in that position, not actually performing the calculation.
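The usual fix is to keep arithmetic out of the model entirely and do it in ordinary code, letting the LLM handle the language around the numbers. A trivial sketch of the idea:

# Don't ask the LLM to multiply. Compute the number in code,
# then (if needed) hand the exact result to the LLM to explain.
a, b = 234, 567
result = a * b  # 132678, exact every time
print(f"{a} * {b} = {result}")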

Key takeaways

  • LLMs are predictors, not thinkers. They just guess the next most likely word.
  • Temperature is your "creativity dial." Low for facts, high for fiction.
  • Tokens are the "word pieces" you pay for. Everything has a cost.
  • Context Windows are the LLM's "short-term memory." This is their biggest limitation.
  • LLMs Hallucinate and are bad at math. Never trust them with facts or numbers without checking.

For more on building production AI systems, check out our AI Engineering Bootcamp.
