01 — Mechanics
How AI generates an answer
When you type a prompt and hit send, here's what actually happens, step by step, with no magic involved.
1
Tokenization
Your text gets broken into tokens. "How does photosynthesis work?" becomes roughly ["How", " does", " photos", "ynthesis", " work", "?"], which is six tokens.
2
Context assembly
The system prompt (instructions the AI received before you even typed anything), your conversation history, and your new message are combined into one big sequence of tokens. This entire sequence is what the model "reads."
3
Prediction loop
The model takes your input and predicts the most likely next token. Then it takes that token plus everything before it, and predicts the next one. Token by token, it builds a response.
4
Stopping
The model keeps generating until it predicts a "stop token", a signal that the response is complete, or until it hits a maximum length limit.
This is why AI responses stream in. You're literally watching the model predict one token at a time, building the sentence word by word in front of you rather than composing the whole answer silently and then dumping it out.
The critical insight: the model doesn't plan ahead. It doesn't outline its answer first, then write it. Each token is a local decision based on statistical patterns. This is why AI sometimes starts a sentence confidently and then contradicts itself halfway through, because it never "knew" where the sentence was going. Note: some newer reasoning models (like o1 and o3) use techniques that involve a form of planning before generating the final answer, so this "no planning" description is a general rule for standard LLMs, not a universal law.
02 — The 80/20
Why your prompt determines everything
Same model. Same capabilities. Different prompt = wildly different result.
Here's why: an LLM is a pattern-matching machine. Your prompt defines the pattern it's trying to match. Vague prompt → vague pattern → generic output. Specific prompt → specific pattern → targeted output.
Example: same model, different prompts.
✗ Prompt A
"Tell me about dogs"
→ Generic encyclopedia entry about dogs. Boring.
✓ Prompt B
"You're a veterinarian. A first-time dog owner just adopted a 3-month-old golden retriever. Write a checklist of the first 30 days: vaccinations, training basics, what to buy, red flags to watch for."
→ Detailed, structured, practical checklist. Useful.
The model didn't get smarter between prompt A and prompt B. You gave it a better pattern to match.
This is the single most important concept in working with AI: the quality of your output is 80% determined by the quality of your input. The model is the constant. Your prompt is the variable. Most people who say "AI is useless" are giving it Prompt A and expecting Prompt B results.
03 — The knobs
Temperature & settings — the knobs you can turn
When you use an AI API (or some advanced interfaces), you'll see settings like "temperature" and "top-p." These control how the model picks its next token.
Temperature controls randomness. Drag the knob below and watch the behavior shift:
Think of it like this: low temperature = a cautious accountant who always picks the safest answer. High temperature = an improvisational comedian who takes creative risks.
When to use what:
| Task | Temperature | Top-p |
| Factual Q&A | 0 – 0.3 | 0.1 – 0.3 |
| Code generation | 0 – 0.2 | 0.1 – 0.2 |
| Structured output (JSON, tables) | 0 | 0.1 |
| Creative writing | 0.7 – 1.0 | 0.8 – 1.0 |
| Brainstorming | 0.8 – 1.0 | 0.9 – 1.0 |
Most web interfaces (ChatGPT, Claude) don't expose these settings; they pick defaults for you. But when you use APIs or advanced tools, these knobs give you real control over output quality.
⚡ Try this now
Type into ChatGPT twice, in two separate chats:
Describe a cat in one sentence.
Compare the two results. Same, or different? Now add Temperature: 0. to one and try again. That tiny difference is randomness, on purpose.
04 — Memory
Context window — why AI "forgets"
Every model has a context window, the maximum number of tokens it can process at once. Think of it as the model's working memory.
What fits in the context window:
- System prompt (the hidden instructions)
- Your entire conversation history
- Your current message
- The model's response
What happens when it's full: different systems handle this differently.
- Truncation: The oldest messages get dropped. The model literally forgets the beginning of your conversation.
- Summarization: Some systems summarize older messages to save space. You lose detail.
- Sliding window: Only the last N messages are kept. Earlier context disappears.
Context window sizes (2026):
| Model | Context Window | Approximate |
| GPT-4o | 128K tokens | ~100,000 words |
| Claude Sonnet 4 | 200K tokens | ~150,000 words |
| Gemini | up to 2M tokens | ~1,500,000 words |
Sounds like a lot? It fills up faster than you think. A long conversation with code examples, file contents, and detailed responses can hit 128K tokens in under an hour of active use.
Practical tips:
- Start new conversations for new topics (don't carry dead context)
- Ask for concise responses to save tokens
- If the AI seems to "forget," the context window is usually the culprit
- For long projects, save important context in external files and re-inject when needed
⚡ Try this now
Start a fresh chat and type:
Remember these 5 things: (1) my name is Alex, (2) I live in Berlin, (3) I work at a startup, (4) I like coffee, (5) I hate meetings.
Then keep chatting for 20+ messages about anything. At the end, ask "What's my name?" If it can't answer, the context window filled up and the start got dropped.
05 — Hidden layer
System prompt — the secret instructions
Before you type a single word, the AI has already received instructions. These are called system prompts, a set of rules and context that shapes how the model behaves throughout the conversation.
What's in a system prompt:
- The model's persona ("You are a helpful assistant...")
- Behavioral rules ("Be concise. Don't use jargon.")
- Context about the user ("The user is a developer...")
- Formatting instructions ("Respond in markdown...")
- Safety guidelines ("Don't generate harmful content...")
You never see the system prompt in most interfaces. But it's there, influencing every response.
The system prompt explains why the same model can feel completely different in different products. ChatGPT is more "chatty" (its system prompt encourages verbosity). Claude is more "careful" (its system prompt emphasizes safety). A custom AI coding assistant gives better code (its system prompt is optimized for code).
How to use this knowledge: even if you can't edit the system prompt directly, you can simulate its effects.
- Set a persona in your first message: "You are a senior data analyst. Be direct and technical."
- Define constraints: "Respond in under 200 words. Use bullet points. No fluff."
- Provide context: "I'm a beginner learning Python. Explain things simply."
- Set the format: "Output as a JSON object with keys: summary, risks, next_steps."
This is the beginning of prompt engineering, and it's the most powerful skill you can develop for working with AI.
06 — Safety
What you should never paste into AI
AI is not your therapist. It's not your diary. It's a service run by a company, and what you type can be read, logged, or used to improve their models (depending on the settings you agreed to without reading).
Never paste these into an AI:
- Passwords or API keys. There's a real documented case of someone pasting an API key into ChatGPT, the key leaking, and the bill hitting roughly $10,000 overnight. Once it's in the prompt, treat it as exposed.
- Other people's personal data. Names, addresses, phone numbers, photos without consent. This isn't only an ethics question. It can break privacy laws (GDPR in Europe, CCPA in California, UU PDP in Indonesia).
- Confidential company documents. NDAs, contracts, business strategy. Your data may land in training data depending on settings, or get exposed in a breach.
- Sensitive health or financial information. Doctor diagnoses, bank account numbers, tax IDs. The AI doesn't need any of this to help you.
Rule of thumb: if you wouldn't want this information to become public, don't paste it into an AI.
What's safe:
- General questions ("explain how an API works")
- Code with credentials stripped out
- Homework and assignments
- Brainstorming ideas
- Editing and proofreading text
One more thing: if you use AI at work, check your company's policy first. Many companies now block AI tools entirely over data-leakage risk. Don't be the person who pastes a customer database into a chatbot.
07 — Watch out
What can go wrong at this level
Now that you see the mechanics, here are the four mistakes people make once they start paying attention to them.
1. Not knowing how to read the output
You read an AI response like an article, top to bottom. But the AI doesn't write the way a human does. It writes from probability, one token at a time. The first part may be correct, the middle may ramble, and the end may contradict the opening.
How to avoid: Read with skepticism. Highlight specific claims as you go. Ask yourself for each: "Is this a fact or an opinion?"
2. Not understanding why output changes every time
You ask the same question twice and get two different answers. This isn't a bug. It's the temperature knob doing its job. Temperature 0 is consistent. Temperature 1 is creative but unpredictable.
How to avoid: When you need a consistent answer, tell the AI "answer with certainty, don't speculate." Just remember: consistent is not the same as correct.
3. Hitting the context window limit without knowing
You have a long chat. Suddenly the AI forgets what you said at the start. That's the context window. The model can only "see" a fixed number of tokens at once, and once the chat overflows, the oldest messages fall off the edge.
How to avoid: When a chat gets long, start a new one, or summarize the key points at the top of a fresh message.
4. Picking the wrong model for the task
You use ChatGPT for coding and the results are weak. You use Claude for creative writing and it comes out stiff. Each model has different strengths baked in.
How to avoid: ChatGPT is strong for general tasks and plugins. Claude is strong for analysis and code. Gemini is strong for multimodal. Try the same task on two of them and feel the difference.
06 — What you now know
What you should know after Level 1
You now understand the mechanics. Tap each as it clicks:
You're no longer just using AI. You're starting to understand the mechanics behind it. The difference between someone who types a question and hopes for the best, and someone who deliberately engineers their interaction for better results.