Level 3 · Builder — Neural.Literacy

01 — Three doors

API vs Web Interface vs CLI

There are three ways to interact with AI. Most people only know one. Builders use all three.

🌐 Web Interface

ChatGPT.com, Claude.ai, gemini.google.com. Lowest friction. Best for quick questions, exploration, learning. No automation, no integration, and little control over settings. You're a user on someone else's platform.

🔌 API

Send requests directly to the model via code. Full control: model, temperature, system prompt, tools, output format. Can be automated, integrated, scaled. Pay per token. You're a builder using raw materials.
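To make "full control" concrete, here is a minimal sketch of the request body an API call carries. It assumes an OpenAI-style chat completions format, the same shape the curl example in section 06 uses. The helper name `build_request` and its default values are illustrative and not part of any SDK.

```python
import json


def build_request(user_text, system_prompt, model="gpt-4o", temperature=0.2):
    """Assemble an OpenAI-style chat completions payload (illustrative helper)."""
    return {
        "model": model,              # which model answers
        "temperature": temperature,  # lower = more predictable output
        "messages": [
            {"role": "system", "content": system_prompt},  # standing instructions
            {"role": "user", "content": user_text},        # the actual request
        ],
    }


payload = build_request(
    "Summarize this ticket in one sentence.",
    system_prompt="You are a concise support assistant.",
)
print(json.dumps(payload, indent=2))
```

None of these fields are reachable from a web interface; in code, each one is just a value you can change per request.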

⌨️ CLI

Terminal-based tools that wrap an API. Best for developers who live in the terminal. Often more powerful than the web UI because they support scripting and piping. Examples: Claude Code, Hermes Agent, custom scripts.

When to use what:

Situation	Use
Quick one-off question	Web UI
Learning, exploring	Web UI
Building an app	API
Automating workflows	API or CLI
Daily driving AI for work	CLI with tools
Complex multi-step projects	CLI with agent capabilities

The progression most builders follow: start with web UI → discover API → build tools → adopt CLI agents. Each step gives you more power and more responsibility.

02 — Under the hood

Inference — what happens when you hit "send"

Inference is the technical term for running the model: taking your input and generating an output. Understanding it explains why AI is sometimes fast and sometimes slow, sometimes smart and sometimes dumb.

When you send a prompt, a server somewhere:

1. Loads the model. Billions of parameters sit in GPU memory. On a busy service they are usually already loaded before your request arrives.

2. Processes input. Your input tokens pass through the model's layers.

3. Generates output. Output tokens are produced one at a time.

4. Returns the result. The response comes back to you.
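The generation step explains most of the latency you feel. The sketch below is a toy stand-in, not a real model: `NEXT_TOKEN` is a hypothetical lookup table playing the role of the network, and each loop iteration stands for one full pass through the model's layers.

```python
# Toy "model": maps the last token to the next one. A real model computes
# a probability distribution over its whole vocabulary instead.
NEXT_TOKEN = {
    "<start>": "The",
    "The": "cat",
    "cat": "sat",
    "sat": "<end>",
}


def generate(prompt_tokens, max_new_tokens=10):
    """Produce output tokens one at a time, feeding each back as input."""
    tokens = list(prompt_tokens)
    output = []
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "<end>")
        if next_token == "<end>":
            break
        tokens.append(next_token)  # the new token becomes part of the input
        output.append(next_token)
    return output


result = generate(["<start>"])
print(result)
```

Because every output token needs its own pass, a 500-token answer takes roughly 500 sequential steps. That is why long outputs feel slower than long inputs.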

Why inference speed varies:

Model size: Larger models (400B+ params) are slower than smaller ones (7B). More parameters = more computation per token.
Hardware: NVIDIA H100 GPUs are faster than A100s, which are faster than consumer GPUs.
Quantization: Compressed models run faster but with lower quality (more below).
Load: When millions use ChatGPT at once, everyone gets slower responses.

The dirty secret of cloud AI: you may not always be getting the same model quality. Providers can serve quantized versions during peak hours, route to a less powerful fallback, or otherwise change how a model is served, and they rarely disclose it. You have little visibility into this. Same API, same price, possibly different quality.

03 — The map

Provider landscape

Closed-source providers (API access only):

Provider	Models	Strengths	Weaknesses
OpenAI	GPT-4o, o1, o3	Largest ecosystem, strong general	Expensive at frontier, closed
Anthropic	Claude (Opus, Sonnet, Haiku)	Safety, long context, careful reasoning	More cautious, smaller ecosystem
Google	Gemini	Multimodal, huge context, Google integration	Inconsistent, sometimes generic

Open-source providers (you can self-host):

Provider	Notable Models	Notes
Meta	Llama 4	Best open-source foundation models
Mistral	Mistral, Mixtral	Strong European alternative
DeepSeek	DeepSeek V3	Chinese, competitive quality, very cheap
Qwen	Qwen 3	Alibaba, strong multilingual

Inference providers (host models for you):

Provider	What they do	Why use them
OpenRouter	Route to many models	One API, many models, price comparison
Together AI	Fast open-source inference	Cheap, fast, good selection
Fireworks AI	Fast inference	Speed-optimized
Groq	Ultra-fast inference (custom chip)	Fastest available, limited models

The economics: frontier model pricing (per 1M tokens, June 2026).

Claude Opus: ~$15 input / $75 output
GPT-4o: ~$2.50 input / $10 output
Claude Sonnet: ~$3 input / $15 output
DeepSeek V3: ~$0.27 input / $1.10 output
Llama 4 (via Together): ~$0.90 input / $0.90 output

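Output tokens usually cost more than input tokens: about 4 to 5 times more for most of the models above, with flat-priced hosted models such as the Llama 4 entry as the exception. Asking for concise output literally saves money.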
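To see how much concise output saves, here is a small cost calculator. It is a sketch that uses the approximate prices listed above; the dictionary keys are informal labels chosen for this example, not official model IDs.

```python
# Prices in USD per 1M tokens (input, output), copied from the list above.
PRICES = {
    "claude-opus": (15.00, 75.00),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
    "deepseek-v3": (0.27, 1.10),
    "llama-4-together": (0.90, 0.90),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the cost in USD of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


verbose = request_cost("gpt-4o", input_tokens=1_000, output_tokens=1_000)
concise = request_cost("gpt-4o", input_tokens=1_000, output_tokens=200)
print(f"verbose: ${verbose:.4f}  concise: ${concise:.4f}")
```

With the same 1,000-token prompt, cutting the answer from 1,000 to 200 tokens drops the cost from about $0.0125 to about $0.0045 per request. Multiply that by a few million requests and the difference becomes a budget line.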

⚡ Try this now

Open openrouter.ai and look at the leaderboard. What's the cheapest model right now? The most expensive? The most popular? Five minutes here and you'll understand the landscape better than most people who use AI daily.

04 — Compression

Quantization — compressed models

Quantization reduces the precision of a model's parameters to make it smaller, faster, and cheaper to run, at the cost of some quality.

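The analogy is a photograph. Original: 4000×3000 pixels, 12MB, full quality. Compressed: 1000×750 pixels, 2MB, smaller and faster to load but you lose fine detail. Quantization does something similar to model weights. The closest match is color depth rather than resolution: the model keeps the same number of weights, but each one is stored with fewer bits.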

Common quantization levels:

Level	Precision	Size Reduction	Quality Impact
FP16 (original)	16-bit	1x (baseline)	Full quality
FP8	8-bit	~2x smaller	Minimal loss
INT8	8-bit	~2x smaller	Small loss
INT4	4-bit	~4x smaller	Noticeable loss
INT2	2-bit	~8x smaller	Significant loss
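The table can be made concrete with a small sketch. It shows one simple scheme (symmetric 8-bit quantization with a single scale factor) on four made-up weights, plus a rough size estimate. Real quantizers typically work per block or per channel and use more elaborate methods; the function names here are illustrative.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats onto integers in [-127, 127]."""
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = (max(abs(w) for w in weights) / 127) or 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


def model_size_gb(num_params, bits_per_param):
    """Approximate weight storage only (ignores activations and caches)."""
    return num_params * bits_per_param / 8 / 1e9


weights = [0.82, -0.31, 0.05, -1.27]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))

print(quantized)
print(f"largest rounding error: {worst_error:.6f}")
print(model_size_gb(7e9, 16), "GB at 16-bit vs", model_size_gb(7e9, 4), "GB at 4-bit")
```

The rounding error per weight is tiny, but it is applied to billions of weights at once, and it grows as the bit width shrinks. That accumulated error is the quality loss in the table's last column.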

✓ When quantization is fine

Simple tasks (summarization, formatting, classification)
Bulk processing where speed > peak quality
Running on limited hardware (consumer GPUs, laptops)

✗ When to avoid it

Complex reasoning tasks
Nuanced creative work
Tasks where accuracy is critical

05 — The fork

Open-source vs closed-source

One of the most important decisions in AI: do you use a proprietary model via API, or an open-source model you can host yourself?

Closed-source

Pros: Best quality, zero maintenance, always up to date.

Cons: Data goes to third party, can't customize, vendor lock-in, costs can scale unpredictably.

Open-source

Pros: Full control, data stays private, can fine-tune, no vendor lock-in, cheaper at scale.

Cons: Need infrastructure, quality gap with frontier models, maintenance burden.

The trend: the gap is closing fast. DeepSeek V3 and Qwen 3 are competitive with GPT-4o on many tasks. In 2026, the choice isn't "open-source is worse." It's "open-source requires more work but gives you more control."

06 — Build it

Build your first AI tool

You don't need to be a senior developer to build something useful with AI. You need to understand the loop.

Input → Format prompt → Send to API → Get response → Use output

Every AI tool, from a simple chatbot to a complex agent, follows this loop. The complexity comes from what you add around it.
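Here is the loop as a minimal sketch. The network call is replaced by `fake_send`, a hypothetical stand-in, so the example runs without an API key; in a real tool that one function would make the HTTP request to your provider.

```python
def format_prompt(user_input):
    """Wrap raw input in instructions for the model."""
    return f"Rewrite the following text in plain English:\n\n{user_input}"


def fake_send(prompt):
    """Stand-in for the API call; a real tool would contact a provider here."""
    return {"text": f"[model reply to {len(prompt)} characters of prompt]"}


def run_tool(user_input, send=fake_send):
    prompt = format_prompt(user_input)  # input -> format prompt
    response = send(prompt)             # send to API -> get response
    return response["text"].strip()     # use output


print(run_tool("Heretofore the party of the first part shall..."))
```

Passing `send` as a parameter is a deliberate choice: you can test the tool offline and swap providers without touching the rest of the code.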

1. Simple wrapper. Takes input, sends it to the API with a system prompt, displays the response. Example: a customer support chatbot.

2. With context. Add: search your database for relevant info and combine input + context in the prompt. Example: an AI that answers questions about your docs.

3. With tools. Add: the model requests tool calls (search, calculate), your system executes them and feeds the results back. Example: an AI assistant that can actually DO things.

4. With memory. Add: save important facts to persistent storage and load relevant memory each session. Example: a personal AI that knows your preferences and history.
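The "with tools" level is the one that tends to look like magic, so here is a sketch of its control flow. Everything in it is simplified for illustration: `fake_model` stands in for the API, and the reply format (`tool`, `arguments`, `answer`) is a hypothetical one, since each real provider defines its own tool-call schema.

```python
import json

# Tools the application is willing to run on the model's behalf.
TOOLS = {
    "add": lambda a, b: a + b,
    "word_count": lambda text: len(text.split()),
}


def fake_model(messages):
    """Stand-in for an API call: asks for a tool once, then answers."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool": "add", "arguments": {"a": 2, "b": 3}}
    return {"answer": f"The result is {last['content']}."}


def run_agent(user_text, model=fake_model, max_steps=5):
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_steps):
        reply = model(messages)
        if "answer" in reply:
            return reply["answer"]
        # The model requested a tool: execute it and feed the result back.
        result = TOOLS[reply["tool"]](**reply["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("model did not produce an answer within max_steps")


print(run_agent("What is 2 + 3?"))
```

Note that the model never runs anything itself. It only asks; your code decides which tools exist and executes them, and `max_steps` keeps a confused model from looping forever.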

You don't need to build the next ChatGPT. You need to build the tool that makes YOUR specific workflow 10x better.

⚡ Try this now

If you have an OpenAI API key, export it as OPENAI_API_KEY and run this in your terminal. OpenRouter accepts the same request shape at its own URL with its own model names; Anthropic uses a different endpoint and headers, so check its docs first.
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
You just called an AI model from the command line. That curl is every API tool you'll ever build, stripped to its core.

07 — Watch out

What can go wrong at this level

Once you start building, the mistakes shift from "wrong answer" to "broken system." Here are the four that bite builders most often.

1. Picking a provider on hype

You hear "Claude is better than GPT" and switch immediately. It's not that simple. Every provider trades something: speed against quality, price against context window, privacy against features.

How to avoid: Test it yourself. Send the same task to three providers and compare. What's best for someone else may be wrong for your workload.

2. Not tracking costs

You route everything through GPT-4o, including summarization tasks a tiny model could handle. The bill runs 10x higher than it needs to. Worse, you don't even know how many tokens each request burns.

How to avoid: Monitor usage. Use cheap models for simple work (translate, summarize, format) and reserve expensive models for hard reasoning (code review, analysis).
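One way to act on this advice is sketched below: route by task type and keep a running tally of tokens per model. The model labels and the task categories are placeholders made up for this example; substitute the model IDs and token counts your provider actually reports.

```python
from collections import defaultdict

# Hypothetical tier labels; replace with real model IDs from your provider.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "frontier-model"
SIMPLE_TASKS = {"translate", "summarize", "format"}


def pick_model(task_type):
    """Send simple work to the cheap tier, everything else to the strong one."""
    return CHEAP_MODEL if task_type in SIMPLE_TASKS else STRONG_MODEL


usage = defaultdict(lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0})


def record_usage(model, input_tokens, output_tokens):
    """Accumulate per-model totals so the bill is never a surprise."""
    entry = usage[model]
    entry["requests"] += 1
    entry["input_tokens"] += input_tokens
    entry["output_tokens"] += output_tokens


record_usage(pick_model("summarize"), input_tokens=1200, output_tokens=150)
record_usage(pick_model("code_review"), input_tokens=4000, output_tokens=900)
for model, totals in usage.items():
    print(model, totals)
```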

3. Not handling errors

You build an app that calls an AI API. You don't handle rate limits, timeouts, malformed responses, or network failures. The first hiccup and your app crashes for the user.

How to avoid: Always handle the failure paths: try/catch, retry with exponential backoff, timeouts, and a fallback response. AI APIs are unreliable by nature. Plan for it.
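A minimal sketch of that advice follows. `TransientError` is a hypothetical exception used here to mark retryable failures (rate limits, timeouts, server errors); in real code you would catch whatever exceptions your HTTP client or SDK raises. The `sleep` parameter exists only so the demo can run without waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limit, timeout, 5xx)."""


def call_with_retry(
    call,
    max_attempts=4,
    base_delay=0.5,
    fallback="Sorry, please try again later.",
    sleep=time.sleep,
):
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                break  # out of attempts: fall through to the fallback
            # Exponential backoff with jitter: 0.5s, 1s, 2s, ... plus noise.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return fallback


attempts = {"count": 0}


def flaky_call():
    """Fails twice, then succeeds, to simulate a rate-limited API."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("rate limited")
    return "ok"


print(call_with_retry(flaky_call, sleep=lambda seconds: None))
```

The fallback matters as much as the retries: when every attempt fails, the user sees a polite message instead of a stack trace.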

4. Hardcoding API keys

You paste an API key straight into code and push it to GitHub. Within seconds, automated bots scan it. Within minutes, someone is burning your key to generate content. Bills can climb into the thousands before you notice.

How to avoid: Always load keys from environment variables. Add .env to your .gitignore. If a key ever leaks, revoke it immediately from the provider dashboard.
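In code, that looks roughly like the sketch below. The variable name `OPENAI_API_KEY` is a common convention rather than a requirement, and `DEMO_API_KEY` with its value is a fake placeholder so the example runs anywhere.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Read the key from the environment; fail loudly if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running")
    return key


def mask(key):
    """Show only the last four characters when logging."""
    return "*" * max(len(key) - 4, 0) + key[-4:]


os.environ.setdefault("DEMO_API_KEY", "sk-demo-1234")  # demo value, not a real key
print(mask(load_api_key("DEMO_API_KEY")))
```

Python does not read a .env file on its own. Either export the variables in your shell or use a loader such as python-dotenv before calling `load_api_key`.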

08 — What you now know

What you should know after Level 3

You now understand the builder's perspective. Tap each as it clicks:

Three ways to interact with AI (web, API, CLI) and when to use each
How inference works and why quality varies
The provider landscape and economics
What quantization is and why it affects you
Open-source vs closed-source tradeoffs
The fundamental AI tool loop

You have the knowledge to start building. The tools are accessible. The APIs are well-documented. The models are capable. The only thing left is to actually build something.

From user to builder.
