Week 03 — LLM Engineering
Goal: Stop treating LLMs as magic boxes. Learn to talk to them precisely, structure their outputs, control costs, and trace every call — so you ship reliable AI features.
Week 3 bridges the gap between "I can call the API" and "I'm building a production AI system". Every skill here gets used again in RAG (Week 4), Agents (Week 5), and Fine-tuning (Week 8).
The Week 3 Stack
```
            User Query
                │
                ▼
Prompt Engineering ──→ Context Engineering ──→ Prompt Cache (save up to 90%)
                │
LLM (Claude / GPT / Gemini / Ollama)
                │
    ┌───────────┼────────────┐
    ▼           ▼            ▼
Structured  Multimodal   Embeddings
  Output    (img/PDF)    + Similarity
(Instructor)               Search
                │
                └──→ LangSmith traces + LiteLLM cost tracking
```
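The 90% figure attached to the Prompt Cache step is a ceiling rather than a typical bill reduction. A minimal sketch of the arithmetic, assuming cached input tokens are billed at 10% of the normal input rate and ignoring output tokens and any cache-write surcharge. The price, the multiplier, and the function names are illustrative assumptions, not any provider's published rates.

```python
def input_cost(total_tokens: int, cached_tokens: int,
               price_per_mtok: float = 3.00,
               cached_multiplier: float = 0.10) -> float:
    """Input-side cost in dollars for one request.

    Assumptions (hypothetical, for illustration only): price_per_mtok is the
    price per million uncached input tokens, and cached tokens are billed at
    cached_multiplier times that price.
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached = total_tokens - cached_tokens
    billable = uncached + cached_tokens * cached_multiplier
    return billable * price_per_mtok / 1_000_000


def savings_fraction(total_tokens: int, cached_tokens: int) -> float:
    """Fraction of input cost saved versus sending every token uncached."""
    baseline = input_cost(total_tokens, 0)
    return 1 - input_cost(total_tokens, cached_tokens) / baseline


# A 10,000-token prompt whose 8,000-token system prompt is a cache hit.
print(f"{savings_fraction(10_000, 8_000):.0%}")   # 72%
# Only a fully cached prompt reaches the ceiling.
print(f"{savings_fraction(10_000, 10_000):.0%}")  # 90%
```

Under these assumptions the saving scales with the share of the prompt that is actually served from cache, which is why long, stable prefixes (system prompts, few-shot examples, reference documents) benefit most.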
Topics This Week
| # | Topic | Core Skill |
|---|---|---|
| 01 | Prompt Engineering | Zero-shot, few-shot, CoT, ToT, Self-Consistency |
| 02 | Context Engineering | AGENTS.md, CLAUDE.md, system prompt design |
| 03 | LLM CLI Tools | llm CLI, aichat, shell pipelines |
| 04 | AI Coding Assistants | Claude Code, Cursor, Copilot, Gemini CLI |
| 05 | Pydantic & Structured Output | Instructor, JSON mode, auto-retries |
| 06 | Multimodal Inputs | Images, PDFs, audio via API |
| 07 | Vector Embeddings | BGE-M3, cosine similarity, dimensionality |
| 08 | Similarity Search | HNSW, IVF, FAISS, hnswlib |
| 09 | Prompt Caching | Anthropic/OpenAI cache, up to 90% cost savings |
| 10 | LangSmith & LiteLLM | Tracing, cost analytics, AI gateway |
| 11 | LLM Architecture Survey | Transformer, MoE, CLIP, SoTA 2025 models |
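Topics 07 and 08 both rest on one formula, cosine similarity: the dot product of two vectors divided by the product of their lengths. A minimal pure-Python sketch with made-up 3-dimensional vectors. Real embeddings have far more dimensions, and libraries such as FAISS or hnswlib replace the brute-force loop below with an index. The corpus labels, vector values, and function names are illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]],
          k: int = 2) -> list[tuple[str, float]]:
    """Brute-force search: score every vector, return the k best matches."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" (hypothetical values, not produced by any model).
corpus = {
    "prompt caching": [0.9, 0.1, 0.0],
    "vector search": [0.1, 0.9, 0.2],
    "audio input": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]

for name, score in top_k(query, corpus):
    print(f"{name}: {score:.3f}")
```

Brute force is exact but scans every vector per query. Approximate indexes such as HNSW and IVF trade a little recall for much faster lookups on large collections.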
Labs This Week
| Lab | Title | Time |
|---|---|---|
| Lab 3.1 | YouTube → Subtitles → Topics → Timestamps JSON pipeline | ~3 hrs |
| Lab 3.2 | Cost-Tracking Dashboard via LangSmith + prompt benchmarks | ~2 hrs |
Project 1 — Due End of This Week
Project 1 covers all of Weeks 1–3: development tooling, deployment, and LLM engineering. The full specification will appear in the Projects section once it is released.
Mental Model: Four Layers
```
┌──────────────────────────────────────┐
│ Layer 4: OBSERVABILITY               │  LangSmith traces · LiteLLM spend
├──────────────────────────────────────┤
│ Layer 3: OUTPUT SHAPING              │  Instructor · Pydantic · Multimodal
├──────────────────────────────────────┤
│ Layer 2: PROMPT & CONTEXT            │  Engineering · Caching · CLI tools
├──────────────────────────────────────┤
│ Layer 1: MODEL & EMBEDDINGS          │  LLMs · BGE-M3 · FAISS · Architecture
└──────────────────────────────────────┘
```
Build bottom-up. Get reliable outputs first, then optimize cost, then observe.
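"Get reliable outputs first" usually comes down to validating every model response against a schema and retrying with feedback when validation fails, which is the loop Instructor automates on top of Pydantic. A stdlib-only sketch of that pattern follows. `fake_llm` is a hypothetical stand-in for a real API call, and `Topic`, `parse_topic`, and `extract_with_retries` are illustrative names, not Instructor's API.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Topic:
    title: str
    start_seconds: int


def parse_topic(raw: str) -> Topic:
    """Parse and validate one JSON object; raise ValueError on any problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON ({exc.msg})") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    title = data.get("title")
    start = data.get("start_seconds")
    if not isinstance(title, str) or not title:
        raise ValueError("title must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(start, int) or isinstance(start, bool) or start < 0:
        raise ValueError("start_seconds must be a non-negative integer")
    return Topic(title=title, start_seconds=start)


def extract_with_retries(call_model: Callable[[str], str], prompt: str,
                         max_attempts: int = 3) -> Topic:
    """Call the model, validate the reply, and retry with the error appended."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        try:
            return parse_topic(raw)
        except ValueError as exc:
            feedback = f"\nYour previous reply was rejected: {exc}. Reply with JSON only."
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def fake_llm(prompt: str) -> str:
    """Hypothetical model stub: answers in prose first, in JSON once corrected."""
    if "rejected" in prompt:
        return '{"title": "Prompt caching", "start_seconds": 95}'
    return "Sure! The topic is prompt caching at 1:35."


print(extract_with_retries(fake_llm, "Extract the topic as JSON."))
```

Feeding the validation error back into the next prompt gives the model a concrete correction to act on, and the attempt cap keeps a persistently malformed response from looping (and billing) forever.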