Week 03 — LLM Engineering

Goal: Stop treating LLMs as magic boxes. Learn to talk to them precisely, structure their outputs, control costs, and trace every call — so you ship reliable AI features.

Week 3 bridges the gap between "I can call the API" and "I'm building a production AI system". Every skill here gets used again in RAG (Week 4), Agents (Week 5), and Fine-tuning (Week 8).

The Week 3 Stack

User Query
    │
    ▼
Prompt Engineering ──→ Context Engineering ──→ Prompt Cache (save up to 90%)
                                                      │
                              LLM (Claude / GPT / Gemini / Ollama)
                                │
                    ┌───────────┼────────────┐
                    ▼           ▼            ▼
              Structured   Multimodal    Embeddings
               Output       (img/PDF)   + Similarity
              (Instructor)               Search
                    │
                    └──→ LangSmith traces + LiteLLM cost tracking
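
The caching figure in the diagram is an upper bound, and a little arithmetic shows why. The sketch below assumes that cached input tokens are billed at a fraction of the normal input price; the 0.1 multiplier and the price of 3.0 dollars per million tokens are illustrative placeholders rather than quoted rates, and the surcharge some providers apply when writing to the cache is ignored. The function names are hypothetical.

```python
def input_cost(prompt_tokens, cached_tokens, price_per_mtok, cached_read_multiplier=0.1):
    """Estimate the input cost in dollars when part of the prompt is read from cache.

    cached_read_multiplier is an assumed discount factor, not a quoted rate.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    billable = uncached_tokens + cached_tokens * cached_read_multiplier
    return billable * price_per_mtok / 1_000_000


def savings_fraction(prompt_tokens, cached_tokens, cached_read_multiplier=0.1):
    """Fraction of input cost saved compared with sending the same prompt uncached."""
    if prompt_tokens <= 0:
        raise ValueError("prompt_tokens must be positive")
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    billable = (prompt_tokens - cached_tokens) + cached_tokens * cached_read_multiplier
    return 1 - billable / prompt_tokens


if __name__ == "__main__":
    # 10,000-token prompt where 8,000 tokens (system prompt + documents) hit the cache.
    print(round(input_cost(10_000, 8_000, price_per_mtok=3.0), 6))
    print(round(savings_fraction(10_000, 8_000), 2))
```

Under these assumptions the saving approaches 90% only when nearly the whole prompt is a cache hit, so a long, stable prefix (system prompt, reference documents) matters more than the discount itself.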

Topics This Week

#	Topic	Core Skill
01	Prompt Engineering	Zero-shot, few-shot, CoT, ToT, Self-Consistency
02	Context Engineering	AGENTS.md, CLAUDE.md, system prompt design
03	LLM CLI Tools	`llm` CLI, `aichat`, shell pipelines
04	AI Coding Assistants	Claude Code, Cursor, Copilot, Gemini CLI
05	Pydantic & Structured Output	Instructor, JSON mode, auto-retries
06	Multimodal Inputs	Images, PDFs, audio via API
07	Vector Embeddings	BGE-M3, cosine similarity, dimensionality
08	Similarity Search	HNSW, IVF, FAISS, hnswlib
09	Prompt Caching	Anthropic/OpenAI cache, up to 90% cost savings
10	LangSmith & LiteLLM	Tracing, cost analytics, AI gateway
11	LLM Architecture Survey	Transformer, MoE, CLIP, SoTA 2025 models
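
As a preview of topics 07 and 08, here is a minimal sketch of cosine similarity and an exact (brute-force) nearest-neighbour search in plain Python. The vectors are toy two-dimensional values, not real BGE-M3 embeddings, and the names `cosine_similarity` and `top_k` are illustrative. Index structures such as HNSW and IVF exist to approximate this exhaustive scan when the corpus is too large to compare against every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=3):
    """Return the k (score, name) pairs most similar to the query, best first.

    corpus maps a document name to its embedding vector. Every vector is
    scored, so this is exact search with cost linear in the corpus size.
    """
    scored = [(cosine_similarity(query, vec), name) for name, vec in corpus.items()]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    toy_corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for score, name in top_k([1.0, 0.1], toy_corpus, k=2):
        print(name, round(score, 3))
```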

Labs This Week

Lab	Title	Time
Lab 3.1	YouTube → Subtitles → Topics → Timestamps JSON pipeline	~3 hrs
Lab 3.2	Cost-Tracking Dashboard via LangSmith + prompt benchmarks	~2 hrs
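
To make the end of the Lab 3.1 pipeline concrete, the sketch below validates a JSON list of topic segments before anything downstream uses it. The lab specification is not reproduced here, so the field names (`topic`, `start`, `end`, in seconds) and the `TopicSegment` and `parse_segments` names are assumptions chosen for illustration. It uses only the standard library; in the lab itself, Pydantic with Instructor would play the same role.

```python
import json
from dataclasses import dataclass


@dataclass
class TopicSegment:
    # Hypothetical schema: one topic with its start and end time in seconds.
    topic: str
    start: float
    end: float


def parse_segments(raw_json):
    """Parse and validate model output that should be a JSON list of segments."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of segments")
    segments = []
    for item in data:
        seg = TopicSegment(
            topic=str(item["topic"]),
            start=float(item["start"]),
            end=float(item["end"]),
        )
        if not seg.topic.strip():
            raise ValueError("topic must not be empty")
        if seg.start < 0 or seg.end <= seg.start:
            raise ValueError(f"invalid time range for topic {seg.topic!r}")
        segments.append(seg)
    return segments


if __name__ == "__main__":
    sample = '[{"topic": "Intro", "start": 0, "end": 42.5}]'
    print(parse_segments(sample))
```

Rejecting malformed output at this boundary is what makes an automatic retry possible: the validation error can be sent back to the model as feedback instead of propagating bad data.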

Project 1 — Due End of This Week

Project 1 covers all of Weeks 1–3: development tooling, deployment, and LLM engineering. See the Projects section when the specification is released.

Mental Model: Four Layers

┌──────────────────────────────────────┐
│  Layer 4: OBSERVABILITY              │  LangSmith traces · LiteLLM spend
├──────────────────────────────────────┤
│  Layer 3: OUTPUT SHAPING             │  Instructor · Pydantic · Multimodal
├──────────────────────────────────────┤
│  Layer 2: PROMPT & CONTEXT           │  Engineering · Caching · CLI tools
├──────────────────────────────────────┤
│  Layer 1: MODEL & EMBEDDINGS         │  LLMs · BGE-M3 · FAISS · Architecture
└──────────────────────────────────────┘

Build bottom-up. Get reliable outputs first, then optimize cost, then observe.
