
Week 4 — RAG & Hybrid RAG

"Your LLM is only as good as what you put in its context."
— This week you take control of that context.


What You'll Build This Week

By the end of Week 4, you will have built a production-grade RAG pipeline that can answer questions over any document corpus. You'll go from "stuff everything in the prompt" to a proper two-stage retrieve-then-generate system, with evaluation metrics that measure how well it actually works.


Topics at a Glance

#   Topic                   What You Learn
01  Chunking Strategies     How to split documents intelligently
02  Vector Databases        FAISS, ChromaDB, Qdrant, PGVector: when to use what
03  Hybrid Search           Dense + sparse retrieval, BM25, Reciprocal Rank Fusion (RRF)
04  Query Augmentation      HyDE, query rewriting, step-back prompting
05  Reranking               Cross-encoders, Cohere Rerank, ColBERT
06  RAGAS Evaluation        Measuring your RAG with faithfulness, precision, recall
07  Multimodal Embeddings   ColPali, CLIP, embedding images + text together
08  LLM Grounding           Source attribution, citations, hallucination control
09  Late Chunking           Jina AI's token-level pooling trick
10  GraphRAG                Microsoft's entity graph approach
11  Contextual Retrieval    Anthropic's chunk-level context injection
12  Semantic Caching        Avoid redundant LLM calls with cosine-threshold caching
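Hybrid search (topic 03) has to merge a dense ranking and a sparse ranking into one list, and RRF does this using ranks alone, so the two retrievers' raw scores never need to be on the same scale. A minimal sketch, assuming each retriever returns document IDs ordered best first and using k = 60, the constant commonly used as the default; `reciprocal_rank_fusion` and the document IDs are hypothetical, for illustration only:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of doc IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. A list that omits a document contributes nothing.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a dense (embedding) retriever and a sparse (BM25) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

for doc_id, score in reciprocal_rank_fusion([dense_hits, sparse_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Here doc_a comes out on top because both retrievers rank it highly, and doc_c overtakes doc_b because it shows up in both lists. Agreement between retrievers is rewarded, which is the point of going hybrid.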

Labs & Capstones

Type      Name                         Key Tech
CAPSTONE  BS Degree Chatbot            Hybrid RAG · RAGAS · FastAPI
CAPSTONE  Policy Chatbot               NeMo Guardrails · Google Auth
LAB       RAGAS Evaluation Dashboard   Naive RAG vs Hybrid RAG vs Contextual

Why RAG?

LLMs have a fixed knowledge cutoff, a finite context window, and a tendency to hallucinate when they lack the facts. RAG helps on each of these fronts:

  • Knowledge cutoff → pull fresh documents at query time
  • Hallucination → ground answers in retrieved evidence
  • Context limits → retrieve only what's relevant (not the full corpus)
  • Cost → cheaper than fine-tuning for every new document set

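The naive approach (pasting every document into the prompt) breaks at scale: once the corpus outgrows the context window it simply won't fit, and even before that point every query pays for every token you send. This week you build the proper pipeline.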
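At its core, that pipeline has two stages: retrieve a handful of relevant chunks, then generate from a prompt containing only those chunks. The toy sketch below uses a simple word-overlap (Jaccard) score as a stand-in for the embedding model and vector database you'll use this week. The helpers `retrieve` and `build_prompt` and the three-line corpus are made up for illustration, and the actual LLM call is left out:

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, chunk: str) -> float:
    """Jaccard word overlap (stand-in for embedding similarity)."""
    q, c = tokenize(query), tokenize(chunk)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Stage 1: keep only the top_k most relevant chunks, not the whole corpus."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Stage 2 input: ground the LLM in numbered sources it can cite."""
    sources = "\n".join(f"[{i}] {ch}" for i, ch in enumerate(context, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{sources}\n\nQuestion: {query}"
    )


corpus = [
    "The library is open from 9am to 5pm on weekdays.",
    "Course registration closes two weeks after the term starts.",
    "The cafeteria serves vegetarian meals every day.",
]
question = "When does course registration close?"
prompt = build_prompt(question, retrieve(question, corpus, top_k=1))
print(prompt)  # this string is what you would send to your LLM client
```

Only the registration chunk reaches the prompt, so the prompt stays the same size no matter how large the corpus grows. Swapping `score` for embedding similarity, and the list for a vector database, turns this into the real thing.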


Prerequisites

Make sure you've completed Week 3. You'll need:

  • Basic LLM API calls (OpenAI / Anthropic)
  • Understanding of embeddings and cosine similarity (Week 3 → Vector Embeddings)
  • Python + FastAPI basics
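As a quick self-check on the Week 3 material, recall that cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = (a · b) / (‖a‖ ‖b‖). Below is a minimal plain-Python sketch. The three-dimensional vectors are toy values, not real embeddings, which typically have hundreds or thousands of dimensions:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return (a . b) / (|a| * |b|): near 1.0 for the same direction, 0.0 when orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction, ~1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal, 0.0
```

Because only direction matters, a vector and a scaled copy of it score (almost exactly) 1.0. Compared against a threshold, this same score is what lets a semantic cache decide that a new query matches one it has already answered.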