Contextual Retrieval
Contextual retrieval is a simple upgrade to classic RAG: make each chunk self-contained by attaching the minimum context needed to interpret it.
Instead of embedding just a paragraph, you embed:
- document title
- section path (H1 → H2 → H3)
- optional short summary / metadata
- the chunk text
This often fixes the most common RAG failure mode: the retrieved chunk is “about the thing”, but you can’t tell what the thing is.
Learning goals
- Build heading-aware chunks (Markdown / HTML / PDF)
- Create a compact “context string” per chunk
- Measure improvements with RAGAS
A minimal contextual chunk format
text
Doc: Week 4 — Retrieval-Augmented Generation
Section: Hybrid Search > Reciprocal Rank Fusion
Source: week-4/hybrid-search
<chunk text here>
Keep the context short
Context should identify the chunk, not act as a second document. Too much context increases token cost and can worsen retrieval, because boilerplate shared across many chunks dilutes what makes each one distinct.
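One way to enforce a short context is to give it a size budget. The sketch below is illustrative only: `compact_context` and `max_chars` are hypothetical names, and character count is used as a rough stand-in for token count. When the header is too long, it drops the outermost headings first, on the assumption that the deepest heading is the most identifying.

```python
def compact_context(doc_title: str, heading_path: list[str], max_chars: int = 120) -> str:
    """Build a context header, trimming outer headings until it fits max_chars.

    Hypothetical helper: characters approximate tokens here, which is an
    assumption and not an exact measure of embedding cost.
    """
    path = list(heading_path)
    while True:
        section = " > ".join(path) if path else "(root)"
        header = f"Doc: {doc_title}\nSection: {section}\n\n"
        # Stop when the header fits, or when only one heading is left to keep.
        if len(header) <= max_chars or len(path) <= 1:
            return header
        path = path[1:]  # drop the outermost heading, keep the most specific ones
```

Trimming from the outside is a design choice, not a rule; if your top-level headings carry the distinguishing information, trim from the other end instead.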
Implementation sketch (Markdown)
- Parse a Markdown doc into sections.
- For each section, split into chunks.
- For each chunk, build context + chunk and embed that.
Pseudo-code:
python
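def make_context(doc_title: str, heading_path: list[str], source: str) -> str:
    # Join the heading trail (H1 > H2 > H3); use "(root)" for text before any heading.
    section = " > ".join(heading_path) if heading_path else "(root)"
    return f"Doc: {doc_title}\nSection: {section}\nSource: {source}\n\n"

def chunk_for_embedding(context: str, chunk_text: str) -> str:
    # Prepend the context header to the trimmed chunk text; this is the string to embed.
    return context + chunk_text.strip()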
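The three steps of the implementation sketch could be wired together roughly as follows. This is a minimal sketch using only the standard library, and it makes several assumptions: headings are ATX-style (`#` to `######`), chunks are packed from blank-line-separated paragraphs, and the names `iter_sections`, `split_chunks`, and `contextual_chunks` are hypothetical. It is not a full Markdown parser (Setext headings and closing `#` sequences are not handled).

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def iter_sections(markdown: str):
    """Yield (heading_path, body_text) for each non-empty section."""
    path: list[str] = []
    body: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a code fence are never treated as headings.
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            text = "\n".join(body).strip()
            if text:
                yield list(path), text
            level = len(match.group(1))
            # Keep the ancestors above this level, then append the new heading.
            path = path[: level - 1] + [match.group(2)]
            body = []
        else:
            body.append(line)
    text = "\n".join(body).strip()
    if text:
        yield list(path), text


def split_chunks(text: str, max_chars: int = 800) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks: list[str] = []
    current = ""
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def contextual_chunks(doc_title: str, source: str, markdown: str) -> list[str]:
    """Return one 'context + chunk' string per chunk, ready to embed."""
    out: list[str] = []
    for heading_path, body in iter_sections(markdown):
        section = " > ".join(heading_path) if heading_path else "(root)"
        context = f"Doc: {doc_title}\nSection: {section}\nSource: {source}\n\n"
        out.extend(context + chunk for chunk in split_chunks(body))
    return out
```

Each returned string would then be passed to whatever embedding call your pipeline already uses; that call is deliberately left out here.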
What to store as metadata
Even if you embed context + text, also store structured metadata:
doc_id, source_path, heading_path, page_number (PDF), created_at
Metadata powers filtering (e.g., “only Week 4”) and better citations.
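As an illustration, the metadata fields above could be carried in a small record type. The `ChunkRecord` class, its `text` and `embedding_text` fields, and the two helpers are hypothetical; the in-memory filter is a stand-in for whatever metadata filtering your vector store provides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ChunkRecord:
    doc_id: str
    source_path: str
    heading_path: list[str]
    text: str                # raw chunk text, without the context header
    embedding_text: str      # context + text: the string that was embedded
    page_number: Optional[int] = None  # only meaningful for PDF sources
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def filter_by_source(records: list[ChunkRecord], prefix: str) -> list[ChunkRecord]:
    """Keep records whose source path starts with prefix (e.g. 'week-4/')."""
    return [r for r in records if r.source_path.startswith(prefix)]


def citation(record: ChunkRecord) -> str:
    """Format a human-readable citation from the structured metadata."""
    parts = [record.source_path]
    if record.heading_path:
        parts.append(" > ".join(record.heading_path))
    if record.page_number is not None:
        parts.append(f"p. {record.page_number}")
    return ", ".join(parts)
```

Storing the raw text separately from the embedded string lets you show users the chunk without the context header while still citing where it came from.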
Evaluation (what “better” means)
Use the same question set and compare:
- context recall / precision (retrieval quality)
- faithfulness (generation groundedness)
See: RAG Evaluation.
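Once both runs have been scored on the same question set, the comparison itself is simple arithmetic. The sketch below assumes you already have one score per metric for each run, held in plain dictionaries; how those scores are produced is out of scope here, and `metric_deltas` and `format_report` are hypothetical helper names.

```python
def metric_deltas(baseline: dict[str, float], contextual: dict[str, float]) -> dict[str, float]:
    """Return contextual minus baseline for every metric present in both runs."""
    shared = baseline.keys() & contextual.keys()
    return {name: round(contextual[name] - baseline[name], 4) for name in sorted(shared)}


def format_report(baseline: dict[str, float], contextual: dict[str, float]) -> str:
    """Render one 'metric: before -> after (signed delta)' line per shared metric."""
    lines = []
    for name, delta in metric_deltas(baseline, contextual).items():
        lines.append(
            f"{name}: {baseline[name]:.3f} -> {contextual[name]:.3f} ({delta:+.3f})"
        )
    return "\n".join(lines)
```

Metrics that appear in only one run are skipped on purpose, so a report never compares a score against a missing value.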
Common pitfalls
- Context leakage: if your context includes private info, you’re leaking it into embeddings.
- Token bloat: long summaries can hurt cost and latency.
- Overfitting headings: headings can dominate embeddings; keep them concise.
Mini-lab (optional)
Upgrade your Lab 2 RAG pipeline:
- add heading-aware chunking for Markdown docs
- embed contextual chunks
- report RAGAS deltas vs baseline