Contextual Retrieval

Contextual retrieval is a simple upgrade to classic RAG: make each chunk self-contained by attaching the minimum context needed to interpret it.

Instead of embedding just a paragraph, you embed:

  • document title
  • section path (H1 → H2 → H3)
  • optional short summary / metadata
  • the chunk text

This often fixes a common RAG failure mode: the retrieved chunk is clearly “about the thing”, but nothing in the chunk says what the thing is.

Learning goals

  • Build heading-aware chunks (Markdown / HTML / PDF)
  • Create a compact “context string” per chunk
  • Measure improvements with RAGAS

A minimal contextual chunk format

text
Doc: Week 4 — Retrieval-Augmented Generation
Section: Hybrid Search > Reciprocal Rank Fusion
Source: week-4/hybrid-search

<chunk text here>

Keep the context short

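Context should identify the chunk, not act as a second document. Too much context increases token cost and can worsen retrieval, because header text shared by many chunks can start to outweigh each chunk’s own content in the embedding.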
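
One way to enforce this is to trim the heading path before building the context string. The sketch below is only an illustration: the function name `compact_heading_path` and the defaults `max_levels=3` and `max_chars=60` are assumptions, not part of the course code. It keeps the deepest headings, on the assumption that the document title already covers the top level.

```python
def compact_heading_path(
    heading_path: list[str],
    max_levels: int = 3,   # assumed default, tune for your corpus
    max_chars: int = 60,   # assumed default, tune for your corpus
) -> list[str]:
    """Keep only the deepest `max_levels` headings and shorten very long ones."""
    kept = heading_path[-max_levels:] if max_levels > 0 else []
    compact = []
    for heading in kept:
        heading = heading.strip()
        if len(heading) > max_chars:
            # Truncate and mark the cut so the heading stays recognizable.
            heading = heading[: max_chars - 3].rstrip() + "..."
        compact.append(heading)
    return compact
```

Whether to keep the deepest or the topmost headings is a judgment call; comparing both on your own question set is the safest way to decide.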

Implementation sketch (Markdown)

  1. Parse a Markdown doc into sections.
  2. For each section, split into chunks.
  3. For each chunk, build context + chunk and embed that.

Pseudo-code:

python
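def make_context(doc_title: str, heading_path: list[str], source: str) -> str:
    """Build the short identifying header that is prepended to each chunk."""
    # Join the heading trail (H1 > H2 > H3); use "(root)" for text above the first heading.
    section = " > ".join(heading_path) if heading_path else "(root)"
    return f"Doc: {doc_title}\nSection: {section}\nSource: {source}\n\n"


def chunk_for_embedding(context: str, chunk_text: str) -> str:
    """Return the string to embed: the context header followed by the trimmed chunk text."""
    return context + chunk_text.strip()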
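
Steps 1 and 2 (parsing into sections, then splitting into chunks) could be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: it handles only ATX-style `#` headings, ignores headings inside fenced code blocks, and packs paragraphs greedily by character count. The names `split_markdown_sections` and `split_into_chunks` and the `max_chars=800` default are hypothetical.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def split_markdown_sections(markdown: str) -> list[tuple[list[str], str]]:
    """Return (heading_path, body_text) pairs, one per non-empty section."""
    sections: list[tuple[list[str], str]] = []
    stack: list[tuple[int, str]] = []  # (heading level, title) for the current path
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            sections.append(([title for _, title in stack], text))
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match is None:
            body.append(line)
            continue
        flush()  # close the previous section under its own heading path
        level = len(match.group(1))
        # Drop headings at the same or a deeper level before pushing the new one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, match.group(2)))
    flush()
    return sections


def split_into_chunks(text: str, max_chars: int = 800) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly `max_chars` characters.

    A single paragraph longer than `max_chars` is kept whole, not split.
    """
    chunks: list[str] = []
    current = ""
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each `heading_path` returned here can be passed to `make_context`, and each chunk to `chunk_for_embedding`, to complete step 3.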

What to store as metadata

Even if you embed context + text, also store structured metadata:

  • doc_id, source_path, heading_path, page_number (PDF), created_at

Metadata powers filtering (e.g., “only Week 4”) and better citations.
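
As a rough sketch of what such a record might look like, the dataclass below carries the fields listed above plus the raw and embedded text. The class name `ChunkRecord`, the `text` and `embedded_text` fields, and the two helper functions are assumptions made for illustration. A real vector store would apply filters through its own query API instead of a Python list comprehension.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ChunkRecord:
    doc_id: str
    source_path: str
    heading_path: list[str]
    text: str            # raw chunk text, used for display and citations
    embedded_text: str   # context + chunk, i.e. what was actually embedded
    page_number: Optional[int] = None  # only meaningful for PDFs
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def filter_by_source_prefix(records: list[ChunkRecord], prefix: str) -> list[ChunkRecord]:
    """Keep records whose source path starts with `prefix` (e.g. "week-4/")."""
    return [r for r in records if r.source_path.startswith(prefix)]


def format_citation(record: ChunkRecord) -> str:
    """Build a human-readable citation from the structured metadata."""
    section = " > ".join(record.heading_path) if record.heading_path else "(root)"
    citation = f"{record.source_path} ({section})"
    if record.page_number is not None:
        citation += f", p. {record.page_number}"
    return citation
```

Storing `text` separately from `embedded_text` lets you show users the original passage without the prepended context header.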

Evaluation (what “better” means)

Use the same question set and compare:

  • context recall / precision (retrieval quality)
  • faithfulness (generation groundedness)

See: RAG Evaluation.
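
Once both runs have produced scores on the same questions, the comparison itself is a per-metric subtraction. The helper below is a hypothetical sketch: it assumes you have already collected each run's aggregate scores into a plain dict keyed by metric name, and it does not call any evaluation library. The numbers in the example call are placeholders that show the call shape, not measured results.

```python
def metric_deltas(
    baseline: dict[str, float], candidate: dict[str, float]
) -> dict[str, float]:
    """Return candidate minus baseline for every metric present in both runs."""
    shared = baseline.keys() & candidate.keys()
    return {
        name: round(candidate[name] - baseline[name], 4)
        for name in sorted(shared)
    }


if __name__ == "__main__":
    # Placeholder values only, to illustrate usage.
    baseline_scores = {"context_recall": 0.5, "context_precision": 0.5, "faithfulness": 0.5}
    contextual_scores = {"context_recall": 0.75, "context_precision": 0.5, "faithfulness": 0.25}
    for name, delta in metric_deltas(baseline_scores, contextual_scores).items():
        print(f"{name}: {delta:+.4f}")
```

A positive delta means the contextual pipeline scored higher on that metric. Report all metrics together, since a gain in recall can come with a drop in precision or faithfulness.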

Common pitfalls

  • Context leakage: anything private in your context string (for example, internal paths or names) is embedded and stored along with the chunk, so it can surface in retrieval results.
  • Token bloat: long summaries can hurt cost and latency.
  • Overfitting headings: headings can dominate embeddings; keep them concise.

Mini-lab (optional)

Upgrade your Lab 2 RAG pipeline:

  • add heading-aware chunking for Markdown docs
  • embed contextual chunks
  • report RAGAS deltas vs baseline