# GraphRAG
GraphRAG adds structure to RAG by building a knowledge graph (entities + relations) alongside your vector store. It helps most when questions require multi-hop reasoning, like:
- “Which components depend on X, and what breaks if we change it?”
- “What’s the relationship between A and B across multiple docs?”
## Learning goals
- Understand the GraphRAG pipeline (extract → store → expand → retrieve → synthesize)
- Build a toy graph from documents
- Use graph expansion to guide retrieval
## The GraphRAG pipeline
- Entity extraction: find entities (people, concepts, APIs)
- Relation extraction: connect them (A uses B, A depends on B)
- Graph store: nodes/edges + attributes
- Query-time:
  - extract entities from the question
  - expand neighborhood (k hops)
  - retrieve text chunks near those entities
- Synthesis: answer grounded in retrieved text + graph structure
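The extraction steps above can be sketched without an LLM. The pattern list below is a hypothetical rule-based stand-in for illustration; real pipelines typically use an LLM or an NER model for this step.

```python
import re

# Hypothetical patterns for two relation types from the pipeline above.
PATTERNS = [
    (re.compile(r"(\w[\w ]*?) uses (\w[\w ]*)"), "uses"),
    (re.compile(r"(\w[\w ]*?) depends on (\w[\w ]*)"), "depends_on"),
]

def extract_relations(text: str) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples found in the text."""
    triples = []
    for pattern, relation in PATTERNS:
        for a, b in pattern.findall(text):
            triples.append((a.strip(), relation, b.strip()))
    return triples

print(extract_relations("Hybrid Search uses BM25."))
```

Each triple becomes an edge in the graph store; the same extractor can run on the user's question at query time to pick seed entities.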
## Toy example (NetworkX)
```python
import networkx as nx

# Toy knowledge graph: edges carry a "relation" attribute.
G = nx.Graph()
G.add_edge("RAGAS", "RAG Evaluation", relation="measures")
G.add_edge("Hybrid Search", "BM25", relation="uses")
G.add_edge("Hybrid Search", "Embeddings", relation="uses")

def expand(seed: str, hops: int = 1) -> set[str]:
    """Return the seed plus every node within `hops` hops of it."""
    frontier = {seed}
    seen = {seed}
    for _ in range(hops):
        nxt = set()
        for n in frontier:
            nxt |= set(G.neighbors(n))
        nxt -= seen      # drop nodes we've already visited
        seen |= nxt
        frontier = nxt   # only expand from newly reached nodes
    return seen

print(expand("Hybrid Search", hops=1))
# the seed plus its direct neighbors: BM25 and Embeddings
```
## How it connects to your vector store
A practical pattern:
- store text chunks in a vector DB
- store entity mentions as metadata on each chunk
- graph expansion picks which entities (and therefore which chunks) to fetch
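A minimal in-memory sketch of that pattern, with the vector DB replaced by a plain list. The chunk texts and entity tags below are hypothetical; in practice each chunk lives in the vector store with its entity mentions as metadata.

```python
# Each chunk records which entities it mentions (normally vector-DB metadata).
chunks = [
    {"text": "Hybrid search combines BM25 with dense embeddings.",
     "entities": {"Hybrid Search", "BM25", "Embeddings"}},
    {"text": "RAGAS scores faithfulness and answer relevance.",
     "entities": {"RAGAS", "RAG Evaluation"}},
]

def chunks_for_entities(expanded: set[str]) -> list[str]:
    """Fetch every chunk mentioning at least one expanded entity."""
    return [c["text"] for c in chunks if c["entities"] & expanded]

# Feed the output of graph expansion straight into chunk selection.
print(chunks_for_entities({"Hybrid Search", "BM25", "Embeddings"}))
```

In a real system the entity filter would be a metadata filter on the vector query, so expansion narrows the search space before similarity ranking runs.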
## When GraphRAG is a bad idea
- If your questions are mostly “find the paragraph” (classic RAG is enough)
- If entity extraction is noisy (graph becomes garbage-in-garbage-out)
- If your latency budget is tight (graph expansion adds extra retrieval steps)
## Mini-lab (optional)
Build a “course knowledge graph”:
- entities: week topics, tools, libraries
- relations: “uses”, “extends”, “evaluated-by”
- query: ask multi-hop questions and compare the answers to hybrid search alone
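A hypothetical starter skeleton for the lab, with placeholder topics to swap out for your own course data:

```python
import networkx as nx

# Placeholder course graph: week topics, tools, and the lab's relations.
course = nx.Graph()
course.add_edge("Week 3: RAG", "Hybrid Search", relation="uses")
course.add_edge("Hybrid Search", "BM25", relation="uses")
course.add_edge("Week 3: RAG", "RAGAS", relation="evaluated-by")

# A 2-hop question: which nodes sit exactly two steps from the week topic?
two_hops = nx.single_source_shortest_path_length(course, "Week 3: RAG", cutoff=2)
print(sorted(n for n, d in two_hops.items() if d == 2))
# ['BM25']
```

Hybrid search alone would only surface chunks mentioning "Week 3: RAG" directly; the 2-hop query reaches BM25 through the Hybrid Search node, which is the comparison the lab asks for.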