GraphRAG
Intent & Description
🎯 Intent
Build an LLM-extracted entity-and-relation knowledge graph plus hierarchical community summaries, then answer global queries via map-reduce over those summaries.
📋 Context
A team is using a retrieval-augmented system over a large corpus and starts receiving questions about the corpus as a whole rather than individual facts in it: ‘what are the main themes in these reports?’, ‘how does this position evolve across the documents?’, ‘which entities are central to the discussion?’ These are corpus-level sensemaking queries, not local lookup queries, and they arrive alongside the easier fact-style questions.
💡 Solution
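At index time, extract entities and relations from each chunk with an LLM, assemble them into a knowledge graph, cluster the graph into hierarchical communities, and summarise each community. At query time, classify the query as local (entity-specific) or global (corpus-wide). Local queries use entity-anchored retrieval. Global queries run a map-reduce over the community summaries: each summary yields a partial answer (map), and the partial answers are combined into one final answer (reduce).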
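The two phases can be sketched end to end in a few lines of Python. This is a toy illustration under loud assumptions, not a reference implementation: `extract_relations` and `summarise` are hypothetical stand-ins for LLM calls, communities are plain connected components at a single level (a real system would use a hierarchical community-detection algorithm such as Leiden), and the local/global classifier is a simple entity-name match.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    description: str


def extract_relations(chunk: str) -> list[Relation]:
    """Hypothetical stand-in for the LLM extraction call: parses 'A -> B: desc' lines."""
    relations = []
    for line in chunk.splitlines():
        if "->" in line and ":" in line:
            pair, description = line.split(":", 1)
            source, target = (part.strip() for part in pair.split("->", 1))
            relations.append(Relation(source, target, description.strip()))
    return relations


def build_graph(relations: list[Relation]) -> dict[str, set[str]]:
    """Undirected adjacency map over the extracted entities."""
    graph: dict[str, set[str]] = defaultdict(set)
    for r in relations:
        graph[r.source].add(r.target)
        graph[r.target].add(r.source)
    return graph


def find_communities(graph: dict[str, set[str]]) -> list[set[str]]:
    """Assumption: connected components replace hierarchical community detection."""
    seen: set[str] = set()
    communities = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(graph[node] - members)
        seen |= members
        communities.append(members)
    return communities


def summarise(members: set[str], relations: list[Relation]) -> str:
    """Hypothetical stand-in for the LLM summary call: joins the community's facts."""
    facts = [f"{r.source} {r.description} {r.target}"
             for r in relations if r.source in members]
    return "; ".join(facts)


def answer(query, relations, graph, summaries):
    """Route the query: entity mention means local, otherwise global map-reduce."""
    mentioned = [e for e in graph if e.lower() in query.lower()]
    if mentioned:
        # Local: retrieve only the relations anchored on the mentioned entities.
        hits = [r for r in relations
                if r.source in mentioned or r.target in mentioned]
        return "local", [f"{r.source} {r.description} {r.target}" for r in hits]
    # Global: map each community summary to a partial answer, then reduce.
    partials = [f"Theme {i + 1}: {s}" for i, s in enumerate(summaries)]
    return "global", [" | ".join(partials)]


# Index time (toy corpus of two chunks, invented for illustration).
chunks = [
    "Acme -> Bolt: acquired\nBolt -> Carol: employs",
    "Delta -> Echo: supplies",
]
relations = [r for chunk in chunks for r in extract_relations(chunk)]
graph = build_graph(relations)
communities = find_communities(graph)
summaries = [summarise(members, relations) for members in communities]

# Query time.
print(answer("Who does Bolt employ?", relations, graph, summaries))
print(answer("What are the main themes?", relations, graph, summaries))
```

The first query names an entity and is routed to local retrieval; the second names none and is answered from the community summaries alone, which is the case chunk-level retrieval handles poorly.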
Real-world Use Case
- Users ask global, corpus-wide questions that local chunk retrieval cannot answer.
- The corpus has clear entities and relations worth extracting into a graph.
- Index-time cost can be paid up front to enable hierarchical community summaries.
Advantages
- Answers corpus-level sensemaking questions that naive RAG cannot.
- Communities are inspectable artefacts of the corpus.
Disadvantages
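- High indexing cost: every chunk needs LLM extraction calls and every community needs an LLM summary, which can mean orders of magnitude more LLM calls than embedding-only indexing.
- Entity extraction errors cascade: a missed or wrongly merged entity distorts the graph, the communities built on it, and their summaries.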
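To make the indexing cost concrete, here is a back-of-the-envelope call count. It is a rough sketch under stated assumptions: one LLM call per chunk per extraction pass, one call per community summary, and one map call per summary plus a single reduce call for a global query. The function names and the corpus sizes are hypothetical and chosen only for illustration.

```python
def index_llm_calls(num_chunks: int, extraction_passes: int, num_communities: int) -> int:
    """Assumed model: extraction calls for every chunk plus one summary call per community."""
    return num_chunks * extraction_passes + num_communities


def global_query_llm_calls(num_summaries: int) -> int:
    """Assumed model: one map call per community summary plus one reduce call."""
    return num_summaries + 1


# Hypothetical corpus: 10,000 chunks, 2 extraction passes, 500 communities.
print(index_llm_calls(10_000, 2, 500))   # 20500
print(global_query_llm_calls(500))       # 501
```

Embedding-only indexing makes no generative LLM calls at all, so the extraction and summary calls are pure added cost that has to be justified by the volume of global questions.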