Friday, 26 December 2025
Your RAG system is great at "needle" questions: retrieve a few relevant chunks and synthesise an answer. It struggles with two common query types: sensemaking questions ("what are the main themes across everything I've written?") and connective questions ("how does X relate to Y?").
Those aren't answered by any single chunk. They require coverage, clustering, and linkage across the corpus.
Vector search fails here because top-K similarity returns isolated fragments; it has no mechanism for aggregating themes across the whole corpus or following relationships between documents. You can brute-force this with prompting and post-processing, but you end up rebuilding a graph-shaped solution.
The key insight: GraphRAG changes the retrieval unit. For corpus questions you don't want "top-K similar chunks"; you want connected concept communities (and their summaries), so the model sees structure, not fragments.
GraphRAG comes from Microsoft Research's paper and is available as an open-source implementation. It keeps vector search for specific questions, but adds a knowledge graph and community summaries for corpus-level reasoning.
Before diving in, let's be clear about when this is overkill: if your users only ask specific questions, stick with semantic search. GraphRAG shines when users need the big picture, and that's a smaller audience than vendors suggest.
Series Navigation: This is Part 6 of the RAG series.
Throughout this series, we've built increasingly sophisticated RAG systems. We started with basic vector search, added hybrid keyword+semantic retrieval, and integrated auto-indexing. But all these approaches share a fundamental limitation: they find similar chunks, not connected concepts. For corpus-level questions (themes spanning many documents) you need structure.
The recommended path: If you already have Qdrant-based local search working (like we do), prototype with the Python sidecar to validate value, keep vectors for local search, and add a lightweight graph for global/DRIFT queries. Only go "full GraphRAG" once you've proven users ask those questions.
Let me show you what I mean with a concrete example.
Question: "How do I use HTMX with Alpine.js?"
Vector RAG process: embed the query into a vector like `[0.234, -0.891, 0.567, ...]`, find the nearest chunk embeddings in Qdrant, and hand those chunks to the LLM. This works because the question and the relevant content are semantically similar; the embeddings capture that similarity.
// This is what our current SemanticSearchService does
var embedding = await _embeddingService.GetEmbeddingAsync(query);
var results = await _qdrantService.SearchAsync(
collectionName: "blog_posts",
queryVector: embedding,
limit: 10
);
// Returns chunks about HTMX, Alpine.js, frontend patterns
Question: "What are the main technologies I write about and how do they relate to each other?"
What vector RAG returns:
Result 1: "HTMX makes it easy to add AJAX to your pages..."
Result 2: "Docker Compose orchestrates multiple containers..."
Result 3: "PostgreSQL's full-text search is surprisingly capable..."
Result 4: "Alpine.js provides reactive state management..."
It mentions Docker, PostgreSQL, HTMX, ONNX... but doesn't group them or explain how they connect. You get fragments, not insight.
The problem: this question requires aggregation and relationship understanding across the entire corpus. You need to identify the technologies mentioned across every post, group them into related clusters, and explain how those clusters connect.
Vector similarity alone doesn't give you this. If you try to patch it with prompting, you end up reinventing a graph.
GraphRAG is Microsoft Research's solution to this problem. The GraphRAG paper identified the two query types that baseline RAG handles poorly (sensemaking and connective) and built a system specifically to address them.
Instead of just embedding chunks, GraphRAG builds a knowledge graph that captures entities and their relationships, then clusters them into communities with summaries.
Pipeline at a glance:
GraphRAG adds several components to the RAG pipeline, grouped into three categories:
flowchart TB
subgraph "Traditional RAG (What We Have)"
A[Documents] --> B[Chunks]
B --> C[Embeddings]
C --> D[Vector Store]
end
subgraph "GraphRAG Additions"
B --> E[Entity Extraction]
E --> F[Relationship Extraction]
F --> G[Knowledge Graph]
G --> H[Community Detection]
H --> I[Community Summaries]
end
subgraph "Query Time"
J[User Query] --> K{Query Type?}
K -->|Specific| L[Local Search]
K -->|Global| M[Global Search]
K -->|Hybrid| N[DRIFT Search]
D --> L
G --> L
I --> M
G --> N
I --> N
end
style E stroke:#f9f,stroke-width:2px
style H stroke:#bbf,stroke-width:2px
style I stroke:#9f9,stroke-width:2px
An LLM reads each chunk and extracts entities (the things being discussed):
Chunk: "Docker Compose makes it easy to define multi-container applications.
I use it with PostgreSQL for my blog's database layer."
Extracted Entities:
- Docker Compose (technology)
- PostgreSQL (database)
- blog (project)
- database layer (concept)
The same LLM identifies how entities relate to each other:
Relationships:
- Docker Compose --[used_with]--> PostgreSQL
- blog --[has_component]--> database layer
- PostgreSQL --[implements]--> database layer
All entities and relationships form a graph:
graph LR
subgraph "Frontend Cluster"
HTMX[HTMX]
Alpine[Alpine.js]
Tailwind[Tailwind CSS]
end
subgraph "Infrastructure Cluster"
Docker[Docker]
Compose[Docker Compose]
Postgres[PostgreSQL]
Qdrant[Qdrant]
end
subgraph "AI/ML Cluster"
ONNX[ONNX Runtime]
Embeddings[Embeddings]
RAG[RAG]
end
HTMX -->|used_with| Alpine
HTMX -->|styled_by| Tailwind
Alpine -->|styled_by| Tailwind
Docker -->|orchestrated_by| Compose
Compose -->|runs| Postgres
Compose -->|runs| Qdrant
ONNX -->|generates| Embeddings
Embeddings -->|stored_in| Qdrant
RAG -->|uses| Embeddings
RAG -->|uses| Qdrant
style HTMX stroke:#f9f
style Docker stroke:#bbf
style RAG stroke:#9f9
The Leiden algorithm clusters densely connected nodes into communities. This matters because it gives you stable clusters to summarise and retrieve; the communities become your retrieval units for global queries.
Notice how Qdrant appears in two communities: it bridges infrastructure and AI/ML.
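The communities don't need an exotic representation. A minimal sketch of the model used in the rest of this post (not the GraphRAG library's types) looks like this:
// Sketch: a community is just its members plus the summary the next step fills in.
// Level records where the cluster sits in the community hierarchy.
public class Community
{
    public int Level { get; init; }
    public List<Entity> Entities { get; } = new();
    public string? Summary { get; set; }
}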
An LLM generates summaries for each community at each hierarchy level:
Community 1 Summary (Frontend Stack):
"The frontend approach combines HTMX for server-driven interactivity
with Alpine.js for client-side state management, styled using Tailwind CSS.
This stack prioritizes HTML-first development with minimal JavaScript,
focusing on progressive enhancement over SPA complexity."
Community 2 Summary (Container Infrastructure):
"The blog runs on Docker Compose, orchestrating PostgreSQL for persistent
storage, Qdrant for vector search, and the ASP.NET Core application.
This containerized architecture enables consistent local development
and production deployment."
GraphRAG provides three query modes, each optimized for different question types:
Best for: "What are the main themes?" "Summarize the key topics."
Uses community summaries (not individual chunks) to answer sensemaking questions:
Query: "What technologies does this blog cover most?"
Process:
1. Retrieve all community summaries
2. Map: Ask LLM to extract technology themes from each summary
3. Reduce: Combine partial answers into final response
Response:
"The writing centres on three technology clusters:
1. **Frontend Development** - HTMX, Alpine.js, Tailwind CSS for minimal-JS web UIs
2. **AI/ML Infrastructure** - RAG pipelines, ONNX embeddings, vector search with Qdrant
3. **DevOps/Containerization** - Docker, PostgreSQL, ASP.NET Core deployment"
Best for: "How do I configure X?" "What is Y?"
Combines entity-focused graph traversal with traditional vector search:
Query: "How do I use Qdrant with ONNX embeddings?"
Process:
1. Identify entities in query: Qdrant, ONNX, embeddings
2. Retrieve graph neighborhood around those entities
3. Also retrieve vector-similar chunks
4. Combine into rich context for LLM
Response includes:
- Direct relationships (ONNX generates embeddings stored in Qdrant)
- Related entities (all-MiniLM-L6-v2 model, cosine similarity)
- Specific code examples from vector-retrieved chunks
Best for: "How does X relate to Y?" "Compare A and B."
DRIFT search (Dynamic Reasoning and Inference with Flexible Traversal), as described in the GraphRAG docs, combines local search with community context. It's still using LLM reasoning over retrieved structured context (not magic graph inference), but the structure helps the LLM see connections it would miss with flat chunks.
Query: "How do the frontend and backend technologies connect?"
Process:
1. Start with entities: HTMX, ASP.NET Core
2. Traverse graph to find connection paths
3. Include community summaries for context
4. Generate answer showing the full picture
Response:
"HTMX makes requests to ASP.NET Core endpoints, which query PostgreSQL
and Qdrant. The connection flows through the API layer, where endpoints
return HTML fragments that HTMX swaps into the DOM. Alpine.js handles
client-side state for interactive components like search typeahead."
Let's map GraphRAG concepts to what we already have in Mostlylucid.SemanticSearch:
| Component | Current System | GraphRAG Equivalent |
|---|---|---|
| Embeddings | ONNX (all-MiniLM-L6-v2) | Same (or OpenAI) |
| Vector Store | Qdrant | Qdrant / LanceDB |
| Entity Extraction | None | LLM-powered extraction |
| Knowledge Graph | None | Graph database / in-memory |
| Community Detection | None | Leiden algorithm |
| Query: Specific | `SemanticSearchService.SearchAsync()` | Local Search |
| Query: Global | Not supported | Global Search |
Our current implementation handles Local Search well. GraphRAG would add Global Search and DRIFT Search capabilities.
// What we have today (Local Search equivalent)
public async Task<List<SearchResult>> SearchAsync(string query, int limit = 10)
{
var embedding = await _embeddingService.GetEmbeddingAsync(query);
return await _qdrantService.SearchAsync("blog_posts", embedding, limit);
}
// What GraphRAG would add
public async Task<string> GlobalSearchAsync(string query)
{
// 1. Retrieve community summaries (not chunks)
var summaries = await _graphService.GetCommunitySummariesAsync();
// 2. Map: Extract relevant themes from each summary
var partialAnswers = await Task.WhenAll(
summaries.Select(s => _llm.ExtractThemesAsync(query, s))
);
// 3. Reduce: Combine into final answer
return await _llm.SynthesizeAsync(query, partialAnswers);
}
There are three ways to add GraphRAG to an existing system.
Option A: run Microsoft's GraphRAG (Python) as a separate sidecar service:
# docker-compose.graphrag.yml
services:
  graphrag:
    build:
      context: ./graphrag
    volumes:
      - ./data/input:/app/input
      - ./data/output:/app/output
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}

  graphrag-api:
    build:
      context: ./graphrag-api
    ports:
      - "8001:8000"
    depends_on:
      - graphrag
// GraphRagClient.cs - Call from ASP.NET Core
public class GraphRagClient
{
private readonly HttpClient _http;
public GraphRagClient(HttpClient http)
{
_http = http;
_http.BaseAddress = new Uri("http://graphrag-api:8000");
}
public async Task<string> GlobalSearchAsync(string query)
{
var response = await _http.PostAsJsonAsync("/query/global", new { query });
var result = await response.Content.ReadFromJsonAsync<GraphRagResponse>();
return result.Answer;
}
public async Task<string> LocalSearchAsync(string query)
{
var response = await _http.PostAsJsonAsync("/query/local", new { query });
var result = await response.Content.ReadFromJsonAsync<GraphRagResponse>();
return result.Answer;
}
}
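Registering the client is a one-liner with a typed `HttpClient` (a sketch assuming the usual minimal-API `Program.cs`; the `BaseAddress` is already set in the constructor above):
// Program.cs
builder.Services.AddHttpClient<GraphRagClient>();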
**Pros:** use Microsoft's battle-tested implementation; quick to prototype.
**Cons:** Python dependency, LLM costs for indexing, cross-process communication.
Option B: build the key components in C#. The BERT-based extraction and Ollama patterns from DocSummarizer work similarly here.
Ask an LLM to identify things (entities) in each chunk - structured entities rather than free-form topics:
public async Task<List<Entity>> ExtractEntitiesAsync(string chunk)
{
var prompt = $"""
Extract entities from this text. Return JSON array.
Types: technology, concept, project, person, organization
Text: {chunk}
Format: [{{"name": "Docker", "type": "technology"}}]
""";
var response = await _ollama.GenerateAsync(prompt);
return JsonSerializer.Deserialize<List<Entity>>(response);
}
Production requirement: LLM JSON output will break, and hardening against it is not optional. You need structured output enforcement (Ollama's `format: json`, OpenAI's function calling or JSON mode), retries on malformed responses, and schema validation before anything enters the graph. LLMs are probabilistic; your extraction pipeline must not be.
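As a minimal sketch of the retry approach, assuming the same `_ollama` client as `ExtractEntitiesAsync` above (`BuildEntityPrompt` is a hypothetical helper that builds that same prompt):
// Sketch: re-prompt when the LLM returns JSON that won't deserialize, rather than
// letting one malformed response poison the graph.
public async Task<List<Entity>> ExtractEntitiesWithRetryAsync(string chunk, int maxAttempts = 3)
{
    for (var attempt = 0; attempt < maxAttempts; attempt++)
    {
        var raw = await _ollama.GenerateAsync(BuildEntityPrompt(chunk));
        try
        {
            var entities = JsonSerializer.Deserialize<List<Entity>>(raw);
            if (entities is not null)
                return entities;
        }
        catch (JsonException)
        {
            // Malformed JSON: fall through and try again.
        }
    }
    return new List<Entity>(); // give up gracefully; log and skip this chunk
}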
Once you have entities, ask the LLM how they connect:
public async Task<List<Relationship>> ExtractRelationshipsAsync(
string chunk, List<Entity> entities)
{
var names = string.Join(", ", entities.Select(e => e.Name));
var prompt = $"""
Given entities: {names}
Extract relationships. Return JSON array.
Text: {chunk}
Format: [{{"source": "Docker", "target": "PostgreSQL", "rel": "runs"}}]
""";
return JsonSerializer.Deserialize<List<Relationship>>(
await _ollama.GenerateAsync(prompt));
}
The biggest practical pain is entity aliasing: "ASP.NET Core", "ASP.NET", and "aspnetcore" should be the same node. Simple normalisation helps:
public class KnowledgeGraph
{
private readonly Dictionary<string, Entity> _entities = new();
private readonly List<Relationship> _relationships = new();
public void AddEntity(Entity entity)
{
var key = Normalise(entity.Name); // "ASP.NET Core" → "aspnetcore"
_entities[key] = entity;
}
// lowercase, strip punctuation and spaces so variants collapse to one key
private string Normalise(string name) =>
name.ToLowerInvariant().Replace(".", "").Replace("-", "").Replace(" ", "").Trim();
}
For serious use, consider embedding-based entity deduplication: if two entity names have similar embeddings, they're probably the same thing.
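A sketch of what that could look like, reusing the same embedding service the chunks already go through (the 0.9 threshold is an assumption you'd tune on real data, and `GetEmbeddingAsync` is assumed to return a `float[]`):
// Sketch: treat two entity names as the same node when their embeddings are close.
public async Task<bool> AreSameEntityAsync(string a, string b)
{
    var va = await _embeddingService.GetEmbeddingAsync(a);
    var vb = await _embeddingService.GetEmbeddingAsync(b);
    return CosineSimilarity(va, vb) > 0.9f;
}

private static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB) + 1e-8f);
}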
Production pain points: GraphRAG's value hinges on graph quality, so expect most of the effort to go into consistent entity extraction, deduplication, and keeping the graph in sync as the corpus changes.
Finding related entities is a breadth-first search:
public List<Entity> GetNeighbors(string entityName, int depth = 1)
{
var result = new HashSet<Entity>();
var queue = new Queue<(string Name, int Depth)>();
queue.Enqueue((Normalise(entityName), 0));
while (queue.Count > 0)
{
var (name, d) = queue.Dequeue();
if (d >= depth) continue;
// Find all entities connected to this one
var neighbours = _relationships
    .Where(r => Normalise(r.Source) == name || Normalise(r.Target) == name)
    .SelectMany(r => new[] { r.Source, r.Target })
    .Where(n => Normalise(n) != name); // exclude the node itself
foreach (var neighbour in neighbours)
if (_entities.TryGetValue(Normalise(neighbour), out var entity))
if (result.Add(entity))
queue.Enqueue((Normalise(neighbour), d + 1));
}
return result.ToList();
}
The DetectCommunities implementation below is a connected-components baseline, not full Leiden. Leiden optimizes for modularity (dense internal connections, sparse external ones). For a proper implementation, use a graph library or port the algorithm.
public List<Community> DetectCommunities(KnowledgeGraph graph)
{
// Connected components: group everything reachable together
var visited = new HashSet<string>();
var communities = new List<Community>();
foreach (var entity in graph.GetAllEntities())
{
if (visited.Contains(entity.Name)) continue;
// BFS to find all connected entities
var community = new Community();
var queue = new Queue<string>();
queue.Enqueue(entity.Name);
while (queue.Count > 0)
{
var name = queue.Dequeue();
if (!visited.Add(name)) continue;
community.Entities.Add(graph.GetEntity(name));
foreach (var neighbor in graph.GetNeighbors(name, depth: 1))
queue.Enqueue(neighbor.Name);
}
communities.Add(community);
}
return communities;
}
Each community gets a summary describing its theme. This is what powers Global Search:
public async Task<string> SummarizeCommunityAsync(Community community)
{
var entities = string.Join("\n",
community.Entities.Select(e => $"- {e.Name}: {e.Description}"));
var prompt = $"""
Summarize what unites these concepts (2-3 sentences):
{entities}
""";
return await _ollama.GenerateAsync(prompt);
}
Option C: the hybrid approach. This is the recommended path if you already have working vector search: keep Qdrant for Local Search and add a lightweight graph layer for Global/DRIFT queries.
First, detect what kind of question this is:
// WARNING: Toy heuristic for illustration only.
// In production, use a classifier prompt or few-shot rules and log misroutes.
private QueryMode ClassifyQuery(string query)
{
var q = query.ToLowerInvariant();
if (q.Contains("main theme") || q.Contains("summarize") || q.Contains("what topics"))
return QueryMode.Global;
if (q.Contains("relate") || q.Contains("connect") || q.Contains("compare"))
return QueryMode.Drift;
return QueryMode.Local;
}
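The classifier-prompt alternative mentioned in that comment could look something like this (a sketch; the prompt wording is illustrative and `_ollama` is the same client used for extraction):
// Sketch: let a small local model do the routing instead of keyword matching.
private async Task<QueryMode> ClassifyQueryWithLlmAsync(string query)
{
    var prompt = $"""
        Classify this search query as exactly one of: local, global, drift.
        - local: a specific topic or how-to question
        - global: asks for themes, overviews or summaries of the whole corpus
        - drift: asks how two or more things relate or compare
        Query: {query}
        Answer with one word.
        """;
    var answer = (await _ollama.GenerateAsync(prompt)).Trim().ToLowerInvariant();
    return answer switch
    {
        "global" => QueryMode.Global,
        "drift" => QueryMode.Drift,
        _ => QueryMode.Local // default to the cheap path on anything unexpected
    };
}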
Use existing vector search, optionally enriched with graph context:
private async Task<string> LocalSearchAsync(string query)
{
// Existing semantic search (what we have today)
var chunks = await _semanticSearch.SearchAsync(query, limit: 10);
// NEW: Enrich with related entities from graph
var entities = await _graphService.ExtractEntitiesFromQueryAsync(query);
var related = await _graphService.GetEntityContextAsync(entities);
return await _llm.GenerateAsync(query, FormatContext(chunks, related));
}
Map-reduce over community summaries (no vector search needed):
private async Task<string> GlobalSearchAsync(string query)
{
var summaries = await _graphService.GetAllCommunitySummariesAsync();
// Map: Extract relevant info from each community
var partials = await Task.WhenAll(
summaries.Select(s => _llm.ExtractRelevantInfoAsync(query, s)));
// Reduce: Combine into final answer
return await _llm.SynthesizeAsync(query, partials.Where(p => !string.IsNullOrEmpty(p)));
}
Combine local results with community context for "how does X relate to Y" questions:
private async Task<string> DriftSearchAsync(string query)
{
var localResults = await LocalSearchAsync(query);
var entities = await _graphService.ExtractEntitiesFromQueryAsync(query);
var communities = await _graphService.GetCommunitiesForEntitiesAsync(entities);
var themes = string.Join("\n", communities.Select(c => c.Summary));
return await _llm.GenerateAsync(
$"Question: {query}\n\nDetails:\n{localResults}\n\nBroader themes:\n{themes}",
systemPrompt: "Synthesize the details with the thematic context.");
}
GraphRAG has significant tradeoffs compared to pure vector RAG.
Entity/relationship extraction costs vary significantly by model, prompt design, and chunk size. As a rough order-of-magnitude estimate, assume one or two LLM calls per chunk plus a smaller number of calls for community summaries.
| Operation | Vector RAG | GraphRAG |
|---|---|---|
| Embedding | 1 call/chunk | Same |
| Entity/Relationship Extraction | None | 1-2 LLM calls/chunk |
| Community Summarisation | None | 1 LLM call/community |
For a corpus of 1,000 blog posts with 5 chunks each (5,000 chunks total), vector-only indexing is essentially just embedding costs. GraphRAG adds thousands of LLM calls for extraction and summarisation. The exact cost depends heavily on your model choice and prompt efficiency; using local models (Ollama with llama3.2 or similar) eliminates API costs entirely, which is the recommended approach for experimentation.
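To make that concrete, here is a back-of-envelope estimate; every number is an illustrative assumption, not a benchmark:
// Rough indexing cost comparison (all figures are assumptions).
const int chunks = 5_000;
const int extractionCallsPerChunk = 2;   // entities + relationships
const int communityCount = 50;           // depends entirely on your corpus

var graphRagCalls = chunks * extractionCallsPerChunk + communityCount; // ~10,050 LLM calls
Console.WriteLine($"GraphRAG indexing: ~{graphRagCalls:N0} LLM calls. Vector-only: embeddings only.");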
| Query Type | Vector RAG | GraphRAG Local | GraphRAG Global |
|---|---|---|---|
| Vector search | 1 call | 1 call | 0 calls |
| Graph traversal | 0 | 1-2 queries | 0 |
| LLM calls | 1 | 1-2 | N (map) + 1 (reduce) |
Global Search is more expensive per query, but it answers questions that Local Search simply can't. You can also cache global answers and refresh them only when the corpus changes.
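A sketch of that caching, keyed on the query plus an index version so answers invalidate on re-index (`IndexVersion` is a hypothetical property and `_cache` is a standard `IMemoryCache`):
// Sketch: global answers only change when the corpus does, so cache them.
public async Task<string> CachedGlobalSearchAsync(string query)
{
    var key = $"global:{_graphService.IndexVersion}:{query.Trim().ToLowerInvariant()}";
    if (_cache.TryGetValue(key, out string? cached) && cached is not null)
        return cached;

    var answer = await GlobalSearchAsync(query);
    _cache.Set(key, answer, TimeSpan.FromHours(24));
    return answer;
}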
GraphRAG isn't magic. Watch out for inconsistent entity extraction between indexing runs, noisy or hallucinated relationships, indexing cost, and graphs that drift out of date as content changes.
Entity normalization is the biggest practical pain. You'll need alias handling, case and punctuation normalisation, and ideally embedding-based deduplication (as covered above).
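A hand-maintained alias map layered on top of `Normalise()` is a reasonable first pass (the entries below are illustrative):
// Sketch: explicit aliases catch what simple normalisation misses.
private static readonly Dictionary<string, string> Aliases = new()
{
    ["aspnet"] = "aspnetcore",    // "ASP.NET" and "ASP.NET Core" should be one node
    ["postgres"] = "postgresql",
};

private string Canonical(string name)
{
    var normalised = Normalise(name);
    return Aliases.TryGetValue(normalised, out var canonical) ? canonical : normalised;
}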
Here's how GraphRAG could enhance the blog's existing semantic search. Today the flow is simple:
User types in search → SemanticSearchService → Qdrant → Results
The classifier routes queries to different search strategies. Here's how a global query flows - note that it never touches the vector store:
sequenceDiagram
participant U as User
participant API as Search API
participant C as Query Classifier
participant G as Global Search
participant KG as Knowledge Graph
U->>API: "What topics does this blog cover?"
API->>C: Classify query
C-->>API: QueryMode.Global
API->>G: GlobalSearch(query)
G->>KG: GetCommunitySummaries()
KG-->>G: [Frontend, Infrastructure, AI/ML]
G->>G: MapReduce over summaries
G-->>API: Synthesized answer
API-->>U: "The blog covers three main areas..."
Compare this to a local query, which combines vector search with graph context for richer answers:
sequenceDiagram
participant U as User
participant API as Search API
participant C as Query Classifier
participant L as Local Search
participant Q as Qdrant
participant KG as Knowledge Graph
U->>API: "How do I use HTMX?"
API->>C: Classify query
C-->>API: QueryMode.Local
API->>L: LocalSearch(query)
L->>Q: Vector search
Q-->>L: Relevant chunks
L->>KG: GetEntityContext("HTMX")
KG-->>L: Related: Alpine.js, Tailwind, ASP.NET
L-->>API: Answer with rich context
API-->>U: "HTMX is used with Alpine.js for..."
The key difference: global queries aggregate community summaries (corpus-level themes), while local queries retrieve specific chunks enriched with entity relationships.
A simpler alternative: GraphRAG is still very much a research tool - entity extraction, graph construction, and community detection add significant complexity and LLM costs. For most use cases, BERT embeddings + BM25 keyword matching works better. This is what Sourcegraph's Cody uses for code intelligence, and what DocSummarizer uses for document summarisation. The pattern: hybrid retrieval handles relevance; the LLM handles assembly, not decision-making. You get 80% of the benefit with 20% of the complexity.
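For completeness, one common way to fuse the two rankings is reciprocal rank fusion. This is a generic sketch, not the blog's or DocSummarizer's actual implementation:
// Sketch: reciprocal rank fusion merges BM25 and vector rankings without having
// to reconcile their score scales. k = 60 is the conventional constant.
public static List<string> FuseRankings(
    IReadOnlyList<string> bm25Ranked, IReadOnlyList<string> vectorRanked, int k = 60)
{
    var scores = new Dictionary<string, double>();
    void Accumulate(IReadOnlyList<string> ranked)
    {
        for (var rank = 0; rank < ranked.Count; rank++)
            scores[ranked[rank]] = scores.GetValueOrDefault(ranked[rank]) + 1.0 / (k + rank + 1);
    }

    Accumulate(bm25Ranked);
    Accumulate(vectorRanked);
    return scores.OrderByDescending(kv => kv.Value).Select(kv => kv.Key).ToList();
}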
The API is straightforward: classify the query, route to the appropriate handler:
[HttpGet("api/search")]
public async Task<IActionResult> Search([FromQuery] string q, [FromQuery] string mode = "auto")
{
if (mode == "auto")
mode = ClassifyQuery(q);
// global/local return synthesised answers; default returns raw search results
return mode switch
{
"global" => Ok(await _graphRag.GlobalSearchAsync(q)), // synthesised answer
"local" => Ok(await SearchWithGraphContext(q)), // answer with citations
_ => Ok(await _semanticSearch.SearchAsync(q)) // raw ranked results
};
}
Vector search and graph search complement each other. Use vectors for "how do I" questions, graphs for "what are the themes" questions.
GraphRAG extends RAG from "find similar chunks" to "understand the knowledge structure." It's not a replacement for vector search; it's an enhancement that enables new query types.
What GraphRAG adds: entity and relationship extraction, a knowledge graph, community detection with per-community summaries, and the Global and DRIFT query modes that answer corpus-level questions.
When to use it: when your users genuinely ask sensemaking questions ("what are the main themes?") and connective questions ("how does X relate to Y?"), not just specific lookups.
Implementation path: prototype with the Python sidecar to validate the value, keep Qdrant for Local Search, add a lightweight graph layer for Global/DRIFT queries, and only go full GraphRAG once users have proven they ask those questions.
GraphRAG Official:
RAG Series:
Simpler Alternative (BERT + BM25):