Tuesday, 25 November 2025
📖 Part of the RAG Series: This is Part 4a - the core implementation.
Parts 1-3 explain why semantic search works. This article shows how to build the foundation - a zero-cost, CPU-friendly implementation using ONNX Runtime and Qdrant. Part 4b covers the search UI and hybrid search implementation, and Part 5 covers production auto-indexing.
The Challenge: Most semantic search solutions require expensive GPU infrastructure or costly managed services. What if you're an indie developer running a blog on a modest VPS?
The Solution: A fully functional semantic search system that runs entirely on CPU, using free open-source tools. This is the exact setup running on this blog - zero extra cost beyond existing hosting.
These concepts are covered in depth in the RAG series, but here's what you need to know for this implementation:
Embeddings are vectors (arrays of numbers) that capture the meaning of text. Similar meanings produce similar vectors - that's the magic.
graph TD
A["Text: 'The cat sat on the mat'"] --> B[Embedding Model]
B --> C["Vector: [0.25, -0.18, 0.91, ... 384 more numbers]"]
D["Text: 'A feline rested on the carpet'"] --> B
B --> E["Vector: [0.27, -0.16, 0.89, ... similar numbers!]"]
C -.Similar vectors = similar meaning.-> E
style A stroke:#10b981,stroke-width:2px
style D stroke:#10b981,stroke-width:2px
style B stroke:#6366f1,stroke-width:3px
style C stroke:#f59e0b,stroke-width:2px
style E stroke:#f59e0b,stroke-width:2px
Key insight: Texts with similar meanings will have similar vectors (embeddings). This is how we can find "related" content - we're literally measuring the distance between meanings!
Cosine similarity measures the angle between two vectors - if they point in similar directions, they're semantically similar:
flowchart LR
subgraph "Vector Space (simplified to 2D)"
direction TB
A["'Docker tutorial'"] -.-> B((0.85))
C["'Container deployment'"] -.-> B
D["'Cooking recipes'"] -.-> E((0.12))
A -.-> E
end
B --> F["High Similarity<br/>Related content!"]
E --> G["Low Similarity<br/>Different topics"]
style A stroke:#10b981,stroke-width:2px
style C stroke:#10b981,stroke-width:2px
style D stroke:#f59e0b,stroke-width:2px
style B stroke:#22c55e,stroke-width:3px
style E stroke:#ef4444,stroke-width:3px
style F stroke:#22c55e,stroke-width:2px
style G stroke:#ef4444,stroke-width:2px
The formula: similarity = (A · B) / (||A|| × ||B||) - but since we L2-normalize our vectors, it simplifies to just the dot product!
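To make this concrete, here's a minimal standalone sketch of cosine similarity in C# (an illustrative helper, not part of the library code below):

public static class VectorMath
{
    // Full cosine similarity: (A · B) / (||A|| × ||B||)
    public static float CosineSimilarity(float[] a, float[] b)
    {
        float dot = 0f, magA = 0f, magB = 0f;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
    }

    // For L2-normalized vectors both magnitudes are 1, so the dot product alone suffices
    public static float DotProduct(float[] a, float[] b)
    {
        float dot = 0f;
        for (var i = 0; i < a.Length; i++)
            dot += a[i] * b[i];
        return dot;
    }
}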
ONNX (Open Neural Network Exchange) is an open standard format for machine learning models that allows them to run efficiently across different platforms. Think of it like a universal translator for AI models. The ONNX Runtime is Microsoft's high-performance inference engine that executes these models.
Why ONNX for our use case:
- Runs entirely on CPU - no GPU required
- Native .NET support via the Microsoft.ML.OnnxRuntime NuGet package - no Python runtime to manage
- Free and open source, which fits the zero-cost goal
flowchart LR
subgraph "ONNX Inference Pipeline"
A[Raw Text] --> B[Tokenizer]
B --> C["Tokens: [CLS] the cat sat [SEP]"]
C --> D[Token IDs: 101 1996 4937 2068 102]
D --> E[ONNX Runtime]
E --> F[384-dim Vector]
F --> G[L2 Normalize]
G --> H[Final Embedding]
end
style A stroke:#10b981,stroke-width:2px
style B stroke:#f59e0b,stroke-width:2px
style C stroke:#f59e0b,stroke-width:2px
style D stroke:#f59e0b,stroke-width:2px
style E stroke:#6366f1,stroke-width:3px
style F stroke:#8b5cf6,stroke-width:2px
style G stroke:#8b5cf6,stroke-width:2px
style H stroke:#ef4444,stroke-width:2px
Qdrant is an open-source vector database - basically a database optimized for storing and searching these embedding vectors. For a deep dive into Qdrant's concepts, configuration, and C# integration, see Self-Hosted Vector Databases with Qdrant. While you could store vectors in PostgreSQL, Qdrant is purpose-built for this and offers:
flowchart TB
subgraph "Qdrant Vector Storage"
direction TB
A[Collection: blog_posts] --> B[Point 1]
A --> C[Point 2]
A --> D[Point N...]
B --> B1["Vector: [0.12, -0.08, ...]"]
B --> B2["Payload: {slug, title, language}"]
C --> C1["Vector: [0.25, 0.14, ...]"]
C --> C2["Payload: {slug, title, language}"]
end
subgraph "Vector Search"
E[Query Vector] --> F[HNSW Index]
F --> G[Cosine Similarity]
G --> H[Top K Results]
end
style A stroke:#ef4444,stroke-width:3px
style B stroke:#8b5cf6,stroke-width:2px
style C stroke:#8b5cf6,stroke-width:2px
style D stroke:#8b5cf6,stroke-width:2px
style B1 stroke:#f59e0b,stroke-width:2px
style B2 stroke:#10b981,stroke-width:2px
style C1 stroke:#f59e0b,stroke-width:2px
style C2 stroke:#10b981,stroke-width:2px
style E stroke:#6366f1,stroke-width:2px
style F stroke:#ec4899,stroke-width:3px
style G stroke:#ec4899,stroke-width:2px
style H stroke:#10b981,stroke-width:2px
Here's how our semantic search system fits together:
flowchart TB
subgraph "Content Ingestion"
A[Blog Post Markdown] --> B[Extract Plain Text]
B --> C[ONNX Embedding Service]
C --> D[Generate 384-dim Vector]
D --> E[Qdrant Vector Store]
end
subgraph "Search Flow"
F[User Query] --> G[ONNX Embedding Service]
G --> H[Generate Query Vector]
H --> I[Qdrant Search]
E -.Vector Similarity.-> I
I --> J[Ranked Results]
end
subgraph "Related Posts"
K[Current Blog Post] --> L[Get Post Vector from Qdrant]
L --> M[Find Similar Vectors]
E -.->M
M --> N[Top 5 Related Posts]
end
style A stroke:#10b981,stroke-width:2px
style B stroke:#10b981,stroke-width:2px
style C stroke:#6366f1,stroke-width:3px
style D stroke:#f59e0b,stroke-width:2px
style E stroke:#ef4444,stroke-width:3px
style F stroke:#10b981,stroke-width:2px
style G stroke:#6366f1,stroke-width:3px
style H stroke:#f59e0b,stroke-width:2px
style I stroke:#ef4444,stroke-width:2px
style J stroke:#8b5cf6,stroke-width:2px
style K stroke:#10b981,stroke-width:2px
style L stroke:#ef4444,stroke-width:2px
style M stroke:#ef4444,stroke-width:2px
style N stroke:#8b5cf6,stroke-width:2px
The flow in plain English:
1. Indexing: each post's markdown is stripped to plain text, turned into a 384-dimensional vector by the ONNX embedding service, and stored in Qdrant.
2. Search: the user's query goes through the same embedding model, and Qdrant returns the posts whose vectors are closest to the query vector.
3. Related posts: the current post's stored vector is used as the query, and its nearest neighbours become the "related posts" list.
We've created a clean, modular structure:
Mostlylucid.SemanticSearch/
├── Config/
│ └── SemanticSearchConfig.cs # Configuration settings
├── Models/
│ ├── BlogPostDocument.cs # Document model for indexing
│ └── SearchResult.cs # Search result model
├── Services/
│ ├── IEmbeddingService.cs # Embedding interface
│ ├── OnnxEmbeddingService.cs # ONNX-based embeddings
│ ├── IVectorStoreService.cs # Vector store interface
│ ├── QdrantVectorStoreService.cs # Qdrant implementation
│ ├── ISemanticSearchService.cs # High-level search interface
│ └── SemanticSearchService.cs # Orchestration service
├── Extensions/
│ └── ServiceCollectionExtensions.cs # DI registration
├── download-models.sh # Model download script
└── README.md
First, create the new class library:
dotnet new classlib -n Mostlylucid.SemanticSearch -f net9.0
dotnet sln add Mostlylucid.SemanticSearch
Add the necessary NuGet packages:
cd Mostlylucid.SemanticSearch
dotnet add package Microsoft.Extensions.Logging.Abstractions
dotnet add package Microsoft.ML.OnnxRuntime --version 1.21.1
dotnet add package Qdrant.Client --version 1.14.0
dotnet add reference ../Mostlylucid.Shared/Mostlylucid.Shared.csproj
Let's set up our configuration class. We're using the IConfigSection pattern that's used throughout Mostlylucid:
using Mostlylucid.Shared.Config;
namespace Mostlylucid.SemanticSearch.Config;
/// <summary>
/// Configuration for semantic search functionality
/// </summary>
public class SemanticSearchConfig : IConfigSection
{
public static string Section => "SemanticSearch";
/// <summary>
/// Enable or disable semantic search
/// </summary>
public bool Enabled { get; set; } = true;
/// <summary>
/// Qdrant server URL (e.g., http://localhost:6333)
/// </summary>
public string QdrantUrl { get; set; } = "http://localhost:6333";
/// <summary>
/// Optional read-only API key for Qdrant (used for search operations)
/// </summary>
public string? ReadApiKey { get; set; }
/// <summary>
/// Optional read-write API key for Qdrant (used for indexing operations)
/// </summary>
public string? WriteApiKey { get; set; }
/// <summary>
/// Collection name in Qdrant for blog posts
/// </summary>
public string CollectionName { get; set; } = "blog_posts";
/// <summary>
/// Path to the ONNX embedding model file
/// </summary>
public string EmbeddingModelPath { get; set; } = "models/all-MiniLM-L6-v2.onnx";
/// <summary>
/// Path to the tokenizer vocabulary file
/// </summary>
public string VocabPath { get; set; } = "models/vocab.txt";
/// <summary>
/// Embedding vector size (384 for all-MiniLM-L6-v2)
/// </summary>
public int VectorSize { get; set; } = 384;
/// <summary>
/// Number of related posts to return
/// </summary>
public int RelatedPostsCount { get; set; } = 5;
/// <summary>
/// Minimum similarity score (0-1) for related posts
/// </summary>
public float MinimumSimilarityScore { get; set; } = 0.5f;
/// <summary>
/// Number of search results to return
/// </summary>
public int SearchResultsCount { get; set; } = 10;
}
Why separate API keys? Security! Your read key can be used in public-facing search endpoints, while your write key stays server-side for admin operations only.
Add this to your appsettings.json:
{
"SemanticSearch": {
"Enabled": false,
"QdrantUrl": "http://localhost:6333",
"ReadApiKey": "",
"WriteApiKey": "",
"CollectionName": "blog_posts",
"EmbeddingModelPath": "models/all-MiniLM-L6-v2.onnx",
"VocabPath": "models/vocab.txt",
"VectorSize": 384,
"RelatedPostsCount": 5,
"MinimumSimilarityScore": 0.5,
"SearchResultsCount": 10
}
}
This is where the magic happens. We're using the all-MiniLM-L6-v2 model, which is specifically designed for semantic similarity tasks and runs efficiently on CPU.
Why this model?
- Small (~90MB) and fast enough to run comfortably on CPU
- Produces compact 384-dimensional embeddings
- Trained specifically for sentence-level semantic similarity tasks
Here's the complete implementation:
using Microsoft.Extensions.Logging;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using Mostlylucid.SemanticSearch.Config;
using System.Text.RegularExpressions;
namespace Mostlylucid.SemanticSearch.Services;
public class OnnxEmbeddingService : IEmbeddingService, IDisposable
{
private readonly ILogger<OnnxEmbeddingService> _logger;
private readonly SemanticSearchConfig _config;
private readonly InferenceSession? _session;
private readonly Dictionary<string, int> _vocabulary;
private readonly SemaphoreSlim _semaphore = new(1, 1);
private bool _disposed;
private const int MaxSequenceLength = 256;
private const string PadToken = "[PAD]";
private const string UnkToken = "[UNK]";
private const string ClsToken = "[CLS]";
private const string SepToken = "[SEP]";
public OnnxEmbeddingService(
ILogger<OnnxEmbeddingService> logger,
SemanticSearchConfig config)
{
_logger = logger;
_config = config;
_vocabulary = new Dictionary<string, int>();
if (!_config.Enabled)
{
_logger.LogInformation("Semantic search is disabled");
return;
}
try
{
// Check if model file exists
if (!File.Exists(_config.EmbeddingModelPath))
{
_logger.LogWarning("Embedding model not found at {Path}. Semantic search will be disabled.",
_config.EmbeddingModelPath);
return;
}
// Load vocabulary if it exists
if (File.Exists(_config.VocabPath))
{
LoadVocabulary(_config.VocabPath);
}
// Create ONNX session with CPU execution provider
var sessionOptions = new SessionOptions
{
ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL
};
_session = new InferenceSession(_config.EmbeddingModelPath, sessionOptions);
_logger.LogInformation("ONNX embedding model loaded successfully from {Path}",
_config.EmbeddingModelPath);
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to initialize ONNX embedding service");
}
}
private void LoadVocabulary(string vocabPath)
{
var lines = File.ReadAllLines(vocabPath);
for (int i = 0; i < lines.Length; i++)
{
var token = lines[i].Trim();
if (!string.IsNullOrEmpty(token))
{
_vocabulary[token] = i;
}
}
_logger.LogInformation("Loaded vocabulary with {Count} tokens", _vocabulary.Count);
}
public async Task<float[]> GenerateEmbeddingAsync(string text, CancellationToken cancellationToken = default)
{
if (_session == null || !_config.Enabled)
{
return new float[_config.VectorSize];
}
if (string.IsNullOrWhiteSpace(text))
{
return new float[_config.VectorSize];
}
// Use semaphore to prevent concurrent ONNX inference (not thread-safe)
await _semaphore.WaitAsync(cancellationToken);
try
{
return await Task.Run(() => GenerateEmbedding(text), cancellationToken);
}
finally
{
_semaphore.Release();
}
}
private float[] GenerateEmbedding(string text)
{
try
{
// Tokenize the input text
var tokens = Tokenize(text);
// Create input tensors for ONNX model
var inputIds = CreateInputTensor(tokens, "input_ids");
var attentionMask = CreateAttentionMaskTensor(tokens.Length);
var tokenTypeIds = CreateTokenTypeIdsTensor(tokens.Length);
// Run inference
var inputs = new List<NamedOnnxValue>
{
NamedOnnxValue.CreateFromTensor("input_ids", inputIds),
NamedOnnxValue.CreateFromTensor("attention_mask", attentionMask),
NamedOnnxValue.CreateFromTensor("token_type_ids", tokenTypeIds)
};
using var results = _session!.Run(inputs);
// Extract the output tensor (sentence embedding)
var output = results.First().AsTensor<float>();
var embedding = output.ToArray();
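            // Note: depending on the ONNX export, this first output may already be a
            // pooled sentence embedding or may be per-token hidden states; if it's
            // per-token, mean-pool across tokens before normalizing.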
// Normalize the vector (L2 normalization)
return NormalizeVector(embedding);
}
catch (Exception ex)
{
_logger.LogError(ex, "Error generating embedding for text: {Text}",
text[..Math.Min(100, text.Length)]);
return new float[_config.VectorSize];
}
}
private List<int> Tokenize(string text)
{
// Simple whitespace + punctuation tokenization
var tokens = new List<int>();
// Add [CLS] token at the start
if (_vocabulary.TryGetValue(ClsToken, out var clsId))
tokens.Add(clsId);
// Tokenize the text
var words = Regex.Split(text.ToLowerInvariant(), @"(\W+)")
.Where(w => !string.IsNullOrWhiteSpace(w))
.Take(MaxSequenceLength - 2); // Leave room for [CLS] and [SEP]
foreach (var word in words)
{
if (_vocabulary.Count > 0)
{
if (_vocabulary.TryGetValue(word, out var tokenId))
tokens.Add(tokenId);
else if (_vocabulary.TryGetValue(UnkToken, out var unkId))
tokens.Add(unkId);
}
else
{
// Fallback: use hash code as token ID
tokens.Add(Math.Abs(word.GetHashCode()) % 30000);
}
}
// Add [SEP] token at the end
if (_vocabulary.TryGetValue(SepToken, out var sepId))
tokens.Add(sepId);
return tokens;
}
private Tensor<long> CreateInputTensor(List<int> tokens, string name)
{
var length = Math.Min(tokens.Count, MaxSequenceLength);
var tensorData = new long[1, MaxSequenceLength];
for (int i = 0; i < length; i++)
{
tensorData[0, i] = tokens[i];
}
// Pad the rest
var padId = _vocabulary.TryGetValue(PadToken, out var id) ? id : 0;
for (int i = length; i < MaxSequenceLength; i++)
{
tensorData[0, i] = padId;
}
return new DenseTensor<long>(tensorData, new[] { 1, MaxSequenceLength });
}
private Tensor<long> CreateAttentionMaskTensor(int actualLength)
{
var length = Math.Min(actualLength, MaxSequenceLength);
var tensorData = new long[1, MaxSequenceLength];
for (int i = 0; i < length; i++)
{
tensorData[0, i] = 1; // Attend to actual tokens
}
return new DenseTensor<long>(tensorData, new[] { 1, MaxSequenceLength });
}
private Tensor<long> CreateTokenTypeIdsTensor(int actualLength)
{
var tensorData = new long[1, MaxSequenceLength];
// All zeros for single sentence
return new DenseTensor<long>(tensorData, new[] { 1, MaxSequenceLength });
}
private float[] NormalizeVector(float[] vector)
{
// L2 normalization
var sumOfSquares = vector.Sum(v => v * v);
var magnitude = MathF.Sqrt(sumOfSquares);
if (magnitude > 0)
{
for (int i = 0; i < vector.Length; i++)
{
vector[i] /= magnitude;
}
}
return vector;
}
public void Dispose()
{
if (_disposed) return;
_session?.Dispose();
_semaphore?.Dispose();
_disposed = true;
GC.SuppressFinalize(this);
}
}
Key points for junior devs:
- The service degrades gracefully: if the model file is missing or semantic search is disabled, it returns zero vectors rather than throwing.
- A SemaphoreSlim serializes inference so only one embedding is generated at a time.
- Every embedding is L2-normalized before it leaves the service, so cosine similarity later reduces to a dot product.
- The tokenizer is a deliberately simple whitespace/punctuation splitter with a vocabulary lookup - not a full WordPiece implementation, but good enough here.
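A quick usage sketch (hypothetical calling code - the service would normally be resolved from DI):

// Hypothetical example: compare two texts using the embedding service
var v1 = await embeddingService.GenerateEmbeddingAsync("Docker tutorial");
var v2 = await embeddingService.GenerateEmbeddingAsync("Container deployment");

// Vectors are already L2-normalized, so the dot product is the cosine similarity
var similarity = 0f;
for (var i = 0; i < v1.Length; i++)
    similarity += v1[i] * v2[i];

Console.WriteLine($"Similarity: {similarity:F2}"); // Closer to 1.0 means more similar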
Now let's implement the vector storage and search:
using Microsoft.Extensions.Logging;
using Mostlylucid.SemanticSearch.Config;
using Mostlylucid.SemanticSearch.Models;
using Qdrant.Client;
using Qdrant.Client.Grpc;
namespace Mostlylucid.SemanticSearch.Services;
public class QdrantVectorStoreService : IVectorStoreService
{
private readonly ILogger<QdrantVectorStoreService> _logger;
private readonly SemanticSearchConfig _config;
private readonly QdrantClient? _client;
private bool _collectionInitialized;
public QdrantVectorStoreService(
ILogger<QdrantVectorStoreService> logger,
SemanticSearchConfig config)
{
_logger = logger;
_config = config;
if (!_config.Enabled)
{
_logger.LogInformation("Semantic search is disabled");
return;
}
try
{
var uri = new Uri(_config.QdrantUrl);
var host = uri.Host;
var port = uri.Port > 0 ? uri.Port : 6334; // Default gRPC port
_client = new QdrantClient(host, port, https: uri.Scheme == "https");
_logger.LogInformation("Connected to Qdrant at {Host}:{Port}", host, port);
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to connect to Qdrant at {Url}", _config.QdrantUrl);
}
}
public async Task InitializeCollectionAsync(CancellationToken cancellationToken = default)
{
if (_client == null || !_config.Enabled || _collectionInitialized)
return;
try
{
var collections = await _client.ListCollectionsAsync(cancellationToken);
var collectionExists = collections.Any(c => c.Name == _config.CollectionName);
if (!collectionExists)
{
_logger.LogInformation("Creating collection {CollectionName}", _config.CollectionName);
await _client.CreateCollectionAsync(
collectionName: _config.CollectionName,
vectorsConfig: new VectorParams
{
Size = (ulong)_config.VectorSize,
Distance = Distance.Cosine // Cosine similarity for semantic search
},
cancellationToken: cancellationToken
);
_logger.LogInformation("Collection {CollectionName} created successfully", _config.CollectionName);
}
_collectionInitialized = true;
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to initialize collection {CollectionName}", _config.CollectionName);
throw;
}
}
public async Task<List<SearchResult>> FindRelatedPostsAsync(
string slug,
string language,
int limit = 5,
CancellationToken cancellationToken = default)
{
if (_client == null || !_config.Enabled)
return new List<SearchResult>();
try
{
// Find the document by slug and language
var scrollResults = await _client.ScrollAsync(
collectionName: _config.CollectionName,
filter: new Filter
{
Must =
{
new Condition
{
Field = new FieldCondition
{
Key = "slug",
Match = new Match { Keyword = slug }
}
},
new Condition
{
Field = new FieldCondition
{
Key = "language",
Match = new Match { Keyword = language }
}
}
}
},
limit: 1,
cancellationToken: cancellationToken
);
var point = scrollResults.FirstOrDefault();
if (point == null)
{
_logger.LogWarning("Post {Slug} ({Language}) not found in vector store", slug, language);
return new List<SearchResult>();
}
// Use the document's vector to find similar posts
var searchResults = await _client.SearchAsync(
collectionName: _config.CollectionName,
vector: point.Vectors.Vector.Data.ToArray(),
limit: (ulong)(limit + 1), // +1 because the first result will be the post itself
scoreThreshold: _config.MinimumSimilarityScore,
cancellationToken: cancellationToken
);
// Filter out the original post and return top N similar posts
return searchResults
.Where(r => r.Payload["slug"].StringValue != slug || r.Payload["language"].StringValue != language)
.Take(limit)
.Select(result => new SearchResult
{
Slug = result.Payload["slug"].StringValue,
Title = result.Payload["title"].StringValue,
Language = result.Payload["language"].StringValue,
Categories = result.Payload.TryGetValue("categories", out var cats)
? cats.ListValue.Values.Select(v => v.StringValue).ToList()
: new List<string>(),
Score = result.Score,
PublishedDate = DateTime.Parse(result.Payload["published_date"].StringValue)
})
.ToList();
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to find related posts for {Slug} ({Language})", slug, language);
return new List<SearchResult>();
}
}
// ... Additional methods for IndexDocument, Search, Delete, etc.
}
What's happening here:
- ScrollAsync with a filter looks up the stored point for the current post by its slug and language payload fields.
- That point's own vector then becomes the search query, so we get back the posts that are semantically closest to it.
- We request limit + 1 results because the nearest neighbour of any vector is itself, then filter the original post back out.
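The indexing, search, and delete methods elided above follow the same pattern. As an illustration, here's a minimal sketch of what an IndexDocumentAsync implementation could look like (an assumption based on the call sites later in this article, not the exact repo code; it assumes BlogPostDocument exposes Slug, Title, Language, and PublishedDate):

public async Task IndexDocumentAsync(
    BlogPostDocument document,
    float[] embedding,
    CancellationToken cancellationToken = default)
{
    if (_client == null || !_config.Enabled)
        return;

    await InitializeCollectionAsync(cancellationToken);

    var point = new PointStruct
    {
        // Sketch: in practice derive a deterministic ID from slug + language
        // so re-indexing a post overwrites its existing point
        Id = Guid.NewGuid(),
        Vectors = embedding,
        Payload =
        {
            ["slug"] = document.Slug,
            ["title"] = document.Title,
            ["language"] = document.Language,
            ["published_date"] = document.PublishedDate.ToString("O")
        }
    };

    await _client.UpsertAsync(
        collectionName: _config.CollectionName,
        points: new[] { point },
        cancellationToken: cancellationToken);
}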
This high-level service ties everything together:
using Microsoft.Extensions.Logging;
using Mostlylucid.SemanticSearch.Config;
using Mostlylucid.SemanticSearch.Models;
using System.Security.Cryptography;
using System.Text;
namespace Mostlylucid.SemanticSearch.Services;
public class SemanticSearchService : ISemanticSearchService
{
private readonly ILogger<SemanticSearchService> _logger;
private readonly SemanticSearchConfig _config;
private readonly IEmbeddingService _embeddingService;
private readonly IVectorStoreService _vectorStoreService;
public SemanticSearchService(
ILogger<SemanticSearchService> logger,
SemanticSearchConfig config,
IEmbeddingService embeddingService,
IVectorStoreService vectorStoreService)
{
_logger = logger;
_config = config;
_embeddingService = embeddingService;
_vectorStoreService = vectorStoreService;
}
public async Task IndexPostAsync(BlogPostDocument document, CancellationToken cancellationToken = default)
{
if (!_config.Enabled)
return;
try
{
// Prepare text for embedding: combine title and content
// We give more weight to the title by including it twice
var textToEmbed = $"{document.Title}. {document.Title}. {document.Content}";
// Truncate to reasonable length (embedding models have token limits)
const int maxLength = 2000;
if (textToEmbed.Length > maxLength)
{
textToEmbed = textToEmbed[..maxLength];
}
// Generate embedding
var embedding = await _embeddingService.GenerateEmbeddingAsync(textToEmbed, cancellationToken);
// Compute content hash if not provided
if (string.IsNullOrEmpty(document.ContentHash))
{
document.ContentHash = ComputeContentHash(document.Content);
}
// Store in vector database
await _vectorStoreService.IndexDocumentAsync(document, embedding, cancellationToken);
_logger.LogInformation("Indexed post {Slug} ({Language})", document.Slug, document.Language);
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to index post {Slug} ({Language})", document.Slug, document.Language);
}
}
public async Task<List<SearchResult>> SearchAsync(
string query,
int limit = 10,
CancellationToken cancellationToken = default)
{
if (!_config.Enabled || string.IsNullOrWhiteSpace(query))
return new List<SearchResult>();
try
{
// Generate embedding for the search query
var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query, cancellationToken);
// Search in vector store
var results = await _vectorStoreService.SearchAsync(
queryEmbedding,
Math.Min(limit, _config.SearchResultsCount),
_config.MinimumSimilarityScore,
cancellationToken);
_logger.LogDebug("Search for '{Query}' returned {Count} results", query, results.Count);
return results;
}
catch (Exception ex)
{
_logger.LogError(ex, "Search failed for query '{Query}'", query);
return new List<SearchResult>();
}
}
public async Task<List<SearchResult>> GetRelatedPostsAsync(
string slug,
string language,
int limit = 5,
CancellationToken cancellationToken = default)
{
if (!_config.Enabled)
return new List<SearchResult>();
try
{
var results = await _vectorStoreService.FindRelatedPostsAsync(
slug,
language,
Math.Min(limit, _config.RelatedPostsCount),
cancellationToken);
_logger.LogDebug("Found {Count} related posts for {Slug} ({Language})",
results.Count, slug, language);
return results;
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to get related posts for {Slug} ({Language})", slug, language);
return new List<SearchResult>();
}
}
private string ComputeContentHash(string content)
{
using var sha256 = SHA256.Create();
var bytes = Encoding.UTF8.GetBytes(content);
var hashBytes = sha256.ComputeHash(bytes);
return Convert.ToBase64String(hashBytes);
}
}
Register everything in the DI container:
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Mostlylucid.SemanticSearch.Config;
using Mostlylucid.SemanticSearch.Services;
using Mostlylucid.Shared.Config;
namespace Mostlylucid.SemanticSearch.Extensions;
public static class ServiceCollectionExtensions
{
public static void AddSemanticSearch(
this IServiceCollection services,
IConfiguration configuration)
{
// Bind configuration using POCO pattern
services.ConfigurePOCO<SemanticSearchConfig>(
configuration.GetSection(SemanticSearchConfig.Section));
// Register services as singletons for efficiency
services.AddSingleton<IEmbeddingService, OnnxEmbeddingService>();
services.AddSingleton<IVectorStoreService, QdrantVectorStoreService>();
services.AddSingleton<ISemanticSearchService, SemanticSearchService>();
}
}
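ConfigurePOCO lives in Mostlylucid.Shared. If you don't have an equivalent, here's a minimal sketch of what such a helper typically does (an assumption - the real implementation may differ):

public static class ConfigurationExtensions
{
    // Bind a configuration section onto a plain POCO and register it as a singleton
    public static TConfig ConfigurePOCO<TConfig>(
        this IServiceCollection services,
        IConfigurationSection section)
        where TConfig : class, new()
    {
        var config = new TConfig();
        section.Bind(config); // from Microsoft.Extensions.Configuration.Binder
        services.AddSingleton(config);
        return config;
    }
}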
In your Program.cs:
using Mostlylucid.SemanticSearch.Extensions;
using Mostlylucid.SemanticSearch.Services;
// Add services
services.AddSemanticSearch(config);
// Initialize after building the app
using (var scope = app.Services.CreateScope())
{
var semanticSearch = scope.ServiceProvider.GetRequiredService<ISemanticSearchService>();
await semanticSearch.InitializeAsync();
}
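InitializeAsync isn't shown in the orchestration service above; a plausible implementation just delegates to the two lower-level services (a sketch, assuming IEmbeddingService exposes the EnsureInitializedAsync method shown in the model-download section below):

public async Task InitializeAsync(CancellationToken cancellationToken = default)
{
    if (!_config.Enabled)
        return;

    // Download/load the ONNX model and tokenizer if needed
    await _embeddingService.EnsureInitializedAsync(cancellationToken);

    // Create the Qdrant collection if it doesn't exist yet
    await _vectorStoreService.InitializeCollectionAsync(cancellationToken);
}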
Create a separate docker-compose file for semantic search services:
version: '3.8'
services:
qdrant:
image: qdrant/qdrant:latest
container_name: mostlylucid-qdrant
restart: unless-stopped
ports:
- "6333:6333" # HTTP API
- "6334:6334" # gRPC API
volumes:
- qdrant_storage:/qdrant/storage
environment:
- QDRANT__SERVICE__HTTP_PORT=6333
- QDRANT__SERVICE__GRPC_PORT=6334
networks:
- mostlylucid_network
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:6333/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
volumes:
qdrant_storage:
driver: local
networks:
mostlylucid_network:
name: mostlylucidweb_app_network
external: true
Start it with:
docker-compose -f semantic-search-docker-compose.yml up -d
We're using the all-MiniLM-L6-v2 model from Hugging Face's Sentence Transformers library. This model is specifically trained on semantic similarity tasks and produces 384-dimensional embeddings.
The service automatically downloads the model from Hugging Face on first run if it doesn't exist:
// In OnnxEmbeddingService.cs
private const string ModelUrl = "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx";
private const string VocabUrl = "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/vocab.txt";
public async Task EnsureInitializedAsync(CancellationToken cancellationToken = default)
{
if (_initialized || !_config.Enabled) return;
// Download model if not exists
if (!File.Exists(_config.EmbeddingModelPath))
{
_logger.LogInformation("Downloading ONNX embedding model to {Path}...", _config.EmbeddingModelPath);
await DownloadFileAsync(ModelUrl, _config.EmbeddingModelPath, cancellationToken);
}
// Download vocab if not exists
if (!File.Exists(_config.VocabPath))
{
_logger.LogInformation("Downloading vocabulary file to {Path}...", _config.VocabPath);
await DownloadFileAsync(VocabUrl, _config.VocabPath, cancellationToken);
}
// Initialize ONNX session...
}
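DownloadFileAsync is elided above; a straightforward version using HttpClient might look like this (a sketch, not the repo's exact code):

private static async Task DownloadFileAsync(string url, string destinationPath, CancellationToken cancellationToken)
{
    // Make sure the target directory (e.g. models/) exists
    var directory = Path.GetDirectoryName(destinationPath);
    if (!string.IsNullOrEmpty(directory))
        Directory.CreateDirectory(directory);

    using var httpClient = new HttpClient();
    using var response = await httpClient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead, cancellationToken);
    response.EnsureSuccessStatusCode();

    await using var source = await response.Content.ReadAsStreamAsync(cancellationToken);
    await using var destination = File.Create(destinationPath);
    await source.CopyToAsync(destination, cancellationToken);
}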
This is particularly useful when deploying with Docker - you can map a volume for the models directory:
volumes:
- ./mlmodels:/app/mlmodels # Model persists across container restarts
Alternatively, you can download manually:
chmod +x Mostlylucid.SemanticSearch/download-models.sh
./Mostlylucid.SemanticSearch/download-models.sh
Or directly from Hugging Face:
mkdir -p mlmodels
curl -L https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx -o mlmodels/all-MiniLM-L6-v2.onnx
curl -L https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/vocab.txt -o mlmodels/vocab.txt
This downloads:
- all-MiniLM-L6-v2.onnx (~90MB) - The ONNX-exported embedding model
- vocab.txt (~230KB) - The WordPiece tokenizer vocabulary

We use ASP.NET Core output caching:
[OutputCache(Duration = 7200, VaryByRouteValueNames = new[] {"slug", "language"})]
This caches related posts for 2 hours, significantly reducing load.
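For context, here's a hypothetical sketch of how that attribute might sit on a related-posts endpoint (the controller action and route are illustrative, not the repo's actual code):

[HttpGet("related/{slug}/{language}")]
[OutputCache(Duration = 7200, VaryByRouteValueNames = new[] { "slug", "language" })]
public async Task<IActionResult> Related(string slug, string language)
{
    var related = await _semanticSearchService.GetRelatedPostsAsync(slug, language);
    return Ok(related);
}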
At this point you have a complete, working semantic search foundation:
- An ONNX embedding service generating 384-dimensional vectors on CPU
- A Qdrant vector store with cosine-similarity search and metadata payloads
- An orchestration service handling indexing, search, and related posts
- DI registration, configuration, and a docker-compose file to run it all
This is the exact setup running on this blog - zero GPU, zero additional cost.
Continue to Part 4b: Semantic Search in Action for the search UI and hybrid search implementation.
Then Part 5: Hybrid Search & Auto-Indexing covers production integration patterns.
All code available at: github.com/scottgal/mostlylucidweb
Mostlylucid.SemanticSearch/ - Core semantic search library