Building a "Lawyer GPT" for Your Blog - Part 3: Understanding Embeddings & Vector Databases (English)

Introduction

Welcome to Part 3! We've got our GPU stack working (Part 2), and we understand the architecture (Part 1). Now it's time to dive into the magic that makes semantic search possible: embeddings and vector databases.

NOTE: This is part of my experiments with AI (assisted drafting) + my own editing. Same voice, same pragmatism; just faster fingers.

This is where things get interesting. We're going to understand how to represent text as numbers in a way that captures meaning, not just keywords. It's the difference between finding posts that contain the word "docker" vs finding posts that are semantically about containerization concepts.

What Are Embeddings?

An embedding is a numerical representation of text (or images, audio, etc.) as a vector - essentially a list of numbers.

The Intuition

Imagine plotting words in a 2D space based on their meaning:

graph TD
    subgraph "2D Embedding Space (simplified)"
        A[cat: 0.8, 0.2]
        B[kitten: 0.7, 0.3]
        C[dog: 0.6, 0.1]
        D[puppy: 0.5, 0.2]
        E[car: -0.5, 0.8]
        F[vehicle: -0.6, 0.7]
        G[database: 0.1, -0.7]
        H[SQL: 0.2, -0.8]
    end

    class A,B cats
    class C,D dogs
    class E,F vehicles
    class G,H tech

    classDef cats stroke:#333
    classDef dogs stroke:#333
    classDef vehicles stroke:#333
    classDef tech stroke:#333

Similar concepts cluster together:

Pets (red/green) are near each other
Vehicles (blue) are grouped
Tech terms (yellow) cluster
Unrelated concepts are far apart

Real embeddings use 384 to 1536 dimensions (not just 2!), capturing much more nuance.

How It Works

An embedding model is a neural network trained to map text to vectors such that:

Similar meanings → close vectors
Different meanings → distant vectors

graph LR
    A[Text: 'Docker container'] --> B[Embedding Model]
    B --> C[Vector: 384 floats]

    D[Text: 'containerization'] --> B
    B --> E[Vector: 384 floats]

    C -.Similar.-> E

    F[Text: 'chocolate cake'] --> B
    B --> G[Vector: 384 floats]

    C -.Very Different.-> G

    class B model
    class C,E similar
    class G different

    classDef model stroke:#333,stroke-width:4px
    classDef similar stroke:#333
    classDef different stroke:#333

Measuring Similarity

We use cosine similarity to measure how close two vectors are:

public static float CosineSimilarity(float[] vectorA, float[] vectorB)
{
    if (vectorA.Length != vectorB.Length)
        throw new ArgumentException("Vectors must have same length");

    // Dot product: sum of element-wise multiplication
    float dotProduct = 0;
    for (int i = 0; i < vectorA.Length; i++)
    {
        dotProduct += vectorA[i] * vectorB[i];
    }

    // Magnitude of each vector: sqrt(sum of squares)
    float magnitudeA = 0;
    float magnitudeB = 0;
    for (int i = 0; i < vectorA.Length; i++)
    {
        magnitudeA += vectorA[i] * vectorA[i];
        magnitudeB += vectorB[i] * vectorB[i];
    }
    magnitudeA = MathF.Sqrt(magnitudeA);
    magnitudeB = MathF.Sqrt(magnitudeB);

    // Cosine similarity: dot product / (magnitude_a * magnitude_b)
    return dotProduct / (magnitudeA * magnitudeB);
}

Cosine similarity ranges from -1 to 1:

1.0 = identical meaning
0.0 = unrelated
-1.0 = opposite meaning (rare in practice)

Example:

var dockerEmbed = new float[] { 0.5f, 0.3f, -0.2f, 0.8f };  // "Docker container"
var containerEmbed = new float[] { 0.45f, 0.35f, -0.18f, 0.75f };  // "containerization"
var cakeEmbed = new float[] { -0.7f, 0.1f, 0.9f, -0.3f };  // "chocolate cake"

Console.WriteLine(CosineSimilarity(dockerEmbed, containerEmbed));  // ~0.95 (very similar!)
Console.WriteLine(CosineSimilarity(dockerEmbed, cakeEmbed));  // ~0.15 (unrelated)

Why Embeddings Are Perfect for Our Use Case

Remember, we're building a writing assistant. When I start writing:

"In this post, I'll show how to use Entity Framework with..."

The system should find past posts about:

Entity Framework
Database access patterns
ORM configurations
Related ASP.NET Core topics

Not just keyword matches! It should understand that "EF Core migrations" is related even though it doesn't say "Entity Framework".

Embeddings vs Traditional Search

graph TB
    subgraph "Traditional Keyword Search"
        A1[Query: 'Docker setup'] --> B1[Find: 'Docker' OR 'setup']
        B1 --> C1[❌ Misses: 'containerization guide']
        B1 --> D1[❌ Misses: 'running containers']
        B1 --> E1[✅ Finds: 'Docker setup tutorial']
    end

    subgraph "Embedding-Based Semantic Search"
        A2[Query: 'Docker setup'] --> B2[Generate embedding]
        B2 --> C2[Find similar embeddings]
        C2 --> D2[✅ Finds: 'containerization guide']
        C2 --> E2[✅ Finds: 'running containers']
        C2 --> F2[✅ Finds: 'Docker setup tutorial']
    end

    class B2,C2 semantic

    classDef semantic stroke:#333,stroke-width:2px

Choosing an Embedding Model

There are many embedding models. We need one that:

Runs locally (privacy + speed)
Works with ONNX Runtime (so we can use our GPU)
Good quality (accurate semantic understanding)
Right size (384-768 dimensions is good balance)

Popular Options

Model	Dimensions	Size	Quality	Speed
all-MiniLM-L6-v2	384	80MB	Good	Very Fast ⚡⚡⚡
all-mpnet-base-v2	768	420MB	Better	Fast ⚡⚡
bge-small-en-v1.5	384	133MB	Better	Very Fast ⚡⚡⚡
bge-base-en-v1.5	768	436MB	Best	Fast ⚡⚡
OpenAI text-embedding-ada-002	1536	N/A (API)	Excellent	Slow (network) ⚡

My recommendation: bge-base-en-v1.5

State-of-the-art open source model
768 dimensions (good balance)
Works great with ONNX Runtime
Free and runs locally

Converting to ONNX Format

Most models are in PyTorch format. We need ONNX for C#.

Install Optimum (Python library for conversion):

pip install optimum[exporters]

Convert BGE model:

optimum-cli export onnx --model BAAI/bge-base-en-v1.5 --task feature-extraction bge-base-en-onnx/

This creates:

bge-base-en-onnx/
    model.onnx           # The neural network
    tokenizer.json       # Text → tokens converter
    tokenizer_config.json
    special_tokens_map.json
    config.json

Download if you don't want to convert: Many models are pre-converted and available on Hugging Face with "onnx" in the name.

Using Embeddings in C#

Let's build a practical embedding generator using ONNX Runtime.

Project Setup

mkdir EmbeddingTest
cd EmbeddingTest
dotnet new console
dotnet add package Microsoft.ML.OnnxRuntime.Gpu --version 1.16.3
dotnet add package Microsoft.ML.Tokenizers --version 0.1.0-preview.23511.1

Why these packages?

OnnxRuntime.Gpu - Run the model on GPU
Microsoft.ML.Tokenizers - Convert text to token IDs (model input format)

Tokenization First

Before embedding, we must tokenize (convert text to numbers):

using Microsoft.ML.Tokenizers;
using System;
using System.Linq;

public class SimpleTokenizer
{
    private readonly Tokenizer _tokenizer;

    public SimpleTokenizer(string tokenizerPath)
    {
        // Load the tokenizer.json file
        _tokenizer = Tokenizer.CreateTokenizer(tokenizerPath);
    }

    public (long[] InputIds, long[] AttentionMask) Tokenize(string text, int maxLength = 512)
    {
        // Tokenize the text
        var encoding = _tokenizer.Encode(text);

        // Get token IDs
        var ids = encoding.Ids.Select(i => (long)i).ToArray();

        // Pad or truncate to maxLength
        var inputIds = new long[maxLength];
        var attentionMask = new long[maxLength];

        int length = Math.Min(ids.Length, maxLength);

        // Copy actual tokens
        Array.Copy(ids, inputIds, length);

        // Set attention mask (1 = real token, 0 = padding)
        for (int i = 0; i < length; i++)
        {
            attentionMask[i] = 1;
        }

        return (inputIds, attentionMask);
    }
}

What's happening here?

Encoding - "Hello world" → [101, 8667, 2088, 102]
- Each number is a token ID from the model's vocabulary
- Special tokens: 101 = [CLS], 102 = [SEP]
Padding - Models expect fixed-length input
- If text is short: pad with zeros
- If text is long: truncate
Attention mask - Tells the model which tokens are real vs padding
- 1 = process this token
- 0 = ignore (it's padding)

graph LR
    A["Text: 'Docker setup'"] --> B[Tokenizer]
    B --> C[Token IDs:<br/>101, 12849, 12229, 102]
    C --> D[Pad to 512]
    D --> E[Input IDs:<br/>101, 12849, 12229, 102, 0, 0,...]
    D --> F[Attention Mask:<br/>1, 1, 1, 1, 0, 0,...]

    E --> G[Feed to Model]
    F --> G

    class B,G process

    classDef process stroke:#333,stroke-width:2px

Embedding Generator Class

Now the full embedding pipeline:

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using System;
using System.Collections.Generic;
using System.Linq;

public class EmbeddingGenerator : IDisposable
{
    private readonly InferenceSession _session;
    private readonly SimpleTokenizer _tokenizer;
    private readonly int _embeddingDimension;

    public EmbeddingGenerator(string modelPath, string tokenizerPath, bool useGpu = true)
    {
        // Setup session options
        var options = new SessionOptions();
        if (useGpu)
        {
            options.AppendExecutionProvider_CUDA(0);
        }

        // Load model
        _session = new InferenceSession(modelPath, options);

        // Load tokenizer
        _tokenizer = new SimpleTokenizer(tokenizerPath);

        // Get embedding dimension from model output shape
        var outputMetadata = _session.OutputMetadata["last_hidden_state"];
        _embeddingDimension = outputMetadata.Dimensions[2]; // Usually 768 for base models
    }

    public float[] GenerateEmbedding(string text)
    {
        // Step 1: Tokenize
        var (inputIds, attentionMask) = _tokenizer.Tokenize(text);

        // Step 2: Create input tensors
        var inputIdsTensor = new DenseTensor<long>(inputIds, new[] { 1, inputIds.Length });
        var attentionMaskTensor = new DenseTensor<long>(attentionMask, new[] { 1, attentionMask.Length });

        var inputs = new List<NamedOnnxValue>
        {
            NamedOnnxValue.CreateFromTensor("input_ids", inputIdsTensor),
            NamedOnnxValue.CreateFromTensor("attention_mask", attentionMaskTensor)
        };

        // Step 3: Run inference
        using var results = _session.Run(inputs);

        // Step 4: Extract embeddings from output
        var outputTensor = results.First().AsTensor<float>();

        // Output shape is [batch_size, sequence_length, embedding_dim]
        // We want [batch_size, embedding_dim] by mean pooling

        return MeanPooling(outputTensor, attentionMask);
    }

    private float[] MeanPooling(Tensor<float> outputTensor, long[] attentionMask)
    {
        int seqLength = outputTensor.Dimensions[1];
        int embeddingDim = outputTensor.Dimensions[2];

        var embedding = new float[embeddingDim];
        int tokenCount = 0;

        // Average across all non-padded tokens
        for (int seq = 0; seq < seqLength; seq++)
        {
            if (attentionMask[seq] == 0) continue; // Skip padding

            tokenCount++;
            for (int dim = 0; dim < embeddingDim; dim++)
            {
                embedding[dim] += outputTensor[0, seq, dim];
            }
        }

        // Divide by count to get mean
        for (int dim = 0; dim < embeddingDim; dim++)
        {
            embedding[dim] /= tokenCount;
        }

        // Normalize to unit length (common practice)
        return Normalize(embedding);
    }

    private float[] Normalize(float[] vector)
    {
        float magnitude = 0;
        foreach (var val in vector)
        {
            magnitude += val * val;
        }
        magnitude = MathF.Sqrt(magnitude);

        var normalized = new float[vector.Length];
        for (int i = 0; i < vector.Length; i++)
        {
            normalized[i] = vector[i] / magnitude;
        }

        return normalized;
    }

    public void Dispose()
    {
        _session?.Dispose();
    }
}

Code breakdown:

Model loading - Creates ONNX session with GPU support
Tokenization - Converts text to token IDs
Tensor creation - Shapes data for model input
Inference - Runs the neural network
Mean pooling - Averages token embeddings → single sentence embedding
Normalization - Makes vector length 1 (simplifies similarity calculation)

Mean Pooling Explained

graph TB
    A[Model Output:<br/>Token Embeddings] --> B["Token 0 (CLS):<br/>(0.1, 0.5, -0.3, ...)"]
    A --> C["Token 1 (Docker):<br/>(0.4, 0.2, -0.1, ...)"]
    A --> D["Token 2 (setup):<br/>(0.3, 0.6, -0.2, ...)"]
    A --> E["Token 3 (SEP):<br/>(0.2, 0.3, -0.4, ...)"]

    B --> F[Average]
    C --> F
    D --> F
    E --> F

    F --> G["Sentence Embedding:<br/>(0.25, 0.4, -0.25, ...)"]

    class A input
    class F process
    class G output

    classDef input stroke:#333
    classDef process stroke:#333,stroke-width:2px
    classDef output stroke:#333,stroke-width:2px

We average because:

Each token has its own embedding
We need ONE embedding for the whole sentence
Averaging captures the overall meaning

Usage Example

using System;

class Program
{
    static void Main(string[] args)
    {
        using var embedder = new EmbeddingGenerator(
            modelPath: "bge-base-en-onnx/model.onnx",
            tokenizerPath: "bge-base-en-onnx/tokenizer.json",
            useGpu: true
        );

        // Generate embeddings
        var embedding1 = embedder.GenerateEmbedding("Docker containerization tutorial");
        var embedding2 = embedder.GenerateEmbedding("Setting up containers with Docker");
        var embedding3 = embedder.GenerateEmbedding("Baking a chocolate cake");

        Console.WriteLine($"Embedding dimension: {embedding1.Length}");
        Console.WriteLine($"First 5 values: {string.Join(", ", embedding1.Take(5).Select(f => f.ToString("F4")))}");

        // Calculate similarities
        float sim12 = CosineSimilarity(embedding1, embedding2);
        float sim13 = CosineSimilarity(embedding1, embedding3);

        Console.WriteLine($"\nSimilarity (Docker vs Containers): {sim12:F4}");  // ~0.85
        Console.WriteLine($"Similarity (Docker vs Cake): {sim13:F4}");  // ~0.10
    }

    static float CosineSimilarity(float[] a, float[] b)
    {
        // Since vectors are normalized, dot product = cosine similarity
        float dot = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
        }
        return dot;
    }
}

Output:

Embedding dimension: 768
First 5 values: 0.0123, -0.0456, 0.0789, -0.0234, 0.0567

Similarity (Docker vs Containers): 0.8542
Similarity (Docker vs Cake): 0.1023

Beautiful! Semantically similar content has high similarity, unrelated content is low.

Vector Databases

Now we have embeddings. We need to store millions of them and search efficiently.

The Problem

Let's say we have 1000 blog posts, each chunked into 10 pieces = 10,000 embeddings.

Naive search:

float bestSimilarity = -1;
int bestIndex = -1;

for (int i = 0; i < 10000; i++)
{
    float sim = CosineSimilarity(queryEmbedding, storedEmbeddings[i]);
    if (sim > bestSimilarity)
    {
        bestSimilarity = sim;
        bestIndex = i;
    }
}

Problem: This is O(n) - we check EVERY embedding. Slow!

For 10,000 embeddings × 768 dimensions each:

~30 million floating point operations
~50-100ms even on fast CPU

We need something faster.

Vector Database Solution

Vector databases use clever data structures (like HNSW - Hierarchical Navigable Small World graphs) to find nearest neighbors in O(log n) time.

graph TB
    A[Query Embedding] --> B[Vector Database]
    B --> C{HNSW Index}

    C --> D[Layer 2:<br/>Coarse Search]
    D --> E[Layer 1:<br/>Refined Search]
    E --> F[Layer 0:<br/>Exact Search]

    F --> G[Top K Results]

    H[10,000 vectors] -.Indexed.-> C

    class B db
    class C index
    class G results

    classDef db stroke:#333,stroke-width:4px
    classDef index stroke:#333,stroke-width:2px
    classDef results stroke:#333,stroke-width:2px

Speed comparison:

Naive search: 50-100ms for 10K vectors
Vector DB (HNSW): 1-5ms for 10K vectors
10-50x faster!

And it scales: million vectors still only takes ~10-20ms.

Choosing a Vector Database

For our C# project, we need:

Good C# client library
Easy to deploy (Docker)
Good performance
Free / open source

Database	C# Support	Deployment	Performance	License
Qdrant	✅ Excellent	Docker	Very Fast	Apache 2.0
pgvector	✅ (via Npgsql)	Postgres extension	Fast	PostgreSQL License
Weaviate	✅ Good	Docker	Very Fast	BSD-3
Milvus	⚠️ Limited	Docker/K8s	Very Fast	Apache 2.0
Chroma	❌ Python-first	Docker	Fast	Apache 2.0

My choice: Qdrant

Excellent C# client (Qdrant.Client)
Simple Docker deployment
Built specifically for vector search (not bolted on)
Great documentation
Active development

Alternative: pgvector

We're already using PostgreSQL for the blog!
Could keep everything in one database
Slightly less performant but simpler architecture

For this series, I'll use Qdrant because it's purpose-built and easier to understand the concepts. But I'll show pgvector as an alternative.

Setting Up Qdrant

Docker Deployment

docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

Ports:

6333 - REST API
6334 - gRPC API (faster, we'll use this)

Storage: Persists data to ./qdrant_storage on host

C# Client Setup

dotnet add package Qdrant.Client --version 1.7.0

Creating a Collection

A collection is like a table - it holds vectors of a specific dimension.

using Qdrant.Client;
using Qdrant.Client.Grpc;

public class QdrantSetup
{
    private readonly QdrantClient _client;

    public QdrantSetup(string host = "localhost", int port = 6334)
    {
        _client = new QdrantClient(host, port);
    }

    public async Task CreateCollectionAsync(string collectionName, ulong vectorSize)
    {
        // Check if collection exists
        var collections = await _client.ListCollectionsAsync();
        if (collections.Any(c => c.Name == collectionName))
        {
            Console.WriteLine($"Collection '{collectionName}' already exists");
            return;
        }

        // Create collection
        await _client.CreateCollectionAsync(
            collectionName: collectionName,
            vectorsConfig: new VectorParams
            {
                Size = vectorSize,  // 768 for bge-base
                Distance = Distance.Cosine  // Cosine similarity
            }
        );

        Console.WriteLine($"Created collection '{collectionName}' with {vectorSize} dimensions");
    }
}

Key parameters:

Size - Must match your embedding model (768 for bge-base)
Distance - How to measure similarity:
- Distance.Cosine - Cosine similarity (most common)
- Distance.Euclid - Euclidean distance
- Distance.Dot - Dot product

Inserting Vectors

using Qdrant.Client.Grpc;
using System.Collections.Generic;

public class QdrantInserter
{
    private readonly QdrantClient _client;

    public QdrantInserter(QdrantClient client)
    {
        _client = client;
    }

    public async Task InsertBlogChunkAsync(
        string collectionName,
        ulong id,
        float[] embedding,
        string blogPostSlug,
        string chunkText,
        int chunkIndex)
    {
        var point = new PointStruct
        {
            Id = id,
            Vectors = embedding,
            Payload =
            {
                ["blog_post_slug"] = blogPostSlug,
                ["chunk_text"] = chunkText,
                ["chunk_index"] = chunkIndex,
                ["timestamp"] = DateTimeOffset.UtcNow.ToUnixTimeSeconds()
            }
        };

        await _client.UpsertAsync(collectionName, new[] { point });
    }

    public async Task InsertBatchAsync(
        string collectionName,
        List<(ulong id, float[] embedding, Dictionary<string, object> payload)> points)
    {
        var qdrantPoints = points.Select(p => new PointStruct
        {
            Id = p.id,
            Vectors = p.embedding,
            Payload = { p.payload }
        }).ToList();

        // Batch insert for efficiency
        await _client.UpsertAsync(collectionName, qdrantPoints);

        Console.WriteLine($"Inserted {points.Count} points");
    }
}

Payload explained:

Like metadata attached to each vector
Can store anything: post title, chunk text, date, categories
Searchable and filterable!

Batch insertion:

Much faster than one-by-one
Qdrant handles batches of 100-1000 efficiently

Searching Vectors

public class QdrantSearcher
{
    private readonly QdrantClient _client;

    public QdrantSearcher(QdrantClient client)
    {
        _client = client;
    }

    public async Task<List<SearchResult>> SearchAsync(
        string collectionName,
        float[] queryEmbedding,
        int topK = 10)
    {
        var searchResult = await _client.SearchAsync(
            collectionName: collectionName,
            vector: queryEmbedding,
            limit: (ulong)topK,
            scoreThreshold: 0.7f  // Only return if similarity > 0.7
        );

        return searchResult.Select(r => new SearchResult
        {
            Id = r.Id.Num,
            Score = r.Score,
            BlogPostSlug = r.Payload["blog_post_slug"].StringValue,
            ChunkText = r.Payload["chunk_text"].StringValue,
            ChunkIndex = (int)r.Payload["chunk_index"].IntegerValue
        }).ToList();
    }

    public async Task<List<SearchResult>> SearchWithFilterAsync(
        string collectionName,
        float[] queryEmbedding,
        string blogPostSlug,  // Only search within this post
        int topK = 5)
    {
        var filter = new Filter
        {
            Must =
            {
                new Condition
                {
                    Field = new FieldCondition
                    {
                        Key = "blog_post_slug",
                        Match = new Match { Keyword = blogPostSlug }
                    }
                }
            }
        };

        var searchResult = await _client.SearchAsync(
            collectionName: collectionName,
            vector: queryEmbedding,
            filter: filter,
            limit: (ulong)topK
        );

        return searchResult.Select(r => new SearchResult
        {
            Id = r.Id.Num,
            Score = r.Score,
            BlogPostSlug = r.Payload["blog_post_slug"].StringValue,
            ChunkText = r.Payload["chunk_text"].StringValue,
            ChunkIndex = (int)r.Payload["chunk_index"].IntegerValue
        }).ToList();
    }
}

public class SearchResult
{
    public ulong Id { get; set; }
    public float Score { get; set; }
    public string BlogPostSlug { get; set; }
    public string ChunkText { get; set; }
    public int ChunkIndex { get; set; }
}

Search features:

limit - Return top K most similar
scoreThreshold - Only return if similarity exceeds threshold
filter - Filter by metadata (e.g., only search specific posts)

Complete Example: Search Pipeline

using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Setup
        var embedder = new EmbeddingGenerator(
            "bge-base-en-onnx/model.onnx",
            "bge-base-en-onnx/tokenizer.json",
            useGpu: true
        );

        var client = new QdrantClient("localhost", 6334);
        var searcher = new QdrantSearcher(client);

        // User query
        string query = "How do I set up Docker with ASP.NET Core?";

        // Generate query embedding
        Console.WriteLine($"Searching for: {query}");
        var queryEmbedding = embedder.GenerateEmbedding(query);

        // Search
        var results = await searcher.SearchAsync(
            collectionName: "blog_embeddings",
            queryEmbedding: queryEmbedding,
            topK: 5
        );

        // Display results
        Console.WriteLine($"\nFound {results.Count} results:\n");

        foreach (var result in results)
        {
            Console.WriteLine($"Score: {result.Score:F4}");
            Console.WriteLine($"Post: {result.BlogPostSlug}");
            Console.WriteLine($"Chunk: {result.ChunkText.Substring(0, Math.Min(100, result.ChunkText.Length))}...");
            Console.WriteLine();
        }
    }
}

Output:

Searching for: How do I set up Docker with ASP.NET Core?

Found 5 results:

Score: 0.8923
Post: dockercomposedevdeps
Chunk: In this post, I'll show you how to set up a development environment using Docker Compose. This is p...

Score: 0.8654
Post: dockercompose
Chunk: Docker Compose is a tool for defining and running multi-container Docker applications. With Compose...

Score: 0.8102
Post: addingentityframeworkforblogpostspt1
Chunk: You can set it up either as a windows service or using Docker as I presented in a previous post on...

Score: 0.7891
Post: imagesharpwithdocker
Chunk: When running ASP.NET Core applications in Docker containers, you may encounter issues with ImageSha...

Score: 0.7654
Post: selfhostingseq
Chunk: I use Docker Compose to run all my services. Here's the relevant part of my docker-compose.yml file...

Perfect! It found relevant posts about Docker and ASP.NET Core, even though the exact phrase didn't appear.

Performance Optimization

Batch Embedding Generation

Don't generate embeddings one-by-one. Batch them!

public class BatchEmbeddingGenerator
{
    private readonly EmbeddingGenerator _embedder;

    public BatchEmbeddingGenerator(EmbeddingGenerator embedder)
    {
        _embedder = embedder;
    }

    public List<float[]> GenerateBatch(List<string> texts, int batchSize = 32)
    {
        var embeddings = new List<float[]>();

        for (int i = 0; i < texts.Count; i += batchSize)
        {
            var batch = texts.Skip(i).Take(batchSize).ToList();

            foreach (var text in batch)
            {
                embeddings.Add(_embedder.GenerateEmbedding(text));
            }

            Console.WriteLine($"Processed {Math.Min(i + batchSize, texts.Count)} / {texts.Count}");
        }

        return embeddings;
    }
}

Why batching?

GPU utilization: keeps GPU busy
Memory efficiency: reuses buffers
Progress tracking: user feedback

Caching Embeddings

Don't regenerate embeddings for unchanged content!

using System.Security.Cryptography;
using System.Text;

public class EmbeddingCache
{
    private readonly Dictionary<string, float[]> _cache = new();

    public float[] GetOrGenerate(string text, Func<string, float[]> generator)
    {
        string hash = ComputeHash(text);

        if (_cache.TryGetValue(hash, out var cached))
        {
            return cached;
        }

        var embedding = generator(text);
        _cache[hash] = embedding;

        return embedding;
    }

    private string ComputeHash(string text)
    {
        using var sha256 = SHA256.Create();
        var bytes = sha256.ComputeHash(Encoding.UTF8.GetBytes(text));
        return Convert.ToBase64String(bytes);
    }
}

pgvector Alternative

If you want to keep everything in PostgreSQL:

Install pgvector Extension

CREATE EXTENSION vector;

Create Table

CREATE TABLE blog_embeddings (
    id SERIAL PRIMARY KEY,
    blog_post_slug VARCHAR(255),
    chunk_text TEXT,
    chunk_index INT,
    embedding VECTOR(768)  -- 768 dimensions
);

-- Create HNSW index for fast search
CREATE INDEX ON blog_embeddings USING hnsw (embedding vector_cosine_ops);

Insert from C#

using Npgsql;
using Pgvector;

public async Task InsertEmbeddingAsync(
    string slug,
    string chunkText,
    int chunkIndex,
    float[] embedding)
{
    await using var conn = new NpgsqlConnection(connectionString);
    await conn.OpenAsync();

    await using var cmd = new NpgsqlCommand(
        "INSERT INTO blog_embeddings (blog_post_slug, chunk_text, chunk_index, embedding) VALUES ($1, $2, $3, $4)",
        conn
    )
    {
        Parameters =
        {
            new() { Value = slug },
            new() { Value = chunkText },
            new() { Value = chunkIndex },
            new() { Value = new Vector(embedding) }
        }
    };

    await cmd.ExecuteNonQueryAsync();
}

Search with pgvector

public async Task<List<SearchResult>> SearchAsync(float[] queryEmbedding, int topK = 10)
{
    await using var conn = new NpgsqlConnection(connectionString);
    await conn.OpenAsync();

    await using var cmd = new NpgsqlCommand(
        @"SELECT blog_post_slug, chunk_text, chunk_index,
                 1 - (embedding <=> $1) as similarity
          FROM blog_embeddings
          ORDER BY embedding <=> $1
          LIMIT $2",
        conn
    )
    {
        Parameters =
        {
            new() { Value = new Vector(queryEmbedding) },
            new() { Value = topK }
        }
    };

    var results = new List<SearchResult>();

    await using var reader = await cmd.ExecuteReaderAsync();
    while (await reader.ReadAsync())
    {
        results.Add(new SearchResult
        {
            BlogPostSlug = reader.GetString(0),
            ChunkText = reader.GetString(1),
            ChunkIndex = reader.GetInt32(2),
            Score = reader.GetFloat(3)
        });
    }

    return results;
}

<=> operator: Cosine distance (lower is more similar) 1 - distance: Convert to similarity score (higher is better)

Summary

We've covered:

✅ What embeddings are and why they enable semantic search
✅ How to choose and convert an embedding model to ONNX
✅ Generating embeddings in C# with ONNX Runtime
✅ Vector database concepts (HNSW, similarity search)
✅ Using Qdrant for vector storage and search
✅ Alternative: pgvector in PostgreSQL
✅ Performance optimization (batching, caching)

We now have the core technology to find semantically similar blog content!

What's Next?

In Part 4: Building the Ingestion Pipeline, we'll build the ingestion pipeline:

Reading and parsing markdown files (we already do this for the blog!)
Intelligent chunking strategies
Generating embeddings for all content
Bulk inserting into Qdrant
Handling updates and deletions
Monitoring progress

We'll process our entire blog corpus and have a searchable knowledge base!

Part 1: Introduction & Architecture
Part 2: GPU Setup & CUDA in C#
Part 3: Understanding Embeddings & Vector Databases (this post)
Part 4: Building the Ingestion Pipeline
Part 5: The Windows Client
Part 6: Local LLM Integration
Part 7: Content Generation & Prompt Engineering
Part 8: Advanced Features & Production Deployment

Resources

See you in Part 4!