# RAG Implementation: Semantic Search with ONNX and Qdrant

# Introduction

**This is part of the RAG series:** This is Part 4a, the core implementation:

- [Part 1: RAG Origins and Fundamentals](/blog/rag-primer) - What embeddings are and why they matter
- [Part 2: RAG Architecture and Internals](/blog/rag-architecture) - Chunking, tokenization, vector databases
- [Part 3: RAG in Practice](/blog/rag-practical-applications) - Building a complete RAG system
- **Part 4a: ONNX and Qdrant Implementation** (this article) - CPU-friendly semantic search
- [Part 4b: Semantic Search in Action](/blog/semantic-search-in-action) - Typeahead, hybrid search, and UI components
- [Part 5: Hybrid Search and Auto-Indexing](/blog/rag-hybrid-search-and-indexing) - Production integration patterns
- [Part 6: GraphRAG](/blog/graphrag-knowledge-graphs-for-rag) - Knowledge graphs for corpus-level understanding

Parts 1-3 explain *why* semantic search works. This article shows *how* to build the foundation: a **zero-cost, CPU-friendly implementation** using ONNX Runtime and Qdrant. [Part 4b](/blog/semantic-search-in-action) covers the UI and hybrid search implementation, and [Part 5](/blog/rag-hybrid-search-and-indexing) covers auto-indexing in production.

**The problem:** Most semantic search solutions require expensive GPU infrastructure or costly managed services. What if you're an indie developer running a blog on a modest VPS?

**The solution:** A fully functional semantic search system that runs entirely on CPU using free, open-source tools. This is the exact setup running on this blog, at zero additional cost beyond the existing hosting.
[TOC]

# Core Concepts

These concepts are covered in depth in the [RAG series](/blog/rag-primer), but here's what you need to know for this implementation:

## Embeddings: Text as Numbers

Embeddings are vectors (arrays of numbers) that capture the *meaning* of text. Similar meanings produce similar vectors - that's the magic.

```mermaid
graph TD
    A["Text: 'The cat sat on the mat'"] --> B[Embedding Model]
    B --> C["Vector: [0.25, -0.18, 0.91, ... 384 numbers total]"]
    D["Text: 'A feline rested on the carpet'"] --> B
    B --> E["Vector: [0.27, -0.16, 0.89, ... similar numbers!]"]
    C -.Similar vectors = similar meaning.-> E

    style A stroke:#10b981,stroke-width:2px
    style D stroke:#10b981,stroke-width:2px
    style B stroke:#6366f1,stroke-width:3px
    style C stroke:#f59e0b,stroke-width:2px
    style E stroke:#f59e0b,stroke-width:2px
```

**Key insight:** Texts with similar meanings have similar vectors (embeddings). That's how we can find "related" content - we are literally measuring the distance between meanings!

### Understanding Cosine Similarity

[Cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) measures the angle between two vectors - if they point in the same direction, they are semantically similar:

```mermaid
flowchart LR
    subgraph "Vector Space (simplified to 2D)"
        direction TB
        A["'Docker tutorial'"] -.-> B((0.85))
        C["'Container deployment'"] -.-> B
        D["'Cooking recipes'"] -.-> E((0.12))
        A -.-> E
    end
    B --> F["High Similarity<br/>Related content!"]
    E --> G["Low Similarity<br/>Different topics"]

    style A stroke:#10b981,stroke-width:2px
    style C stroke:#10b981,stroke-width:2px
    style D stroke:#f59e0b,stroke-width:2px
    style B stroke:#22c55e,stroke-width:3px
    style E stroke:#ef4444,stroke-width:3px
    style F stroke:#22c55e,stroke-width:2px
    style G stroke:#ef4444,stroke-width:2px
```

The formula is `similarity = (A · B) / (||A|| × ||B||)` - but since we normalize our vectors to length 1, this simplifies to just the dot product!

## What is ONNX?

[ONNX (Open Neural Network Exchange)](https://onnx.ai/) is an open standard format for machine learning models that lets them run efficiently across different platforms. Think of it as a universal translator for ML models. [ONNX Runtime](https://onnxruntime.ai/) is Microsoft's high-performance inference engine that runs these models.

**Why ONNX for our use case:**

- Runs on CPU (no GPU needed!) - see the [ONNX CPU Execution Provider docs](https://onnxruntime.ai/docs/execution-providers/CPU-Execution-Provider.html)
- Often significantly faster than running the same model in Python
- Smaller memory footprint
- Integrates seamlessly with .NET via the [Microsoft.ML.OnnxRuntime NuGet package](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime)
- Supports [graph optimizations](https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html) for faster inference

```mermaid
flowchart LR
    subgraph "ONNX Inference Pipeline"
        A[Raw Text] --> B[Tokenizer]
        B --> C["Tokens: [CLS] the cat sat [SEP]"]
        C --> D["Token IDs: 101 1996 4937 2068 102"]
        D --> E[ONNX Runtime]
        E --> F[384-dim Vector]
        F --> G[L2 Normalize]
        G --> H[Final Embedding]
    end

    style A stroke:#10b981,stroke-width:2px
    style B stroke:#f59e0b,stroke-width:2px
    style C stroke:#f59e0b,stroke-width:2px
    style D stroke:#f59e0b,stroke-width:2px
    style E stroke:#6366f1,stroke-width:3px
    style F stroke:#8b5cf6,stroke-width:2px
    style G stroke:#8b5cf6,stroke-width:2px
    style H stroke:#ef4444,stroke-width:2px
```

## What is Qdrant?

[Qdrant](https://qdrant.tech/) is an open-source vector database - essentially a database optimized for storing and searching these vectors. For a deep dive into Qdrant concepts, setup, and the C# client, see [Self-Hosted Vector Databases with Qdrant](/blog/self-hosted-vector-databases-qdrant).

While you *can* store vectors in PostgreSQL, Qdrant is purpose-built for this and offers:

- Lightning-fast similarity search using the [HNSW algorithm](https://qdrant.tech/documentation/concepts/indexing/#vector-index)
- [Metadata filtering](https://qdrant.tech/documentation/concepts/filtering/) - filter results by payload fields
- Scalability to millions of vectors with [distributed deployment](https://qdrant.tech/documentation/guides/distributed_deployment/)
- Low resource usage - runs comfortably on modest hardware
- Self-hosting with Docker - see the [Qdrant Docker quickstart](https://qdrant.tech/documentation/quick-start/)
- [gRPC and REST APIs](https://qdrant.tech/documentation/interfaces/) for integration
- .NET support via the [Qdrant.Client NuGet package](https://www.nuget.org/packages/Qdrant.Client)

```mermaid
flowchart TB
    subgraph "Qdrant Vector Storage"
        direction TB
        A[Collection: blog_posts] --> B[Point 1]
        A --> C[Point 2]
        A --> D[Point N...]
        B --> B1["Vector: [0.12, -0.08, ...]"]
        B --> B2["Payload: {slug, title, language}"]
        C --> C1["Vector: [0.25, 0.14, ...]"]
        C --> C2["Payload: {slug, title, language}"]
    end
    subgraph "Vector Search"
        E[Query Vector] --> F[HNSW Index]
        F --> G[Cosine Similarity]
        G --> H[Top K Results]
    end

    style A stroke:#ef4444,stroke-width:3px
    style B stroke:#8b5cf6,stroke-width:2px
    style C stroke:#8b5cf6,stroke-width:2px
    style D stroke:#8b5cf6,stroke-width:2px
    style B1 stroke:#f59e0b,stroke-width:2px
    style B2 stroke:#10b981,stroke-width:2px
    style C1 stroke:#f59e0b,stroke-width:2px
    style C2 stroke:#10b981,stroke-width:2px
    style E stroke:#6366f1,stroke-width:2px
    style F stroke:#ec4899,stroke-width:3px
    style G stroke:#ec4899,stroke-width:2px
    style H stroke:#10b981,stroke-width:2px
```

# Architecture Overview

Here's how our semantic search system fits together:

```mermaid
flowchart TB
    subgraph "Content Ingestion"
        A[Blog Post Markdown] --> B[Extract Plain Text]
        B --> C[ONNX Embedding Service]
        C --> D[Generate 384-dim Vector]
        D --> E[Qdrant Vector Store]
    end
    subgraph "Search Flow"
        F[User Query] --> G[ONNX Embedding Service]
        G --> H[Generate Query Vector]
        H --> I[Qdrant Search]
        E -.Vector Similarity.-> I
        I --> J[Ranked Results]
    end
    subgraph "Related Posts"
        K[Current Blog Post] --> L[Get Post Vector from Qdrant]
        L --> M[Find Similar Vectors]
        E -.->M
        M --> N[Top 5 Related Posts]
    end

    style A stroke:#10b981,stroke-width:2px
    style B stroke:#10b981,stroke-width:2px
    style C stroke:#6366f1,stroke-width:3px
    style D stroke:#f59e0b,stroke-width:2px
    style E stroke:#ef4444,stroke-width:3px
    style F stroke:#10b981,stroke-width:2px
    style G stroke:#6366f1,stroke-width:3px
    style H stroke:#f59e0b,stroke-width:2px
    style I stroke:#ef4444,stroke-width:2px
    style J stroke:#8b5cf6,stroke-width:2px
    style K stroke:#10b981,stroke-width:2px
    style L stroke:#ef4444,stroke-width:2px
    style M stroke:#ef4444,stroke-width:2px
    style N stroke:#8b5cf6,stroke-width:2px
```

**The flow in plain English:**

1. **Indexing**: When you write a blog post, we convert it to a vector and store it in Qdrant
2. **Search**: When someone searches, we convert their query to a vector and find similar vectors in Qdrant
3. **Related posts**: For any blog post, we can find other posts with similar vectors

# Project Structure

We've created a clean, modular structure:

```
Mostlylucid.SemanticSearch/
├── Config/
│   └── SemanticSearchConfig.cs          # Configuration settings
├── Models/
│   ├── BlogPostDocument.cs              # Document model for indexing
│   └── SearchResult.cs                  # Search result model
├── Services/
│   ├── IEmbeddingService.cs             # Embedding interface
│   ├── OnnxEmbeddingService.cs          # ONNX-based embeddings
│   ├── IVectorStoreService.cs           # Vector store interface
│   ├── QdrantVectorStoreService.cs      # Qdrant implementation
│   ├── ISemanticSearchService.cs        # High-level search interface
│   └── SemanticSearchService.cs         # Orchestration service
├── Extensions/
│   └── ServiceCollectionExtensions.cs   # DI registration
├── download-models.sh                   # Model download script
└── README.md
```

# Implementation

## Step 1: Project Setup

First, create a new class library:

```bash
dotnet new classlib -n Mostlylucid.SemanticSearch -f net9.0
dotnet sln add Mostlylucid.SemanticSearch
```

Add the required NuGet packages:

```bash
cd Mostlylucid.SemanticSearch
dotnet add package Microsoft.Extensions.Logging.Abstractions
dotnet add package Microsoft.ML.OnnxRuntime --version 1.21.1
dotnet add package Qdrant.Client --version 1.14.0
dotnet add reference ../Mostlylucid.Shared/Mostlylucid.Shared.csproj
```

## Step 2: Configuration

Let's set up our configuration class. It follows the same `IConfigSection` pattern used elsewhere in the codebase:

```csharp
using Mostlylucid.Shared.Config;

namespace Mostlylucid.SemanticSearch.Config;

/// <summary>
/// Configuration for semantic search functionality
/// </summary>
public class SemanticSearchConfig : IConfigSection
{
    public static string Section => "SemanticSearch";

    /// <summary>
    /// Enable or disable semantic search
    /// </summary>
    public bool Enabled { get; set; } = true;

    /// <summary>
    /// Qdrant server URL (e.g., http://localhost:6333)
    /// </summary>
    public string QdrantUrl { get; set; } = "http://localhost:6333";

    /// <summary>
    /// Optional read-only API key for Qdrant (used for search operations)
    /// </summary>
    public string? ReadApiKey { get; set; }

    /// <summary>
    /// Optional read-write API key for Qdrant (used for indexing operations)
    /// </summary>
    public string? WriteApiKey { get; set; }

    /// <summary>
    /// Collection name in Qdrant for blog posts
    /// </summary>
    public string CollectionName { get; set; } = "blog_posts";

    /// <summary>
    /// Path to the ONNX embedding model file
    /// </summary>
    public string EmbeddingModelPath { get; set; } = "models/all-MiniLM-L6-v2.onnx";

    /// <summary>
    /// Path to the tokenizer vocabulary file
    /// </summary>
    public string VocabPath { get; set; } = "models/vocab.txt";

    /// <summary>
    /// Embedding vector size (384 for all-MiniLM-L6-v2)
    /// </summary>
    public int VectorSize { get; set; } = 384;

    /// <summary>
    /// Number of related posts to return
    /// </summary>
    public int RelatedPostsCount { get; set; } = 5;

    /// <summary>
    /// Minimum similarity score (0-1) for related posts
    /// </summary>
    public float MinimumSimilarityScore { get; set; } = 0.5f;

    /// <summary>
    /// Number of search results to return
    /// </summary>
    public int SearchResultsCount { get; set; } = 10;
}
```

**Why separate API keys?** Security! Your read key can be used in publicly accessible search endpoints, while your write key stays server-side and is used only for administrative operations.

Add this to your `appsettings.json`:

```json
{
  "SemanticSearch": {
    "Enabled": false,
    "QdrantUrl": "http://localhost:6333",
    "ReadApiKey": "",
    "WriteApiKey": "",
    "CollectionName": "blog_posts",
    "EmbeddingModelPath": "models/all-MiniLM-L6-v2.onnx",
    "VocabPath": "models/vocab.txt",
    "VectorSize": 384,
    "RelatedPostsCount": 5,
    "MinimumSimilarityScore": 0.5,
    "SearchResultsCount": 10
  }
}
```

## Step 3: ONNX Embedding Service

We use the all-MiniLM-L6-v2 model, which is designed specifically for semantic similarity and runs efficiently on CPU.

**Why this model?**

- Small size (~90MB)
- Fast CPU inference (~50-100ms per embedding)
- Good quality embeddings (384 dimensions)
- Trained on over 1 billion sentence pairs

Here's the complete implementation:

```csharp
using Microsoft.Extensions.Logging;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using Mostlylucid.SemanticSearch.Config;
using System.Text.RegularExpressions;

namespace Mostlylucid.SemanticSearch.Services;

public class OnnxEmbeddingService : IEmbeddingService, IDisposable
{
    private readonly ILogger<OnnxEmbeddingService> _logger;
    private readonly SemanticSearchConfig _config;
    private readonly InferenceSession? _session;
    private readonly Dictionary<string, int> _vocabulary;
    private readonly SemaphoreSlim _semaphore = new(1, 1);
    private bool _disposed;

    private const int MaxSequenceLength = 256;
    private const string PadToken = "[PAD]";
    private const string UnkToken = "[UNK]";
    private const string ClsToken = "[CLS]";
    private const string SepToken = "[SEP]";

    public OnnxEmbeddingService(
        ILogger<OnnxEmbeddingService> logger,
        SemanticSearchConfig config)
    {
        _logger = logger;
        _config = config;
        _vocabulary = new Dictionary<string, int>();

        if (!_config.Enabled)
        {
            _logger.LogInformation("Semantic search is disabled");
            return;
        }

        try
        {
            // Check if model file exists
            if (!File.Exists(_config.EmbeddingModelPath))
            {
                _logger.LogWarning("Embedding model not found at {Path}. Semantic search will be disabled.",
                    _config.EmbeddingModelPath);
                return;
            }

            // Load vocabulary if it exists
            if (File.Exists(_config.VocabPath))
            {
                LoadVocabulary(_config.VocabPath);
            }

            // Create ONNX session with CPU execution provider
            var sessionOptions = new SessionOptions
            {
                ExecutionMode = ExecutionMode.ORT_SEQUENTIAL,
                GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL
            };

            _session = new InferenceSession(_config.EmbeddingModelPath, sessionOptions);
            _logger.LogInformation("ONNX embedding model loaded successfully from {Path}",
                _config.EmbeddingModelPath);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to initialize ONNX embedding service");
        }
    }

    private void LoadVocabulary(string vocabPath)
    {
        // vocab.txt holds one token per line; the line index is the token ID
        var lines = File.ReadAllLines(vocabPath);
        for (int i = 0; i < lines.Length; i++)
        {
            var token = lines[i].Trim();
            if (!string.IsNullOrEmpty(token))
            {
                _vocabulary[token] = i;
            }
        }

        _logger.LogInformation("Loaded vocabulary with {Count} tokens", _vocabulary.Count);
    }

    public async Task<float[]> GenerateEmbeddingAsync(string text, CancellationToken cancellationToken = default)
    {
        // A zero vector signals "no embedding available"
        if (_session == null || !_config.Enabled)
        {
            return new float[_config.VectorSize];
        }

        if (string.IsNullOrWhiteSpace(text))
        {
            return new float[_config.VectorSize];
        }

        // Use a semaphore to serialize inference: one embedding at a time
        await _semaphore.WaitAsync(cancellationToken);
        try
        {
            return await Task.Run(() => GenerateEmbedding(text), cancellationToken);
        }
        finally
        {
            _semaphore.Release();
        }
    }

    private float[] GenerateEmbedding(string text)
    {
        try
        {
            // Tokenize the input text
            var tokens = Tokenize(text);

            // Create input tensors for ONNX model
            var inputIds = CreateInputTensor(tokens, "input_ids");
            var attentionMask = CreateAttentionMaskTensor(tokens.Count);
            var tokenTypeIds = CreateTokenTypeIdsTensor(tokens.Count);

            // Run inference
            var inputs = new List<NamedOnnxValue>
            {
                NamedOnnxValue.CreateFromTensor("input_ids", inputIds),
                NamedOnnxValue.CreateFromTensor("attention_mask", attentionMask),
                NamedOnnxValue.CreateFromTensor("token_type_ids", tokenTypeIds)
            };

            using var results = _session!.Run(inputs);

            // Extract the output tensor (sentence embedding).
            // NOTE: this assumes the model's first output is already a pooled
            // sentence embedding. If your export returns per-token hidden states
            // instead, mean-pool them over the attention mask before normalizing.
            var output = results.First().AsTensor<float>();
            var embedding = output.ToArray();

            // Normalize the vector (L2 normalization)
            return NormalizeVector(embedding);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error generating embedding for text: {Text}",
                text[..Math.Min(100, text.Length)]);
            return new float[_config.VectorSize];
        }
    }

    private List<int> Tokenize(string text)
    {
        // Simple whitespace + punctuation tokenization
        var tokens = new List<int>();

        // Add [CLS] token at the start
        if (_vocabulary.TryGetValue(ClsToken, out var clsId))
            tokens.Add(clsId);

        // Tokenize the text
        var words = Regex.Split(text.ToLowerInvariant(), @"(\W+)")
            .Where(w => !string.IsNullOrWhiteSpace(w))
            .Take(MaxSequenceLength - 2); // Leave room for [CLS] and [SEP]

        foreach (var word in words)
        {
            if (_vocabulary.Count > 0)
            {
                if (_vocabulary.TryGetValue(word, out var tokenId))
                    tokens.Add(tokenId);
                else if (_vocabulary.TryGetValue(UnkToken, out var unkId))
                    tokens.Add(unkId);
            }
            else
            {
                // Fallback: use hash code as token ID.
                // The uint cast avoids Math.Abs overflowing on int.MinValue.
                tokens.Add((int)((uint)word.GetHashCode() % 30000));
            }
        }

        // Add [SEP] token at the end
        if (_vocabulary.TryGetValue(SepToken, out var sepId))
            tokens.Add(sepId);

        return tokens;
    }

    private Tensor<long> CreateInputTensor(List<int> tokens, string name)
    {
        var length = Math.Min(tokens.Count, MaxSequenceLength);

        // Flat buffer; the [1, MaxSequenceLength] shape is passed to DenseTensor below
        var tensorData = new long[MaxSequenceLength];
        for (int i = 0; i < length; i++)
        {
            tensorData[i] = tokens[i];
        }

        // Pad the rest
        var padId = _vocabulary.TryGetValue(PadToken, out var id) ? id : 0;
        for (int i = length; i < MaxSequenceLength; i++)
        {
            tensorData[i] = padId;
        }

        return new DenseTensor<long>(tensorData, new[] { 1, MaxSequenceLength });
    }

    private Tensor<long> CreateAttentionMaskTensor(int actualLength)
    {
        var length = Math.Min(actualLength, MaxSequenceLength);
        var tensorData = new long[MaxSequenceLength];
        for (int i = 0; i < length; i++)
        {
            tensorData[i] = 1; // Attend to actual tokens; padding stays 0
        }

        return new DenseTensor<long>(tensorData, new[] { 1, MaxSequenceLength });
    }

    private Tensor<long> CreateTokenTypeIdsTensor(int actualLength)
    {
        var tensorData = new long[MaxSequenceLength]; // All zeros for single sentence
        return new DenseTensor<long>(tensorData, new[] { 1, MaxSequenceLength });
    }

    private float[] NormalizeVector(float[] vector)
    {
        // L2 normalization
        var sumOfSquares = vector.Sum(v => v * v);
        var magnitude = MathF.Sqrt(sumOfSquares);

        if (magnitude > 0)
        {
            for (int i = 0; i < vector.Length; i++)
            {
                vector[i] /= magnitude;
            }
        }

        return vector;
    }

    public void Dispose()
    {
        if (_disposed) return;

        _session?.Dispose();
        _semaphore?.Dispose();
        _disposed = true;
        GC.SuppressFinalize(this);
    }
}
```

**Key points for junior devs:**

1. **Tokenization**: We break text into smaller pieces the model can understand
2. **Tensors**: These are the multi-dimensional arrays that ONNX models work with
3. **Attention mask**: Tells the model which parts of the input are real content and which are padding
4. **L2 normalization**: Gives every vector the same length so we can compare them fairly
5. **Semaphore**: Serializes inference so only one embedding runs at a time. ONNX Runtime documents `InferenceSession.Run` as safe to call from multiple threads, so this is a conservative choice that mainly keeps CPU use predictable on a small server

## Step 4: Qdrant Vector Store

Now let's implement vector storage and search:

```csharp
using Microsoft.Extensions.Logging;
using Mostlylucid.SemanticSearch.Config;
using Mostlylucid.SemanticSearch.Models;
using Qdrant.Client;
using Qdrant.Client.Grpc;

namespace Mostlylucid.SemanticSearch.Services;

public class QdrantVectorStoreService : IVectorStoreService
{
    private readonly ILogger<QdrantVectorStoreService> _logger;
    private readonly SemanticSearchConfig _config;
    private readonly QdrantClient? _client;
    private bool _collectionInitialized;

    public QdrantVectorStoreService(
        ILogger<QdrantVectorStoreService> logger,
        SemanticSearchConfig config)
    {
        _logger = logger;
        _config = config;

        if (!_config.Enabled)
        {
            _logger.LogInformation("Semantic search is disabled");
            return;
        }

        try
        {
            // Qdrant.Client talks gRPC, so QdrantUrl must point at the gRPC port
            // (6334 by default), not the REST port 6333
            var uri = new Uri(_config.QdrantUrl);
            var host = uri.Host;
            var port = uri.Port > 0 ? uri.Port : 6334; // Default gRPC port

            _client = new QdrantClient(host, port, https: uri.Scheme == "https");
            _logger.LogInformation("Connected to Qdrant at {Host}:{Port}", host, port);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to connect to Qdrant at {Url}", _config.QdrantUrl);
        }
    }

    public async Task InitializeCollectionAsync(CancellationToken cancellationToken = default)
    {
        if (_client == null || !_config.Enabled || _collectionInitialized)
            return;

        try
        {
            // ListCollectionsAsync returns the collection names
            var collections = await _client.ListCollectionsAsync(cancellationToken);
            var collectionExists = collections.Contains(_config.CollectionName);

            if (!collectionExists)
            {
                _logger.LogInformation("Creating collection {CollectionName}", _config.CollectionName);

                await _client.CreateCollectionAsync(
                    collectionName: _config.CollectionName,
                    vectorsConfig: new VectorParams
                    {
                        Size = (ulong)_config.VectorSize,
                        Distance = Distance.Cosine // Cosine similarity for semantic search
                    },
                    cancellationToken: cancellationToken
                );

                _logger.LogInformation("Collection {CollectionName} created successfully", _config.CollectionName);
            }
            _collectionInitialized = true;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to initialize collection {CollectionName}", _config.CollectionName);
            throw;
        }
    }

    public async Task<List<SearchResult>> FindRelatedPostsAsync(
        string slug,
        string language,
        int limit = 5,
        CancellationToken cancellationToken = default)
    {
        if (_client == null || !_config.Enabled)
            return new List<SearchResult>();

        try
        {
            // Find the document by slug and language
            var scrollResults = await _client.ScrollAsync(
                collectionName: _config.CollectionName,
                filter: new Filter
                {
                    Must =
                    {
                        new Condition
                        {
                            Field = new FieldCondition
                            {
                                Key = "slug",
                                Match = new Match { Keyword = slug }
                            }
                        },
                        new Condition
                        {
                            Field = new FieldCondition
                            {
                                Key = "language",
                                Match = new Match { Keyword = language }
                            }
                        }
                    }
                },
                limit: 1,
                vectorsSelector: true, // Request the stored vector too; scroll omits it unless asked
                cancellationToken: cancellationToken
            );

            var point = scrollResults.Result.FirstOrDefault();
            if (point == null)
            {
                _logger.LogWarning("Post {Slug} ({Language}) not found in vector store", slug, language);
                return new List<SearchResult>();
            }

            // Use the document's vector to find similar posts
            var searchResults = await _client.SearchAsync(
                collectionName: _config.CollectionName,
                vector: point.Vectors.Vector.Data.ToArray(),
                limit: (ulong)(limit + 1), // +1 because the first result will be the post itself
                scoreThreshold: _config.MinimumSimilarityScore,
                cancellationToken: cancellationToken
            );

            // Filter out the original post and return top N similar posts
            return searchResults
                .Where(r => r.Payload["slug"].StringValue != slug ||
                            r.Payload["language"].StringValue != language)
                .Take(limit)
                .Select(result => new SearchResult
                {
                    Slug = result.Payload["slug"].StringValue,
                    Title = result.Payload["title"].StringValue,
                    Language = result.Payload["language"].StringValue,
                    Categories = result.Payload.TryGetValue("categories", out var cats)
                        ? cats.ListValue.Values.Select(v => v.StringValue).ToList()
                        : new List<string>(),
                    Score = result.Score,
                    PublishedDate = DateTime.Parse(result.Payload["published_date"].StringValue)
                })
                .ToList();
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to find related posts for {Slug} ({Language})", slug, language);
            return new List<SearchResult>();
        }
    }

    // ... Additional methods for IndexDocument, Search, Delete, etc.
}
```

**What's happening here?**

1. **Cosine distance**: We use cosine similarity, which is ideal for comparing normalized vectors
2. **Metadata storage**: Qdrant lets us store additional data (payloads) alongside vectors
3. **Filtering**: We can filter results by metadata before comparing vectors
4. **Score threshold**: Only results above a given similarity score are returned

## Step 5: Orchestration Service

This high-level service ties everything together:

```csharp
using Microsoft.Extensions.Logging;
using Mostlylucid.SemanticSearch.Config;
using Mostlylucid.SemanticSearch.Models;
using System.Security.Cryptography;
using System.Text;

namespace Mostlylucid.SemanticSearch.Services;

public class SemanticSearchService : ISemanticSearchService
{
    private readonly ILogger<SemanticSearchService> _logger;
    private readonly SemanticSearchConfig _config;
    private readonly IEmbeddingService _embeddingService;
    private readonly IVectorStoreService _vectorStoreService;

    public SemanticSearchService(
        ILogger<SemanticSearchService> logger,
        SemanticSearchConfig config,
        IEmbeddingService embeddingService,
        IVectorStoreService vectorStoreService)
    {
        _logger = logger;
        _config = config;
        _embeddingService = embeddingService;
        _vectorStoreService = vectorStoreService;
    }

    public async Task IndexPostAsync(BlogPostDocument document, CancellationToken cancellationToken = default)
    {
        if (!_config.Enabled)
            return;

        try
        {
            // Prepare text for embedding: combine title and content
            // We give more weight to the title by including it twice
            var textToEmbed = $"{document.Title}. {document.Title}. {document.Content}";

            // Truncate to reasonable length (embedding models have token limits)
            const int maxLength = 2000;
            if (textToEmbed.Length > maxLength)
            {
                textToEmbed = textToEmbed[..maxLength];
            }

            // Generate embedding
            var embedding = await _embeddingService.GenerateEmbeddingAsync(textToEmbed, cancellationToken);

            // Compute content hash if not provided
            if (string.IsNullOrEmpty(document.ContentHash))
            {
                document.ContentHash = ComputeContentHash(document.Content);
            }

            // Store in vector database
            await _vectorStoreService.IndexDocumentAsync(document, embedding, cancellationToken);

            _logger.LogInformation("Indexed post {Slug} ({Language})", document.Slug, document.Language);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to index post {Slug} ({Language})", document.Slug, document.Language);
        }
    }

    public async Task<List<SearchResult>> SearchAsync(
        string query,
        int limit = 10,
        CancellationToken cancellationToken = default)
    {
        if (!_config.Enabled || string.IsNullOrWhiteSpace(query))
            return new List<SearchResult>();

        try
        {
            // Generate embedding for the search query
            var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query, cancellationToken);

            // Search in vector store, capped at the configured maximum
            var results = await _vectorStoreService.SearchAsync(
                queryEmbedding,
                Math.Min(limit, _config.SearchResultsCount),
                cancellationToken);

            _logger.LogDebug("Search for '{Query}' returned {Count} results", query, results.Count);
            return results;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Search failed for query '{Query}'", query);
            return new List<SearchResult>();
        }
    }

    public async Task<List<SearchResult>> GetRelatedPostsAsync(
        string slug,
        string language,
        int limit = 5,
        CancellationToken cancellationToken = default)
    {
        if (!_config.Enabled)
            return new List<SearchResult>();

        try
        {
            var results = await _vectorStoreService.FindRelatedPostsAsync(
                slug,
                language,
                Math.Min(limit, _config.RelatedPostsCount),
                cancellationToken);

            _logger.LogDebug("Found {Count} related posts for {Slug} ({Language})", results.Count, slug, language);
            return results;
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Failed to get related posts for {Slug} ({Language})", slug, language);
            return new List<SearchResult>();
        }
    }

    private string ComputeContentHash(string content)
    {
        // SHA-256 of the UTF-8 content, Base64-encoded
        using var sha256 = SHA256.Create();
        var bytes = Encoding.UTF8.GetBytes(content);
        var hashBytes = sha256.ComputeHash(bytes);
        return Convert.ToBase64String(hashBytes);
    }
}
```

## Step 6: Dependency Injection Setup

Register everything in the DI container:

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Mostlylucid.SemanticSearch.Config;
using Mostlylucid.SemanticSearch.Services;
using Mostlylucid.Shared.Config;

namespace Mostlylucid.SemanticSearch.Extensions;

public static class ServiceCollectionExtensions
{
    public static void AddSemanticSearch(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        // Bind configuration using POCO pattern
        services.ConfigurePOCO<SemanticSearchConfig>(
            configuration.GetSection(SemanticSearchConfig.Section));

        // Register services as singletons for efficiency
        services.AddSingleton<IEmbeddingService, OnnxEmbeddingService>();
        services.AddSingleton<IVectorStoreService, QdrantVectorStoreService>();
        services.AddSingleton<ISemanticSearchService, SemanticSearchService>();
    }
}
```

In your `Program.cs`:

```csharp
using Mostlylucid.SemanticSearch.Extensions;
using Mostlylucid.SemanticSearch.Services;

// Add services
services.AddSemanticSearch(config);

// Initialize after building the app
using (var scope = app.Services.CreateScope())
{
    var semanticSearch = scope.ServiceProvider.GetRequiredService<ISemanticSearchService>();
    await semanticSearch.InitializeAsync();
}
```

# Infrastructure Setup

## Docker Compose for Qdrant

Create a separate compose file for the semantic search services:

```yaml
version: '3.8'

services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: mostlylucid-qdrant
    restart: unless-stopped
    ports:
      - "6333:6333" # HTTP API
      - "6334:6334" # gRPC API
    volumes:
      - qdrant_storage:/qdrant/storage
    environment:
      - QDRANT__SERVICE__HTTP_PORT=6333
      - QDRANT__SERVICE__GRPC_PORT=6334
    networks:
      - mostlylucid_network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:6333/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

volumes:
  qdrant_storage:
    driver: local

networks:
  mostlylucid_network:
    name: mostlylucidweb_app_network
    external: true
```

Start it with:

```bash
docker-compose -f semantic-search-docker-compose.yml up -d
```

## Downloading the Embedding Model

We use the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model from Hugging Face, published by the [Sentence Transformers](https://www.sbert.net/) library. The model is trained specifically for semantic similarity tasks and produces 384-dimensional embeddings.

### Automatic Download (Recommended)

The service automatically downloads the model from Hugging Face on first run if it isn't already present:

```csharp
// In OnnxEmbeddingService.cs
private const string ModelUrl = "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx";
private const string VocabUrl = "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/vocab.txt";

public async Task EnsureInitializedAsync(CancellationToken cancellationToken = default)
{
    if (_initialized || !_config.Enabled) return;

    // Download model if not exists
    if (!File.Exists(_config.EmbeddingModelPath))
    {
        _logger.LogInformation("Downloading ONNX embedding model to {Path}...", _config.EmbeddingModelPath);
        await DownloadFileAsync(ModelUrl, _config.EmbeddingModelPath, cancellationToken);
    }

    // Download vocab if not exists
    if (!File.Exists(_config.VocabPath))
    {
        _logger.LogInformation("Downloading vocabulary file to {Path}...", _config.VocabPath);
        await DownloadFileAsync(VocabUrl, _config.VocabPath, cancellationToken);
    }

    // Initialize ONNX session...
}
```

This is especially useful when running in Docker - you can mount a volume for the models directory:

```yaml
volumes:
  - ./mlmodels:/app/mlmodels # Model persists across container restarts
```

### Manual Download

Alternatively, you can download the files manually:

```bash
chmod +x Mostlylucid.SemanticSearch/download-models.sh
./Mostlylucid.SemanticSearch/download-models.sh
```

Or directly from Hugging Face:

```bash
mkdir -p mlmodels
curl -L https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx -o mlmodels/all-MiniLM-L6-v2.onnx
curl -L https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/vocab.txt -o mlmodels/vocab.txt
```

This downloads:

- `all-MiniLM-L6-v2.onnx` (~90MB) - The [ONNX embedding model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/tree/main/onnx)
- `vocab.txt` (~230KB) - The [WordPiece tokenizer vocabulary](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/blob/main/vocab.txt)

These commands save into `mlmodels/`, while the defaults shown in Step 2 use `models/`, so make sure `EmbeddingModelPath` and `VocabPath` in your configuration point at the directory you actually use.

# Performance Considerations

## Embedding Generation

- **CPU performance**: ~50-100ms per embedding on a modern CPU
- **Optimization**: We use a semaphore so only one ONNX inference runs at a time
- **Batching**: For bulk indexing, process posts in batches of 10-20

## Vector Search

- **Search speed**: <10ms for collections of up to 100K vectors
- **Memory usage**: ~1.5KB per vector for the raw floats (384 × 4 bytes), plus payload and index overhead
- **Scaling**: Qdrant can handle millions of vectors on modest hardware

## Caching Strategy

We use ASP.NET Core output caching:

```csharp
[OutputCache(Duration = 7200, VaryByRouteValueNames = new[] {"slug", "language"})]
```

This caches related posts for 2 hours (7200 seconds), dramatically reducing load.
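The `[OutputCache]` attribute only takes effect once the output-cache services and middleware are registered. Here is a minimal sketch of that wiring, using the minimal-API equivalent of the same policy. It assumes ASP.NET Core 7 or later; the `RelatedPostsCaching` class, the route pattern, and the placeholder response are hypothetical and only illustrate the setup, they are not taken from the blog's code.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.OutputCaching;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical helper: class name, route, and response body are illustrative only.
public static class RelatedPostsCaching
{
    // 7200 seconds = 2 hours, the same duration as the attribute above.
    public static readonly TimeSpan Duration = TimeSpan.FromSeconds(7200);

    // Call before builder.Build(): registers the output-cache services.
    public static IServiceCollection AddRelatedPostsCaching(this IServiceCollection services)
        => services.AddOutputCache();

    // Call after builder.Build(): enables the middleware and maps a cached endpoint.
    public static WebApplication MapRelatedPosts(this WebApplication app)
    {
        app.UseOutputCache();

        app.MapGet("/blog/{language}/{slug}/related",
                (string language, string slug) =>
                    // Placeholder payload; a real handler would return the related posts.
                    Results.Ok(new { slug, language, generatedAt = DateTime.UtcNow }))
            .CacheOutput(policy => policy
                .Expire(Duration)                          // same lifetime as Duration = 7200
                .SetVaryByRouteValue("slug", "language")); // same keys as VaryByRouteValueNames

        return app;
    }
}
```

In `Program.cs` this would be used as `builder.Services.AddRelatedPostsCaching()` before `builder.Build()` and `app.MapRelatedPosts()` afterwards. Repeated requests for the same route within two hours should then return the same `generatedAt` value, which is a quick way to confirm the cache is active.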
# What We've Built

At this point you have a fully working semantic search foundation:

- ✅ **ONNX embeddings** - CPU-friendly, auto-downloaded from Hugging Face
- ✅ **Qdrant vector store** - Fast similarity search with metadata filtering
- ✅ **Related posts** - Find semantically similar content
- ✅ **Search API** - Natural-language queries
- ✅ **Content indexing** - Store blog posts as vectors

**This is the exact setup running on this blog** - zero GPUs, zero additional cost.

# Next: Semantic Search in Action

In [Part 4b: Semantic Search in Action](/blog/semantic-search-in-action), we cover:

- **Typeahead search** - How search-as-you-type works with Alpine.js
- **Hybrid search** - Combining semantic search with PostgreSQL full-text search using Reciprocal Rank Fusion
- **Search API** - Full API documentation with filters
- **Related posts UI** - DaisyUI components with HTMX lazy loading
- **Advanced filters** - Filtering by language and date range

**Continue to [Part 4b](/blog/semantic-search-in-action) for the search UI and hybrid search implementation.**

After that, [Part 5: Hybrid Search and Auto-Indexing](/blog/rag-hybrid-search-and-indexing) covers production integration patterns.

# Resources

## ONNX Documentation

- [ONNX official site](https://onnx.ai/) - The open standard for ML models
- [ONNX Runtime](https://onnxruntime.ai/) - Microsoft's high-performance inference engine
- [ONNX Runtime .NET API](https://onnxruntime.ai/docs/api/csharp-api.html) - C# API docs
- [ONNX Runtime performance tuning](https://onnxruntime.ai/docs/performance/tune-performance/threading.html) - Optimization guide

## Qdrant Documentation

- [Self-Hosted Vector Databases with Qdrant](/blog/self-hosted-vector-databases-qdrant) - Deep dive into Qdrant concepts and the C# client
- [Official Qdrant docs](https://qdrant.tech/documentation/) - Main documentation hub
- [Qdrant quickstart](https://qdrant.tech/documentation/quick-start/) - Getting started
- [Qdrant vector indexing](https://qdrant.tech/documentation/concepts/indexing/#vector-index) - The HNSW algorithm
- [Qdrant .NET client](https://github.com/qdrant/qdrant-dotnet) - Official SDK

## Embedding Models

- [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) - The model we use
- [Sentence Transformers](https://www.sbert.net/) - Library for semantic embeddings

## Full Code

All the code is available at [github.com/scottgal/mostlylucidweb](https://github.com/scottgal/mostlylucidweb):

- `Mostlylucid.SemanticSearch/` - Core semantic search library