
DocSummarizer Part 2 - Using the Tool

Sunday, 21 December 2025


This is Part 2 of the DocSummarizer series. See Part 1 for the architecture and patterns, or Part 3 for the deep technical dive into embeddings and retrieval.

Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.

Every claim is traceable. Every fact cites its source. Self-contained binary, runs entirely on your machine.

# Human-readable summary
docsummarizer -f contract.pdf

# JSON for agents/pipelines
docsummarizer tool -u "https://docs.example.com"

What this article covers: Installation, key modes (Auto/BertRag/Bert), templates, and common use cases.

What it doesn't cover: Full command reference, configuration options, troubleshooting, architecture details.

For complete documentation, see the README. For how it works internally, see Part 3.

Why This Exists

Most summarizers give you text. This gives you evidence.

  • Every claim includes [chunk-N] citations back to source material
  • Confidence levels (high/medium/low) based on supporting evidence
  • Structured JSON output for agent integration, CI pipelines, or MCP servers
  • Quality metrics catch hallucinations before they escape

If you need to trust a summary — or feed it to another system — that matters.

Features

  • 🚀 BertRag Pipeline: Production-grade BERT extraction → retrieval → LLM synthesis
  • 🤖 Auto Mode: Smart mode selection based on document and query
  • ⚡ Bert Mode: Pure extractive summarization - no LLM needed, works offline (~3-5s)
  • Evidence-Grounded Output: Citations, confidence levels, traceable claims
  • Multiple Modes: Auto, BertRag, Bert, BertHybrid, MapReduce, Rag, Iterative
  • Tool Mode: Clean JSON for LLM agents, MCP servers, CI checks
  • 13 Templates: default, prose, brief, oneliner, bullets, executive, detailed, technical, academic, citations, bookreport, meeting, strict
  • Large Documents: Handles 500+ pages with hierarchical processing
  • Web Fetching: Security-hardened (SSRF protection, HTML sanitization)
  • Playwright Mode: Headless browser for JavaScript-rendered pages (SPAs, React apps)
  • ONNX Embeddings: Zero-config local embeddings - models auto-download on first use
  • Quality Analysis: Hallucination detection, entity extraction
  • Resilient LLM: Polly-based retry with jitter backoff + circuit breaker
  • Local Only: Nothing leaves your machine

Using as an LLM Tool

The tool command is designed specifically for integration with AI agents, MCP servers, and other automated systems. It outputs structured JSON to stdout with evidence-grounded claims - perfect for building RAG pipelines or agent tools.

Basic Tool Usage

# Summarize a URL and get JSON output
docsummarizer tool --url "https://example.com/docs.html"

# Summarize a local file
docsummarizer tool -f document.pdf

# With a focus query
docsummarizer tool -f contract.pdf -q "payment terms and conditions"

# Pipe to jq for processing
docsummarizer tool -f doc.pdf | jq '.summary.keyFacts'

Tool Output Structure

The tool command returns structured JSON with evidence tracking:

{
  "success": true,
  "source": "https://example.com/docs.html",
  "contentType": "text/html",
  "summary": {
    "executive": "Brief summary of the document.",
    "keyFacts": [
      {
        "claim": "The system supports 10,000 TPS.",
        "confidence": "high",
        "evidence": ["chunk-3", "chunk-7"],
        "type": "fact"
      }
    ],
    "topics": [
      {
        "name": "Architecture",
        "summary": "The system uses microservices...",
        "evidence": ["chunk-1", "chunk-2"]
      }
    ],
    "entities": {
      "people": ["John Smith"],
      "organizations": ["Acme Corp"],
      "concepts": ["OAuth 2.0", "REST API"]
    },
    "openQuestions": ["What is the disaster recovery plan?"]
  },
  "metadata": {
    "processingSeconds": 12.5,
    "chunksProcessed": 15,
    "model": "qwen2.5:1.5b",
    "mode": "MapReduce",
    "coverageScore": 0.95,
    "citationRate": 1.2,
    "fetchedAt": "2025-01-15T10:30:00Z"
  }
}
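Because every claim carries evidence IDs, a consumer can validate payloads mechanically before trusting them. A minimal Python sketch (`validate_evidence` and its chunk-N pattern check are illustrative, not part of the tool):

```python
import json
import re

CHUNK_ID = re.compile(r"^chunk-\d+$")

def validate_evidence(payload: dict) -> list:
    """Return a list of problems found in a tool-mode payload."""
    problems = []
    for fact in payload.get("summary", {}).get("keyFacts", []):
        # Every keyFact should cite at least one source chunk.
        if not fact.get("evidence"):
            problems.append("uncited claim: " + fact.get("claim", "?"))
        # Evidence IDs should follow the chunk-N naming pattern.
        for ev in fact.get("evidence", []):
            if not CHUNK_ID.match(ev):
                problems.append("malformed evidence id: " + ev)
    return problems

sample = json.loads("""
{"success": true,
 "summary": {"keyFacts": [
   {"claim": "The system supports 10,000 TPS.",
    "confidence": "high",
    "evidence": ["chunk-3", "chunk-7"],
    "type": "fact"}]}}
""")
print(validate_evidence(sample))  # → []
```

An empty list means every key fact is cited and well-formed; anything else is worth flagging before the summary reaches a downstream agent.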

Tool Command Options

docsummarizer tool [options]

| Option | Short | Description |
|----------|-------|-------------|
| --url | -u | URL to fetch and summarize |
| --file | -f | File to summarize |
| --query | -q | Optional focus query |
| --mode | -m | Summarization mode (Auto, BertRag, Bert, BertHybrid, MapReduce, Rag, Iterative) |
| --model | | Ollama model to use |
| --config | -c | Configuration file path |

Key Design Principles

  • Evidence Grounding: Every claim includes evidence IDs referencing source chunks
  • Confidence Levels: Claims are rated high, medium, or low based on supporting evidence
  • Clean Output: The executive summary has no citation markers for easy display
  • Metadata: Processing stats help with debugging and quality assessment
  • Error Handling: Failures return success: false with an error message

Integration Examples

Python script:

import subprocess
import json

result = subprocess.run(
    ["docsummarizer", "tool", "-u", "https://example.com/api-docs"],
    capture_output=True, text=True
)
data = json.loads(result.stdout)

if data["success"]:
    for fact in data["summary"]["keyFacts"]:
        if fact["confidence"] == "high":
            print(f"- {fact['claim']}")

Shell pipeline:

# Extract high-confidence facts only
docsummarizer tool -f doc.pdf | jq '[.summary.keyFacts[] | select(.confidence == "high")]'

# Get just the executive summary
docsummarizer tool -u "https://example.com" | jq -r '.summary.executive'

Quick Start

Download Pre-built Binaries

Pre-built native executables are available from GitHub Releases:

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |

# Download and extract (Linux/macOS)
curl -L -o docsummarizer.tar.gz https://github.com/scottgal/mostlylucidweb/releases/download/docsummarizer-v3.1.0/docsummarizer-linux-x64.tar.gz
tar -xzf docsummarizer.tar.gz
chmod +x docsummarizer

# Download and extract (Windows PowerShell)
Invoke-WebRequest -Uri "https://github.com/scottgal/mostlylucidweb/releases/download/docsummarizer-v3.1.0/docsummarizer-win-x64.zip" -OutFile "docsummarizer.zip"
Expand-Archive -Path "docsummarizer.zip" -DestinationPath "."

Prerequisites

Bert Mode (No External Services)

For pure extractive summarization, no external services required:

docsummarizer -f document.md -m Bert

ONNX models auto-download from HuggingFace on first use (~23MB). Returns in ~3-5 seconds.

LLM Modes (Auto, BertRag, MapReduce, etc.)

For LLM-powered summarization, Ollama is required:

# Install Ollama from https://ollama.ai
ollama pull llama3.2:3b        # Default model - good balance of speed/quality
ollama serve

Speed tip: For faster summaries (~3s vs ~15s), use --model qwen2.5:1.5b

Optional: Docling (Binary Formats)

Required for PDF, DOCX, XLSX, PPTX, HTML, images (PNG/JPG/TIFF), CSV, VTT, and AsciiDoc files. Markdown and plain text files are read directly - no Docling required.

docker run -d -p 5001:5001 quay.io/docling-project/docling-serve

Optional: Qdrant (Persistent Vector Storage)

Not required by default - BertRag uses in-memory vectors. Enable Qdrant for persistent storage to avoid re-embedding documents on subsequent runs:

docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant

Then configure in docsummarizer.json:

{
  "bertRag": {
    "vectorStore": "Qdrant",
    "collectionName": "docsummarizer",
    "persistVectors": true
  }
}

Optional: Ollama Embeddings

If you prefer Ollama for embeddings instead of ONNX:

ollama pull nomic-embed-text   # Or mxbai-embed-large
# Then use: --embedding-backend Ollama

Verify Dependencies

docsummarizer check --verbose

Expected output shows a formatted table:

              Dependency Status
╭─────────┬──────────┬────────────────────────╮
│ Service │ Status   │ Endpoint               │
├─────────┼──────────┼────────────────────────┤
│ Ollama  │ OK       │ http://localhost:11434 │
│ Docling │ Optional │ http://localhost:5001  │
│ Qdrant  │ Optional │ localhost:6333         │
╰─────────┴──────────┴────────────────────────╯

        Default Model Info         
╭────────────────┬────────────────╮
│ Property       │ Value          │
├────────────────┼────────────────┤
│ Name           │ llama3.2:3b    │
│ Family         │ llama          │
│ Parameters     │ 3.2B           │
│ Context Window │ 128,000 tokens │
╰────────────────┴────────────────╯

Ready to summarize! Ollama is available.

Note: Docling and Qdrant showing ✗ (unavailable) is fine for Markdown-only workflows.

Usage

Default Behavior

Running docsummarizer with no arguments will:

  1. Look for README.md in the current directory
  2. Summarize it using Auto mode (smart mode selection)
  3. Print the summary to console with a nice panel UI
  4. Auto-save to readme.summary.md

# Summarize README.md in current directory
docsummarizer

# Shows a formatted panel with:
# - Document info table (file, mode, model)
# - Progress indicators during processing
# - Summary panel with the result
# - Topics tree if available
# - Saved: readme.summary.md

Basic Summarization

# Just run it - Auto mode picks the best approach
docsummarizer -f document.pdf

# Fast mode - no LLM, pure extraction (~3-5s)
docsummarizer -f document.pdf -m Bert

# Production mode - best quality with validated citations
docsummarizer -f document.pdf -m BertRag

# Focused on specific topic
docsummarizer -f manual.pdf -m BertRag --focus "installation steps"

# Verbose progress
docsummarizer -f document.pdf -v

Summarization Modes

The tool evolved from "just MapReduce" to a full pipeline. Here's what each mode actually does:

Auto (Default)

Picks the right mode based on what you're asking for. Use this unless you have a reason not to.

docsummarizer -f doc.pdf

BertRag (Production)

This is what you want for production. Three-phase pipeline:

  1. Extract - Parse the document into segments, embed them with BERT
  2. Retrieve - Find the relevant segments (semantic search + salience scoring)
  3. Synthesize - LLM writes a fluent summary from those segments

docsummarizer -f doc.pdf -m BertRag
docsummarizer -f doc.pdf -m BertRag --focus "payment terms"

Why use it: Every claim traces back to a source segment. No hallucination. Scales to any document size. LLM only runs at the end (cheap).
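Conceptually, the retrieve phase is plain vector search. A toy Python sketch of the idea (the three-dimensional vectors below stand in for real BERT embeddings, and `retrieve` is an illustration, not the tool's implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, segments, k=2):
    """Rank embedded segments by similarity to the query and keep the top k."""
    ranked = sorted(segments, key=lambda s: cosine(query_vec, s["vec"]),
                    reverse=True)
    return ranked[:k]

# Toy embedded segments; real ones come from the BERT extract phase.
segments = [
    {"id": "chunk-0", "vec": [0.9, 0.1, 0.0]},
    {"id": "chunk-1", "vec": [0.1, 0.9, 0.0]},
    {"id": "chunk-2", "vec": [0.8, 0.2, 0.1]},
]
top = retrieve([1.0, 0.0, 0.0], segments)
print([s["id"] for s in top])  # → ['chunk-0', 'chunk-2']
```

The synthesize phase then sees only those top-ranked segments, which is why every claim can cite a chunk ID.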

Bert (Fast, No LLM)

Pure extraction using local ONNX models. No LLM call at all.

docsummarizer -f doc.pdf -m Bert

Why use it: Works offline. Returns in ~3-5 seconds. Deterministic (same input = same output). Good enough for quick scans.

BertHybrid

BERT extracts, LLM polishes. Middle ground between Bert and BertRag.

docsummarizer -f doc.pdf -m BertHybrid

MapReduce / Rag / Iterative

The original modes. Still work, but BertRag replaced them for most use cases.

  • MapReduce: Parallel chunking, good for 100% coverage
  • Rag: Vector search, good for focused queries (legacy - BertRag does this better)
  • Iterative: Sequential processing, only use for tiny docs

docsummarizer -f doc.pdf -m MapReduce  # Full coverage
docsummarizer -f doc.pdf -m Rag --focus "query"  # Legacy focused mode

Query Mode

Instead of summarizing, ask questions about a document:

docsummarizer -f manual.pdf --query "How do I install the software?"

Web URL Fetching

Summarize web pages directly without downloading:

# Summarize a web article
docsummarizer --url "https://example.com/article.html" --web-enabled

# Summarize a remote PDF
docsummarizer --url "https://example.com/document.pdf" --web-enabled

# With structured JSON extraction
docsummarizer --url "https://example.com/api-docs.html" --web-enabled --structured

Supported content: HTML (sanitized), PDF, Markdown, images (OCR), Office documents. Large images automatically resized.

Security: SSRF protection, DNS rebinding protection, content-type gating, decompression bomb protection, HTML sanitization.

JavaScript-rendered pages: Use --web-mode Playwright for SPAs and React apps (auto-installs Chromium on first use).

Structured Mode

Extract machine-readable JSON instead of prose:

docsummarizer -f document.pdf --structured -o Json

Extracts: entities, functions, key flows, facts (with confidence levels), uncertainties, quotable passages.

Summary Templates

# Use a template
docsummarizer -f doc.pdf --template executive
docsummarizer -f doc.pdf -t bullets

# Specify custom word count with template:wordcount syntax
docsummarizer -f doc.pdf -t bookreport:500
docsummarizer -f doc.pdf -t executive:100

# Or use --words to override any template's default
docsummarizer -f doc.pdf -t detailed --words 300

| Template | Words | Best For |
|----------|-------|----------|
| default | ~300 | Balanced summary with topics (2 paragraphs) |
| prose | ~400 | Clean multi-paragraph prose - no metadata |
| brief | ~50 | Quick 2-3 sentence summary |
| oneliner | ~25 | Single sentence summary |
| bullets | auto | Bullet point list (5-7 items) |
| executive | ~150 | Executive briefing with recommendations |
| detailed | ~1000 | Comprehensive with full topics |
| technical | ~350 | Technical docs with implementation details |
| academic | ~250 | Academic abstract format |
| citations | auto | Key quotes with source citations only |
| bookreport | ~500 | Book report (setting, characters, plot, themes) |
| meeting | ~200 | Meeting notes (decisions, actions, questions) |
| strict | ~60 | Token-efficient, 3 bullets max, no hedging |

To see all available templates with descriptions:

docsummarizer templates

Model Benchmarking

Compare models on the same document using the benchmark subcommand:

docsummarizer benchmark -f doc.pdf -m "qwen2.5:1.5b,llama3.2:3b,ministral-3:3b"

The benchmark command parses the document once, then runs each model on the same chunks for fair comparison. Output shows timing, word count, and words/second for each model.

Batch Processing

Process entire directories:

# Use BertRag for quality
docsummarizer -d ./documents -m BertRag -v

# Fast offline batch (no LLM needed)
docsummarizer -d ./documents -m Bert -o Json --output-dir ./summaries

# Process only PDFs recursively
docsummarizer -d ./documents -e .pdf --recursive -v
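For custom orchestration you can also drive the tool command yourself. A hedged Python sketch that only builds the per-file invocations (`build_commands` is a hypothetical helper; run each command with `subprocess.run` and parse the JSON from stdout):

```python
from pathlib import Path

def build_commands(root, exts=(".pdf", ".md"), mode="Bert", recursive=True):
    """Construct one `docsummarizer tool` invocation per matching file."""
    pattern = "**/*" if recursive else "*"
    commands = []
    for path in sorted(Path(root).glob(pattern)):
        if path.suffix.lower() in exts:
            commands.append(["docsummarizer", "tool", "-f", str(path), "-m", mode])
    return commands
```

Using Bert mode here keeps the batch fully offline; swap in BertRag when citation quality matters more than speed.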

Command-Line Options

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --file | -f | Path to document (DOCX, PDF, MD) | - |
| --directory | -d | Path to directory for batch processing | - |
| --url | -u | Web URL to fetch and summarize | - |
| --web-enabled | | Enable web fetching (required for --url) | false |
| --mode | -m | Summarization mode: Auto, BertRag, Bert, BertHybrid, MapReduce, Rag, Iterative | Auto |
| --structured | -s | Use structured JSON extraction mode | false |
| --focus | | Focus query for RAG mode | None |
| --query | -q | Query mode instead of summarization | None |
| --model | | Ollama model to use | llama3.2:3b |
| --verbose | -v | Show detailed progress with live UI | false |
| --config | -c | Path to configuration file | Auto-discover |
| --output-format | -o | Output format: Console, Text, Markdown, Json | Console |
| --output-dir | | Output directory for file outputs | Current dir |
| --extensions | -e | File extensions for batch mode | All Docling formats |
| --recursive | -r | Process directories recursively | false |
| --template | -t | Summary template (default, brief, bullets, executive, etc.) | default |
| --words | -w | Target word count (overrides template) | Template default |
| --embedding-backend | | Embedding backend: Onnx, Ollama | Onnx |
| --embedding-model | | ONNX model name (RAG mode) | AllMiniLmL6V2 |
| --web-mode | | Web fetch mode: Simple, Playwright | Simple |
| --analyze | -a | Run quality analysis on summary | false |

MapReduce (Best for Full Coverage)

Best for comprehensive summaries with full document coverage.

docsummarizer -f document.pdf -m MapReduce -v

How it works:

  1. Splits document into structural chunks (by headings)
  2. Summarizes each chunk in parallel using LLM
  3. Reduces summaries into executive summary with citations
  4. Validates all citations reference real chunks

Hierarchical Reduction for Long Documents:

For very long documents where the combined chunk summaries exceed the model's context window, MapReduce automatically uses hierarchical reduction:

  1. Batching: Groups summaries into batches that fit in context
  2. Intermediate reduction: Reduces each batch to a condensed summary
  3. Final reduction: Merges intermediate summaries into the final output
  4. Recursive: If intermediates are still too large, adds more levels

100 chunks → 100 summaries → 5 batches → 5 intermediate summaries → final

This preserves full document coverage regardless of length - every chunk contributes to the final summary. The tool estimates tokens (~4 chars/token) and targets 60% context window utilization per reduction pass.
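That batching heuristic is easy to picture in code. A rough Python sketch of the rule described above (~4 chars/token, 60% of a 128k window per pass; `batch_summaries` is illustrative, not the tool's actual code, and the summary sizes are made up):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic from the article: ~4 characters per token.
    return len(text) // 4

def batch_summaries(summaries, context_window=128_000, utilization=0.6):
    """Group chunk summaries into batches that fit one reduction pass."""
    budget = int(context_window * utilization)   # 60% of the context window
    batches, current, used = [], [], 0
    for s in summaries:
        t = estimate_tokens(s)
        if current and used + t > budget:
            batches.append(current)
            current, used = [], 0
        current.append(s)
        used += t
    if current:
        batches.append(current)
    return batches

# 100 summaries of ~2,000 tokens each against a 128k window at 60%:
print(len(batch_summaries(["x" * 8_000] * 100)))  # → 3
```

If the resulting intermediate summaries are still too large, the same grouping is simply applied again, which is the recursive step above.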

Pros: Fast, complete coverage, parallel processing, handles any document length
Cons: May miss cross-section connections, slower for very long documents

RAG (Best for Focused Queries)

Best when you need to focus on specific topics or have a targeted question.

docsummarizer -f document.pdf -m Rag --focus "pricing and payment terms" -v

How it works:

  1. Indexes document chunks as vector embeddings in Qdrant
  2. Extracts key topics from document headings
  3. Retrieves relevant chunks per topic using semantic search
  4. Synthesizes focused summary with citations

When to use RAG over MapReduce:

| Scenario | Best Mode |
|----------|-----------|
| "Summarize this whole document" | MapReduce |
| "What does this say about security?" | RAG |
| 500-page manual, need everything | MapReduce (hierarchical) |
| 500-page manual, need specific section | RAG |
| Need fast results, don't have Qdrant | MapReduce |

RAG is not about handling long documents - MapReduce handles that with hierarchical reduction. RAG is about relevance filtering: when you want to ignore 90% of a document and focus on what matters to your specific question.

Pros: Topic-focused, semantic understanding, reuses index, faster for focused queries
Cons: May miss content outside focus area, requires Qdrant, slower initial indexing

Iterative

Best for narrative documents where context flows sequentially.

docsummarizer -f story.pdf -m Iterative -v

Warning: Slower and may lose context on long documents (>10 chunks).

Large Document Guide

Choosing the Right Mode

| Document Type | Goal | Mode | Why |
|---------------|------|------|-----|
| Technical spec (50+ pages) | Full summary | MapReduce | Complete coverage |
| Novel/Narrative | Full summary | MapReduce | Needs temporal context |
| Legal contract | Full summary | MapReduce | Can't miss clauses |
| Legal contract | "Payment terms?" | RAG | Focus on specific section |
| API docs (200 pages) | "How does auth work?" | RAG | Query specific topic |
| Research paper | Full summary | MapReduce | Structured, need everything |

Fiction vs Non-Fiction

| Content Type | Best Mode | Notes |
|--------------|-----------|-------|
| Fiction/Narrative | MapReduce | Plot requires sequential context |
| Technical docs | Both | MapReduce for overview, RAG for specifics |
| Legal/Contracts | MapReduce | Every clause matters |
| Manuals | RAG | Usually querying for specifics |

Performance

| Document Size | MapReduce | RAG | Notes |
|---------------|-----------|-----|-------|
| 10 pages | 15s | 20s | Both fast |
| 50 pages | 45s | 30s | RAG faster if focused |
| 200 pages | 3-5 min | 1-2 min | Hierarchical reduction |
| 500+ pages | 10-15 min | 2-3 min | Consider multiple RAG queries |

Configuration

Generate Default Configuration

docsummarizer config --output myconfig.json

Configuration File

Configuration is auto-discovered from:

  1. --config option
  2. docsummarizer.json in current directory
  3. .docsummarizer.json (hidden file)
  4. ~/.docsummarizer.json (user home)

Example docsummarizer.json:

{
  "embeddingBackend": "Onnx",
  "onnx": {
    "embeddingModel": "AllMiniLmL6V2"
  },
  "ollama": {
    "model": "llama3.2:3b",
    "embedModel": "mxbai-embed-large",
    "baseUrl": "http://localhost:11434",
    "temperature": 0.3,
    "timeoutSeconds": 1200
  },
  "docling": {
    "baseUrl": "http://localhost:5001",
    "timeoutSeconds": 1200,
    "pdfBackend": "pypdfium2",
    "pagesPerChunk": 10,
    "maxConcurrentChunks": 4,
    "enableSplitProcessing": true
  },
  "qdrant": {
    "host": "localhost",
    "port": 6333,
    "collectionName": "documents"
  },
  "processing": {
    "maxHeadingLevel": 2,
    "targetChunkTokens": 1500,
    "minChunkTokens": 200,
    "maxLlmParallelism": 2
  },
  "output": {
    "format": "Console",
    "verbose": false,
    "includeTrace": false
  },
  "webFetch": {
    "enabled": false,
    "mode": "Simple",
    "timeoutSeconds": 30,
    "userAgent": "Mozilla/5.0 DocSummarizer/1.0"
  },
  "batch": {
    "fileExtensions": [".pdf", ".docx", ".md", ".txt", ".html"],
    "recursive": false,
    "continueOnError": true
  }
}

Processing Options

| Option | Default | Description |
|--------|---------|-------------|
| maxLlmParallelism | 8 | Concurrent LLM requests (Ollama queues, so higher values just queue) |
| maxHeadingLevel | 2 | Split on H1/H2 only. Set to 3 for finer granularity |
| targetChunkTokens | 0 (auto) | Target chunk size. 0 = auto-calculate (~25% of context window) |
| minChunkTokens | 0 (auto) | Minimum before merging. 0 = 1/8 of target |
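The two auto-calculation rules are simple to state in code; a sketch of the documented defaults (`resolve_chunk_tokens` is illustrative, and the 128k window matches the llama3.2:3b context size reported by `docsummarizer check`):

```python
def resolve_chunk_tokens(target: int, minimum: int, context_window: int):
    """Apply the documented auto-sizing rules for chunking."""
    if target == 0:        # 0 = auto: ~25% of the model's context window
        target = context_window // 4
    if minimum == 0:       # 0 = auto: 1/8 of the target
        minimum = target // 8
    return target, minimum

print(resolve_chunk_tokens(0, 0, 128_000))  # → (32000, 4000)
```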

Output Format

Summary Structure

## Executive Summary
- Key finding 1 with specific details [chunk-0]
- Important point 2 with numbers and dates [chunk-3]
- Critical requirement 3 [chunk-5]

## Section Highlights
- Introduction: Overview of the system architecture [chunk-0]
- Requirements: Technical specifications detailed [chunk-3]
...

## Open Questions
- What is the timeline for Phase 2?
- How does the fallback mechanism work?

### Trace

- Document: document.pdf
- Chunks: 12 total, 12 processed
- Topics: 5
- Time: 21.4s
- Coverage: 100%
- Citation rate: 1.20

Trace metrics: Coverage (% sections included), Citation rate (citations/bullet), Chunks processed (RAG may skip some).
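These trace numbers can be recomputed from a saved summary file. A minimal Python sketch (the bullet-counting heuristic here is an assumption about the output format, not the tool's exact logic):

```python
import re

def citation_rate(summary_md: str) -> float:
    """Citations per bullet point, as reported in the Trace block."""
    bullets = [line for line in summary_md.splitlines()
               if line.lstrip().startswith("- ")]
    citations = re.findall(r"\[chunk-\d+\]", summary_md)
    return len(citations) / len(bullets) if bullets else 0.0

sample = """\
- Key finding 1 with specific details [chunk-0]
- Important point 2 [chunk-3] [chunk-5]
- Open question with no citation
"""
print(round(citation_rate(sample), 2))  # → 1.0
```

A rate near 1.0 means roughly one citation per bullet; values well below that suggest uncited claims worth checking.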

Model Recommendations

| Model | Size | Speed | Quality | Use Case |
|-------|------|-------|---------|----------|
| qwen2.5:1.5b | 986MB | Very Fast (~3s) | Good | Speed optimized |
| gemma3:1b | 815MB | Fast (~10s) | Fair | Alternative small model |
| llama3.2:3b | 2GB | Medium (~15s) | Very Good | Default - good balance |
| ministral-3:3b | 2.9GB | Medium (~20s) | Very Good | Quality-focused |
| llama3.1:8b | 4.7GB | Slow (~45s) | Excellent | High-quality summaries |

Tip: For faster summaries (~3s vs ~15s), use --model qwen2.5:1.5b. For critical documents where quality matters more, use --model llama3.1:8b.

Build from Source

# Clone the repository
git clone https://github.com/scottgal/mostlylucidweb.git
cd mostlylucidweb/Mostlylucid.DocSummarizer

# Build
dotnet build

# Run
dotnet run -- --help

Self-Contained Builds

For production deployment without requiring .NET runtime installation:

# Build self-contained executable (Windows x64)
dotnet publish -c Release -r win-x64 --self-contained

# Build for Linux
dotnet publish -c Release -r linux-x64 --self-contained

# Build for macOS
dotnet publish -c Release -r osx-x64 --self-contained

Output: bin/Release/net9.0/<runtime>/publish/docsummarizer

Troubleshooting

"Could not connect to Ollama"

  • Ensure Ollama is running: ollama serve
  • Check models are pulled: ollama list

"Docling service unavailable"

  • This is only required for PDF/DOCX files
  • For Markdown files, you can ignore this error
  • To fix: docker run -p 5001:5001 quay.io/docling-project/docling-serve

"Qdrant connection failed"

  • This is only required for RAG mode (--mode Rag)
  • For MapReduce mode (default), you can ignore this error
  • To fix: docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

"Circuit breaker is open"

  • Ollama is overloaded or has crashed
  • Wait 30 seconds for the circuit breaker to reset, or restart Ollama
  • The tool uses Polly resilience policies and will auto-retry

"wsarecv" or Connection Errors (Windows)

  • This is a known Ollama issue on Windows (GitHub #13340)
  • The tool auto-handles this with retry logic and connection recovery
  • If persistent, restart Ollama and try again

"LLM generation timed out"

  • Increase timeout in configuration
  • Split very large documents
  • Check Ollama is not overloaded with other requests

Repetitive or Low-Quality Summaries

Symptoms: Bullet points echo the prompt ("Return only bullet points", "The rule is...") instead of summarizing content.

Cause: Model struggling with the prompt or content too long.

Fix: Small models such as qwen2.5:1.5b handle most documents well. For problematic documents, try the default --model llama3.2:3b or a larger model. See Model Recommendations.

Summary Ignores Document Content

If the summary seems generic or doesn't reference specific content:

  • The model may be hallucinating - check Citation rate in trace output
  • Try RAG mode (--mode Rag) which grounds summaries in retrieved chunks
  • Use --verbose to see which chunks are being processed

Citations Missing or Invalid

If summaries lack [chunk-N] citations:

  • Small models prioritize content over citation formatting
  • The prompts are optimized for speed, not strict citation compliance
  • For strict citations, use larger models like llama3.2:3b
  • Check Citation rate in trace - higher values indicate better traceability

Performance Tips

  • MapReduce for speed (parallel chunks)
  • qwen2.5:1.5b for speed, llama3.2:3b for balance, llama3.1:8b for quality
  • ONNX embeddings (default) are faster than Ollama for RAG mode
  • Lower maxLlmParallelism if experiencing timeouts

Resources

Series Navigation


© 2025 Scott Galloway — Unlicense — All content and source code on this site is free to use, copy, modify, and sell.