This is Part 2 of the DocSummarizer series. See Part 1 for the architecture and patterns, or Part 3 for the deep technical dive into embeddings and retrieval.
Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.
Every claim is traceable. Every fact cites its source. Self-contained binary, runs entirely on your machine.
# Human-readable summary
docsummarizer -f contract.pdf
# JSON for agents/pipelines
docsummarizer tool -u "https://docs.example.com"
What this article covers: Installation, key modes (Auto/BertRag/Bert), templates, and common use cases.
What it doesn't cover: Full command reference, configuration options, troubleshooting, architecture details.
For complete documentation, see the README. For how it works internally, see Part 3.
Most summarizers give you text. This gives you evidence.
Summaries include [chunk-N] citations back to source material. If you need to trust a summary, or feed it to another system, that matters.
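For example, a bullets-template run produces claims with inline citations. The output lines below are illustrative, not captured from a real run:

# Citations appear inline on every claim
docsummarizer -f report.pdf -t bullets
# - Revenue grew 12% year over year [chunk-2]
# - Migration to the new platform completes in Q3 [chunk-5]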
The tool command is designed specifically for integration with AI agents, MCP servers, and other automated systems. It outputs structured JSON to stdout with evidence-grounded claims - perfect for building RAG pipelines or agent tools.
# Summarize a URL and get JSON output
docsummarizer tool --url "https://example.com/docs.html"
# Summarize a local file
docsummarizer tool -f document.pdf
# With a focus query
docsummarizer tool -f contract.pdf -q "payment terms and conditions"
# Pipe to jq for processing
docsummarizer tool -f doc.pdf | jq '.summary.keyFacts'
The tool command returns structured JSON with evidence tracking:
{
"success": true,
"source": "https://example.com/docs.html",
"contentType": "text/html",
"summary": {
"executive": "Brief summary of the document.",
"keyFacts": [
{
"claim": "The system supports 10,000 TPS.",
"confidence": "high",
"evidence": ["chunk-3", "chunk-7"],
"type": "fact"
}
],
"topics": [
{
"name": "Architecture",
"summary": "The system uses microservices...",
"evidence": ["chunk-1", "chunk-2"]
}
],
"entities": {
"people": ["John Smith"],
"organizations": ["Acme Corp"],
"concepts": ["OAuth 2.0", "REST API"]
},
"openQuestions": ["What is the disaster recovery plan?"]
},
"metadata": {
"processingSeconds": 12.5,
"chunksProcessed": 15,
"model": "qwen2.5:1.5b",
"mode": "MapReduce",
"coverageScore": 0.95,
"citationRate": 1.2,
"fetchedAt": "2025-01-15T10:30:00Z"
}
}
docsummarizer tool [options]
| Option | Short | Description |
|---|---|---|
| `--url` | `-u` | URL to fetch and summarize |
| `--file` | `-f` | File to summarize |
| `--query` | `-q` | Optional focus query |
| `--mode` | `-m` | Summarization mode (Auto, BertRag, Bert, BertHybrid, MapReduce, Rag, Iterative) |
| `--model` | | Ollama model to use |
| `--config` | `-c` | Configuration file path |
Notes on the output:

- `evidence` arrays contain IDs referencing source chunks
- `confidence` is `high`, `medium`, or `low`, based on supporting evidence
- The `executive` summary has no citation markers, for easy display
- On failure, the command returns `success: false` with an error message

Python script:
import subprocess
import json

result = subprocess.run(
    ["docsummarizer", "tool", "-u", "https://example.com/api-docs"],
    capture_output=True, text=True
)

data = json.loads(result.stdout)
if data["success"]:
    for fact in data["summary"]["keyFacts"]:
        if fact["confidence"] == "high":
            print(f"- {fact['claim']}")
Shell pipeline:
# Extract high-confidence facts only
docsummarizer tool -f doc.pdf | jq '[.summary.keyFacts[] | select(.confidence == "high")]'
# Get just the executive summary
docsummarizer tool -u "https://example.com" | jq -r '.summary.executive'
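If you script against the tool command, you can also gate on the success flag. A small sketch using jq's -e flag, which sets the exit code from the last output value:

# Abort the pipeline when summarization fails
docsummarizer tool -f doc.pdf | jq -e '.success' > /dev/null || echo "summarization failed" >&2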
Pre-built native executables are available from GitHub Releases:
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |
# Download and extract (Linux/macOS)
curl -L -o docsummarizer.tar.gz https://github.com/scottgal/mostlylucidweb/releases/download/docsummarizer-v3.1.0/docsummarizer-linux-x64.tar.gz
tar -xzf docsummarizer.tar.gz
chmod +x docsummarizer
# Download and extract (Windows PowerShell)
Invoke-WebRequest -Uri "https://github.com/scottgal/mostlylucidweb/releases/download/docsummarizer-v3.1.0/docsummarizer-win-x64.zip" -OutFile "docsummarizer.zip"
Expand-Archive -Path "docsummarizer.zip" -DestinationPath "."
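Once extracted, a quick sanity check (the check subcommand is covered in detail below):

# Verify the binary runs and report dependency status
./docsummarizer check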
For pure extractive summarization, no external services required:
docsummarizer -f document.md -m Bert
ONNX models auto-download from HuggingFace on first use (~23MB). Returns in ~3-5 seconds.
For LLM-powered summarization, Ollama is required:
# Install Ollama from https://ollama.ai
ollama pull llama3.2:3b # Default model - good balance of speed/quality
ollama serve
Speed tip: For faster summaries (~3s vs ~15s), use `--model qwen2.5:1.5b`.
Required for PDF, DOCX, XLSX, PPTX, HTML, images (PNG/JPG/TIFF), CSV, VTT, and AsciiDoc files. Markdown and plain text files are read directly - no Docling required.
docker run -d -p 5001:5001 quay.io/docling-project/docling-serve
Not required by default - BertRag uses in-memory vectors. Enable Qdrant for persistent storage to avoid re-embedding documents on subsequent runs:
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
Then configure in docsummarizer.json:
{
"bertRag": {
"vectorStore": "Qdrant",
"collectionName": "docsummarizer",
"persistVectors": true
}
}
If you prefer Ollama for embeddings instead of ONNX:
ollama pull nomic-embed-text # Or mxbai-embed-large
# Then use: --embedding-backend Ollama
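For example, a full BertRag invocation with the Ollama embedding backend (after pulling an embedding model as above):

docsummarizer -f doc.pdf -m BertRag --embedding-backend Ollama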
docsummarizer check --verbose
Expected output shows a formatted table:
Dependency Status
╭─────────┬──────────┬────────────────────────╮
│ Service │ Status   │ Endpoint               │
├─────────┼──────────┼────────────────────────┤
│ Ollama  │ OK       │ http://localhost:11434 │
│ Docling │ Optional │ http://localhost:5001  │
│ Qdrant  │ Optional │ localhost:6333         │
╰─────────┴──────────┴────────────────────────╯
Default Model Info
╭────────────────┬────────────────╮
│ Property │ Value │
├────────────────┼────────────────┤
│ Name │ llama3.2:3b │
│ Family │ llama │
│ Parameters │ 3.2B │
│ Context Window │ 128,000 tokens │
╰────────────────┴────────────────╯
Ready to summarize! Ollama is available.
Note: Docling and Qdrant showing ✗ (not running) is fine for Markdown-only workflows.
Running docsummarizer with no arguments will:
- Look for `README.md` in the current directory
- Save the summary as `readme.summary.md`

# Summarize README.md in current directory
docsummarizer
# Shows a formatted panel with:
# - Document info table (file, mode, model)
# - Progress indicators during processing
# - Summary panel with the result
# - Topics tree if available
# - Saved: readme.summary.md
# Just run it - Auto mode picks the best approach
docsummarizer -f document.pdf
# Fast mode - no LLM, pure extraction (~3-5s)
docsummarizer -f document.pdf -m Bert
# Production mode - best quality with validated citations
docsummarizer -f document.pdf -m BertRag
# Focused on specific topic
docsummarizer -f manual.pdf -m BertRag --focus "installation steps"
# Verbose progress
docsummarizer -f document.pdf -v
The tool evolved from "just MapReduce" to a full pipeline. Here's what each mode actually does:
Picks the right mode based on what you're asking for. Use this unless you have a reason not to.
docsummarizer -f doc.pdf
This is what you want for production: a three-phase pipeline.
docsummarizer -f doc.pdf -m BertRag
docsummarizer -f doc.pdf -m BertRag --focus "payment terms"
Why use it: Every claim traces back to a source segment. No hallucination. Scales to any document size. LLM only runs at the end (cheap).
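A typical production invocation, combining a focus query with JSON output for downstream checks (file and query are illustrative):

docsummarizer -f contract.pdf -m BertRag --focus "termination clauses" -o Json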
Pure extraction using local ONNX models. No LLM call at all.
docsummarizer -f doc.pdf -m Bert
Why use it: Works offline. Returns in ~3-5 seconds. Deterministic (same input = same output). Good enough for quick scans.
BERT extracts, LLM polishes. Middle ground between Bert and BertRag.
docsummarizer -f doc.pdf -m BertHybrid
The original modes. Still work, but BertRag replaced them for most use cases.
docsummarizer -f doc.pdf -m MapReduce # Full coverage
docsummarizer -f doc.pdf -m Rag --focus "query" # Legacy focused mode
Instead of summarizing, ask questions about a document:
docsummarizer -f manual.pdf --query "How do I install the software?"
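The same question works through the tool command when you want a machine-readable answer. Note the tool's -q is documented as a focus query, so treat this as focused summarization rather than strict Q&A:

docsummarizer tool -f manual.pdf -q "How do I install the software?" | jq -r '.summary.executive'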
Summarize web pages directly without downloading:
# Summarize a web article
docsummarizer --url "https://example.com/article.html" --web-enabled
# Summarize a remote PDF
docsummarizer --url "https://example.com/document.pdf" --web-enabled
# With structured JSON extraction
docsummarizer --url "https://example.com/api-docs.html" --web-enabled --structured
Supported content: HTML (sanitized), PDF, Markdown, images (OCR), Office documents. Large images automatically resized.
Security: SSRF protection, DNS rebinding protection, content-type gating, decompression bomb protection, HTML sanitization.
JavaScript-rendered pages: Use --web-mode Playwright for SPAs and React apps (auto-installs Chromium on first use).
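For example (the URL is illustrative):

# Render a JavaScript-heavy page with Playwright before summarizing
docsummarizer --url "https://app.example.com/dashboard" --web-enabled --web-mode Playwright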
Extract machine-readable JSON instead of prose:
docsummarizer -f document.pdf --structured -o Json
Extracts: entities, functions, key flows, facts (with confidence levels), uncertainties, quotable passages.
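The exact schema of the structured output isn't reproduced here, so the simplest way to explore it is to pretty-print it:

# Inspect the structured JSON shape
docsummarizer -f document.pdf --structured -o Json | jq '.'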
# Use a template
docsummarizer -f doc.pdf --template executive
docsummarizer -f doc.pdf -t bullets
# Specify custom word count with template:wordcount syntax
docsummarizer -f doc.pdf -t bookreport:500
docsummarizer -f doc.pdf -t executive:100
# Or use --words to override any template's default
docsummarizer -f doc.pdf -t detailed --words 300
| Template | Words | Best For |
|---|---|---|
| `default` | ~300 | Balanced summary with topics (2 paragraphs) |
| `prose` | ~400 | Clean multi-paragraph prose, no metadata |
| `brief` | ~50 | Quick 2-3 sentence summary |
| `oneliner` | ~25 | Single sentence summary |
| `bullets` | auto | Bullet point list (5-7 items) |
| `executive` | ~150 | Executive briefing with recommendations |
| `detailed` | ~1000 | Comprehensive with full topics |
| `technical` | ~350 | Technical docs with implementation details |
| `academic` | ~250 | Academic abstract format |
| `citations` | auto | Key quotes with source citations only |
| `bookreport` | ~500 | Book report (setting, characters, plot, themes) |
| `meeting` | ~200 | Meeting notes (decisions, actions, questions) |
| `strict` | ~60 | Token-efficient, 3 bullets max, no hedging |
To see all available templates with descriptions:
docsummarizer templates
Compare models on the same document using the benchmark subcommand:
docsummarizer benchmark -f doc.pdf -m "qwen2.5:1.5b,llama3.2:3b,ministral-3:3b"
The benchmark command parses the document once, then runs each model on the same chunks for fair comparison. Output shows timing, word count, and words/second for each model.
Process entire directories:
# Use BertRag for quality
docsummarizer -d ./documents -m BertRag -v
# Fast offline batch (no LLM needed)
docsummarizer -d ./documents -m Bert -o Json --output-dir ./summaries
# Process only PDFs recursively
docsummarizer -d ./documents -e .pdf --recursive -v
| Option | Short | Description | Default |
|---|---|---|---|
| `--file` | `-f` | Path to document (DOCX, PDF, MD) | - |
| `--directory` | `-d` | Path to directory for batch processing | - |
| `--url` | `-u` | Web URL to fetch and summarize | - |
| `--web-enabled` | | Enable web fetching (required for --url) | false |
| `--mode` | `-m` | Summarization mode: Auto, BertRag, Bert, BertHybrid, MapReduce, Rag, Iterative | Auto |
| `--structured` | `-s` | Use structured JSON extraction mode | false |
| `--focus` | | Focus query for RAG mode | None |
| `--query` | `-q` | Query mode instead of summarization | None |
| `--model` | | Ollama model to use | llama3.2:3b |
| `--verbose` | `-v` | Show detailed progress with live UI | false |
| `--config` | `-c` | Path to configuration file | Auto-discover |
| `--output-format` | `-o` | Output format: Console, Text, Markdown, Json | Console |
| `--output-dir` | | Output directory for file outputs | Current dir |
| `--extensions` | `-e` | File extensions for batch mode | All Docling formats |
| `--recursive` | `-r` | Process directories recursively | false |
| `--template` | `-t` | Summary template (default, brief, bullets, executive, etc.) | default |
| `--words` | `-w` | Target word count (overrides template) | Template default |
| `--embedding-backend` | | Embedding backend: Onnx, Ollama | Onnx |
| `--embedding-model` | | ONNX model name (RAG mode) | AllMiniLmL6V2 |
| `--web-mode` | | Web fetch mode: Simple, Playwright | Simple |
| `--analyze` | `-a` | Run quality analysis on summary | false |
Best for comprehensive summaries with full document coverage.
docsummarizer -f document.pdf -m MapReduce -v
How it works:

1. The document is split into chunks
2. Each chunk is summarized independently, in parallel (map)
3. The chunk summaries are combined into a final summary (reduce)
Hierarchical Reduction for Long Documents:
For very long documents where the combined chunk summaries exceed the model's context window, MapReduce automatically uses hierarchical reduction:
100 chunks → 100 summaries → 5 batches → 5 intermediate summaries → final
This preserves full document coverage regardless of length - every chunk contributes to the final summary. The tool estimates tokens (~4 chars/token) and targets 60% context window utilization per reduction pass.
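A back-of-envelope version of that budget, assuming a 128K-token context window and ~1,200-character chunk summaries (both numbers are assumptions for illustration):

# Reduction batching arithmetic: ~4 chars/token, 60% utilization target
CONTEXT=128000
BUDGET=$((CONTEXT * 60 / 100))      # 76,800 tokens available per reduction pass
SUMMARY_TOKENS=$((1200 / 4))        # one ~1,200-char summary is ~300 tokens
echo $((BUDGET / SUMMARY_TOKENS))   # ~256 chunk summaries fit in a single pass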
Pros: Fast, complete coverage, parallel processing, handles any document length.
Cons: May miss cross-section connections, slower for very long documents.
Best when you need to focus on specific topics or have a targeted question.
docsummarizer -f document.pdf -m Rag --focus "pricing and payment terms" -v
How it works:

1. Chunks are embedded and stored in a vector index
2. The focus query retrieves the most relevant chunks
3. The LLM summarizes only those retrieved chunks
When to use RAG over MapReduce:
| Scenario | Best Mode |
|---|---|
| "Summarize this whole document" | MapReduce |
| "What does this say about security?" | RAG |
| 500-page manual, need everything | MapReduce (hierarchical) |
| 500-page manual, need specific section | RAG |
| Need fast results, don't have Qdrant | MapReduce |
RAG is not about handling long documents - MapReduce handles that with hierarchical reduction. RAG is about relevance filtering: when you want to ignore 90% of a document and focus on what matters to your specific question.
Pros: Topic-focused, semantic understanding, reuses index, faster for focused queries.
Cons: May miss content outside focus area, requires Qdrant, slower initial indexing.
Best for narrative documents where context flows sequentially.
docsummarizer -f story.pdf -m Iterative -v
Warning: Slower and may lose context on long documents (>10 chunks).
| Document Type | Goal | Mode | Why |
|---|---|---|---|
| Technical spec (50+ pages) | Full summary | MapReduce | Complete coverage |
| Novel/Narrative | Full summary | MapReduce | Needs temporal context |
| Legal contract | Full summary | MapReduce | Can't miss clauses |
| Legal contract | "Payment terms?" | RAG | Focus on specific section |
| API docs (200 pages) | "How does auth work?" | RAG | Query specific topic |
| Research paper | Full summary | MapReduce | Structured, need everything |
| Content Type | Best Mode | Notes |
|---|---|---|
| Fiction/Narrative | MapReduce | Plot requires sequential context |
| Technical docs | Both | MapReduce for overview, RAG for specifics |
| Legal/Contracts | MapReduce | Every clause matters |
| Manuals | RAG | Usually querying for specifics |
| Document Size | MapReduce | RAG | Notes |
|---|---|---|---|
| 10 pages | 15s | 20s | Both fast |
| 50 pages | 45s | 30s | RAG faster if focused |
| 200 pages | 3-5 min | 1-2 min | Hierarchical reduction |
| 500+ pages | 10-15 min | 2-3 min | Consider multiple RAG queries |
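What "multiple RAG queries" looks like in practice (topics and file name are illustrative):

# Several focused passes instead of one 10-minute full summary
for topic in "security model" "pricing" "deployment"; do
  docsummarizer -f big-manual.pdf -m Rag --focus "$topic" -o Markdown --output-dir ./summaries
done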
docsummarizer config --output myconfig.json
Configuration is auto-discovered from:
1. The `--config` option
2. `docsummarizer.json` in the current directory
3. `.docsummarizer.json` (hidden file)
4. `~/.docsummarizer.json` (user home)

Example `docsummarizer.json`:
{
"embeddingBackend": "Onnx",
"onnx": {
"embeddingModel": "AllMiniLmL6V2"
},
"ollama": {
"model": "llama3.2:3b",
"embedModel": "mxbai-embed-large",
"baseUrl": "http://localhost:11434",
"temperature": 0.3,
"timeoutSeconds": 1200
},
"docling": {
"baseUrl": "http://localhost:5001",
"timeoutSeconds": 1200,
"pdfBackend": "pypdfium2",
"pagesPerChunk": 10,
"maxConcurrentChunks": 4,
"enableSplitProcessing": true
},
"qdrant": {
"host": "localhost",
"port": 6333,
"collectionName": "documents"
},
"processing": {
"maxHeadingLevel": 2,
"targetChunkTokens": 1500,
"minChunkTokens": 200,
"maxLlmParallelism": 2
},
"output": {
"format": "Console",
"verbose": false,
"includeTrace": false
},
"webFetch": {
"enabled": false,
"mode": "Simple",
"timeoutSeconds": 30,
"userAgent": "Mozilla/5.0 DocSummarizer/1.0"
},
"batch": {
"fileExtensions": [".pdf", ".docx", ".md", ".txt", ".html"],
"recursive": false,
"continueOnError": true
}
}
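To round-trip this: generate a starter config, edit it, and point a run at it with `-c`:

# Generate a starter config, then use it
docsummarizer config --output myconfig.json
docsummarizer -f document.pdf -c myconfig.json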
| Option | Default | Description |
|---|---|---|
| `maxLlmParallelism` | 8 | Concurrent LLM requests (Ollama queues, so higher values just queue) |
| `maxHeadingLevel` | 2 | Split on H1/H2 only; set to 3 for finer granularity |
| `targetChunkTokens` | 0 (auto) | Target chunk size; 0 = auto-calculate (~25% of context window) |
| `minChunkTokens` | 0 (auto) | Minimum before merging; 0 = 1/8 of target |
## Executive Summary
- Key finding 1 with specific details [chunk-0]
- Important point 2 with numbers and dates [chunk-3]
- Critical requirement 3 [chunk-5]
## Section Highlights
- Introduction: Overview of the system architecture [chunk-0]
- Requirements: Technical specifications detailed [chunk-3]
...
## Open Questions
- What is the timeline for Phase 2?
- How does the fallback mechanism work?
### Trace
- Document: document.pdf
- Chunks: 12 total, 12 processed
- Topics: 5
- Time: 21.4s
- Coverage: 100%
- Citation rate: 1.20
Trace metrics: Coverage (% sections included), Citation rate (citations/bullet), Chunks processed (RAG may skip some).
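These metrics are also available programmatically via the tool command, so you can gate a pipeline on evidence quality (the threshold is illustrative):

# Only emit the summary when the citation rate is healthy
docsummarizer tool -f doc.pdf | jq -r 'select(.metadata.citationRate >= 1.0) | .summary.executive'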
| Model | Size | Speed | Quality | Use Case |
|---|---|---|---|---|
| `qwen2.5:1.5b` | 986MB | Very Fast (~3s) | Good | Speed optimized |
| `gemma3:1b` | 815MB | Fast (~10s) | Fair | Alternative small model |
| `llama3.2:3b` | 2GB | Medium (~15s) | Very Good | Default, good balance |
| `ministral-3:3b` | 2.9GB | Medium (~20s) | Very Good | Quality-focused |
| `llama3.1:8b` | 4.7GB | Slow (~45s) | Excellent | High-quality summaries |
Tip: For faster summaries (~3s vs ~15s), use `--model qwen2.5:1.5b`. For critical documents where quality matters more, use `--model llama3.1:8b`.
# Clone the repository
git clone https://github.com/scottgal/mostlylucidweb.git
cd mostlylucidweb/Mostlylucid.DocSummarizer
# Build
dotnet build
# Run
dotnet run -- --help
For production deployment without requiring .NET runtime installation:
# Build self-contained executable (Windows x64)
dotnet publish -c Release -r win-x64 --self-contained
# Build for Linux
dotnet publish -c Release -r linux-x64 --self-contained
# Build for macOS
dotnet publish -c Release -r osx-x64 --self-contained
Output: bin/Release/net9.0/<runtime>/publish/docsummarizer
If `check` reports missing dependencies:

- Ollama: start it with `ollama serve` and verify installed models with `ollama list`
- Docling: `docker run -p 5001:5001 quay.io/docling-project/docling-serve`
- Qdrant (needed for `--mode Rag`): `docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant`

Symptoms: Bullet points echo the prompt ("Return only bullet points", "The rule is...") instead of summarizing content.
Cause: Model struggling with the prompt or content too long.
Fix: The default qwen2.5:1.5b handles most documents well. For problematic documents, try --model llama3.2:3b. See Model Recommendations.
If the summary seems generic or doesn't reference specific content:
- Check the Citation rate in trace output
- Try RAG mode (`--mode Rag`), which grounds summaries in retrieved chunks
- Use `--verbose` to see which chunks are being processed

If summaries lack [chunk-N] citations:
- Try a larger model such as `llama3.2:3b`
- Check the Citation rate in the trace; higher values indicate better traceability

General tips:

- Models: `qwen2.5:1.5b` for speed, `llama3.2:3b` for balance, `llama3.1:8b` for quality
- Reduce `maxLlmParallelism` if experiencing timeouts