One of the challenges I faced while building this blog was how to efficiently include external markdown content without manually copying and pasting it everywhere. I wanted to fetch README files from my GitHub repositories, include documentation from other projects, and keep everything automatically synchronized. The solution? A custom Markdig extension that fetches remote markdown at render time and caches it intelligently.
In this post, I'll walk you through how I built Mostlylucid.Markdig.FetchExtension - a complete solution for fetching and caching remote markdown content with support for multiple storage backends, automatic polling, and a stale-while-revalidate caching pattern.
NOTE: This is still prerelease but I wanted to get it out there. Have fun, but it might not work yet.
This article is AI generated - using Claude Code, which also helped me build the feature.
UPDATE: Added a `disable="true"` parameter so we can now demo the tags properly without them being processed!
UPDATE (Nov 7, 2025): Added a Table of Contents (TOC) generation feature! Use `[TOC]` in your markdown to automatically generate a clickable table of contents from document headings.
See the source for it in the GitHub repository for this site.
Before diving into the technical details, let me explain the problem. I have several scenarios where I need to include external markdown content:
The naive approach would be to use an HTTP client to fetch the markdown every time you need it. But that's problematic: every render triggers a network request, a slow or unavailable source drags your page down with it, and you quickly run into rate limits.
I needed something smarter: fetch once, cache intelligently, refresh automatically, and handle failures gracefully.
The extension follows a preprocessing approach rather than being part of the Markdig parsing pipeline. This is crucial because it means fetched content flows through your entire Markdig pipeline, getting all your custom extensions, syntax highlighting, and styling.
graph TD
A[Markdown with fetch tags] --> B[MarkdownFetchPreprocessor]
B --> C{Check Cache}
C -->|Fresh| D[Return Cached Content]
C -->|Stale/Missing| E[Fetch from Remote URL]
E -->|Success| F[Update Cache]
E -->|Failure| G{Has Cached?}
G -->|Yes| H[Return Stale Cache]
G -->|No| I[Return Error Comment]
F --> J[Replace fetch with Content]
D --> J
H --> J
I --> J
J --> K[Processed Markdown]
K --> L[Your Markdig Pipeline]
L --> M[Final HTML]
The key insight here is preprocessing. Before your markdown hits the Markdig pipeline, we:
- Scan the markdown for `<fetch>` tags
- Fetch the remote content (or load it from cache)
- Replace each tag with the fetched markdown

This ensures consistency - all markdown gets the same treatment regardless of its source.
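To make that concrete, here's a minimal sketch of the scan-and-replace idea. This isn't the package's actual `MarkdownFetchPreprocessor` (which handles async fetching, caching and error comments); the regex and delegate below are purely illustrative.

```csharp
using System.Text.RegularExpressions;

// Illustrative only: a simplified, synchronous scan-and-replace pass.
public static class FetchTagSketch
{
    // Hypothetical pattern for self-closing <fetch url="..." /> tags.
    private static readonly Regex FetchTag =
        new(@"<fetch\s+url=""(?<url>[^""]+)""[^>]*/>", RegexOptions.IgnoreCase);

    public static string Preprocess(string markdown, Func<string, string> fetchOrCache)
        => FetchTag.Replace(markdown, match =>
        {
            // Delegate the actual fetch/cache decision to the caller.
            var url = match.Groups["url"].Value;
            return fetchOrCache(url);
        });
}
```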
Using the extension is simple. In your markdown:
# My Documentation
<fetch url="https://raw.githubusercontent.com/user/repo/main/README.md" />
That's it! The extension will:
- Fetch the content the first time the page is rendered
- Cache it using whichever storage provider you've configured
- Refresh it automatically once the cached copy is older than the poll frequency
- Fall back to the cached copy if the remote source is unavailable
One of the design principles I followed was flexibility. Different applications have different needs. A small demo app doesn't need PostgreSQL, but a multi-server production deployment does. So I built a pluggable storage architecture:
graph LR
A[IMarkdownFetchService Interface] --> B[InMemoryMarkdownFetchService]
A --> C[FileBasedMarkdownFetchService]
A --> D[PostgresMarkdownFetchService]
A --> E[SqliteMarkdownFetchService]
A --> F[SqlServerMarkdownFetchService]
A --> G[YourCustomService]
B --> H[ConcurrentDictionary]
C --> I[File System + SemaphoreSlim]
D --> J[PostgreSQL Database]
E --> K[SQLite Database]
F --> L[SQL Server Database]
G --> M[Your Storage Backend]
Everything implements IMarkdownFetchService:
public interface IMarkdownFetchService
{
Task<MarkdownFetchResult> FetchMarkdownAsync(
string url,
int pollFrequencyHours,
int blogPostId = 0);
Task<bool> RemoveCachedMarkdownAsync(
string url,
int blogPostId = 0);
}
Simple and clean. Each implementation handles storage in its own way, but the interface remains consistent.
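The article uses `MarkdownFetchResult` throughout without showing it; judging by the usage it's roughly a success flag plus the fetched content (the real type may carry more metadata). Calling the service directly looks something like this sketch:

```csharp
// Assumed shape, inferred from how the result is used in the snippets in this post.
public class MarkdownFetchResult
{
    public bool Success { get; set; }
    public string? Content { get; set; }
}

// Resolving the service from DI and fetching with a 24-hour poll frequency.
public class ReadmeLoader(IMarkdownFetchService fetchService)
{
    public async Task<string> LoadReadmeAsync()
    {
        var result = await fetchService.FetchMarkdownAsync(
            "https://raw.githubusercontent.com/user/repo/main/README.md",
            pollFrequencyHours: 24);

        return result.Success ? result.Content! : "<!-- fetch failed -->";
    }
}
```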
The simplest implementation uses ConcurrentDictionary:
public class InMemoryMarkdownFetchService : IMarkdownFetchService
{
private readonly ConcurrentDictionary<string, CacheEntry> _cache = new();
private readonly IHttpClientFactory _httpClientFactory;
private readonly ILogger<InMemoryMarkdownFetchService> _logger;
public async Task<MarkdownFetchResult> FetchMarkdownAsync(
string url,
int pollFrequencyHours,
int blogPostId)
{
var cacheKey = GetCacheKey(url, blogPostId);
// Check cache
if (_cache.TryGetValue(cacheKey, out var cached))
{
var age = DateTimeOffset.UtcNow - cached.FetchedAt;
if (age.TotalHours < pollFrequencyHours)
{
_logger.LogDebug("Returning cached content for {Url}", url);
return new MarkdownFetchResult
{
Success = true,
Content = cached.Content
};
}
}
// Fetch fresh content
var fetchResult = await FetchFromUrlAsync(url);
if (fetchResult.Success)
{
_cache[cacheKey] = new CacheEntry
{
Content = fetchResult.Content,
FetchedAt = DateTimeOffset.UtcNow
};
}
else if (cached != null)
{
// Fetch failed, return stale cache
_logger.LogWarning("Fetch failed, returning stale cache for {Url}", url);
return new MarkdownFetchResult
{
Success = true,
Content = cached.Content
};
}
return fetchResult;
}
private static string GetCacheKey(string url, int blogPostId)
=> $"{url}_{blogPostId}";
}
As you can see, this does the following:
- Checks the cache and returns the entry if it's younger than the poll frequency
- Fetches fresh content when the cache is stale or missing
- Updates the cache on a successful fetch
- Falls back to the stale cached copy if the fetch fails
This pattern - stale-while-revalidate - is crucial for reliability. Even if GitHub is down, your site keeps working with cached content.
For single-server deployments, file-based storage works great:
public class FileBasedMarkdownFetchService : IMarkdownFetchService
{
private readonly string _cacheDirectory;
private readonly IHttpClientFactory _httpClientFactory;
private readonly ILogger<FileBasedMarkdownFetchService> _logger;
private readonly SemaphoreSlim _fileLock = new(1, 1);
public async Task<MarkdownFetchResult> FetchMarkdownAsync(
string url,
int pollFrequencyHours,
int blogPostId)
{
var cacheKey = ComputeCacheKey(url, blogPostId);
var cacheFile = GetCacheFilePath(cacheKey);
await _fileLock.WaitAsync();
try
{
// Check if file exists and is fresh
if (File.Exists(cacheFile))
{
var fileInfo = new FileInfo(cacheFile);
var age = DateTimeOffset.UtcNow - fileInfo.LastWriteTimeUtc;
if (age.TotalHours < pollFrequencyHours)
{
var cached = await File.ReadAllTextAsync(cacheFile);
return new MarkdownFetchResult
{
Success = true,
Content = cached
};
}
}
// Fetch fresh
var fetchResult = await FetchFromUrlAsync(url);
if (fetchResult.Success)
{
await File.WriteAllTextAsync(cacheFile, fetchResult.Content);
}
else if (File.Exists(cacheFile))
{
// Return stale on fetch failure
var stale = await File.ReadAllTextAsync(cacheFile);
return new MarkdownFetchResult
{
Success = true,
Content = stale
};
}
return fetchResult;
}
finally
{
_fileLock.Release();
}
}
private string GetCacheFilePath(string cacheKey)
=> Path.Combine(_cacheDirectory, $"{cacheKey}.md");
private static string ComputeCacheKey(string url, int blogPostId)
{
var combined = $"{url}_{blogPostId}";
using var sha256 = SHA256.Create();
var bytes = Encoding.UTF8.GetBytes(combined);
var hash = sha256.ComputeHash(bytes);
return Convert.ToHexString(hash);
}
}
Key points here:
- SemaphoreSlim for thread-safe file access
- SHA256-hashed cache keys so any URL maps to a safe file name
- The cache file's LastWriteTimeUtc doubles as the fetch timestamp

For production deployments, especially multi-server setups, you want a shared cache. That's where the database providers come in:
public class PostgresMarkdownFetchService : IMarkdownFetchService
{
private readonly MarkdownCacheDbContext _dbContext;
private readonly IHttpClientFactory _httpClientFactory;
private readonly ILogger<PostgresMarkdownFetchService> _logger;
public async Task<MarkdownFetchResult> FetchMarkdownAsync(
string url,
int pollFrequencyHours,
int blogPostId)
{
var cacheKey = GetCacheKey(url, blogPostId);
// Query cache
var cached = await _dbContext.MarkdownCache
.FirstOrDefaultAsync(c => c.CacheKey == cacheKey);
if (cached != null)
{
var age = DateTimeOffset.UtcNow - cached.LastFetchedAt;
if (age.TotalHours < pollFrequencyHours)
{
return new MarkdownFetchResult
{
Success = true,
Content = cached.Content
};
}
}
// Fetch fresh
var fetchResult = await FetchFromUrlAsync(url);
if (fetchResult.Success)
{
if (cached == null)
{
cached = new MarkdownCacheEntry
{
CacheKey = cacheKey,
Url = url,
BlogPostId = blogPostId
};
_dbContext.MarkdownCache.Add(cached);
}
cached.Content = fetchResult.Content;
cached.LastFetchedAt = DateTimeOffset.UtcNow;
await _dbContext.SaveChangesAsync();
}
else if (cached != null)
{
// Return stale
return new MarkdownFetchResult
{
Success = true,
Content = cached.Content
};
}
return fetchResult;
}
}
The database schema is simple:
CREATE TABLE markdown_cache (
id SERIAL PRIMARY KEY,
cache_key VARCHAR(128) NOT NULL UNIQUE,
url VARCHAR(2048) NOT NULL,
blog_post_id INTEGER NOT NULL,
content TEXT NOT NULL,
last_fetched_at TIMESTAMP WITH TIME ZONE NOT NULL,
CONSTRAINT ix_markdown_cache_cache_key UNIQUE (cache_key)
);
CREATE INDEX ix_markdown_cache_url_blog_post_id
ON markdown_cache(url, blog_post_id);
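For reference, the EF Core mapping behind that table presumably looks something like this. The entity and context names come from the Postgres service snippet above; the exact configuration in the package may differ.

```csharp
using Microsoft.EntityFrameworkCore;

// Sketch of an entity/context pair matching the markdown_cache schema above.
public class MarkdownCacheEntry
{
    public int Id { get; set; }
    public string CacheKey { get; set; } = string.Empty;
    public string Url { get; set; } = string.Empty;
    public int BlogPostId { get; set; }
    public string Content { get; set; } = string.Empty;
    public DateTimeOffset LastFetchedAt { get; set; }
}

public class MarkdownCacheDbContext(DbContextOptions<MarkdownCacheDbContext> options)
    : DbContext(options)
{
    public DbSet<MarkdownCacheEntry> MarkdownCache => Set<MarkdownCacheEntry>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<MarkdownCacheEntry>(entity =>
        {
            entity.ToTable("markdown_cache");
            entity.HasIndex(e => e.CacheKey).IsUnique();
            entity.HasIndex(e => new { e.Url, e.BlogPostId });
        });
    }
}
```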
In a multi-server deployment, this gives you cache consistency across all instances:
graph TB
subgraph "Load Balancer"
LB[Load Balancer]
end
subgraph "Application Servers"
A1[App Server 1<br/>FetchExtension]
A2[App Server 2<br/>FetchExtension]
A3[App Server 3<br/>FetchExtension]
end
subgraph "Shared Cache"
PG[(PostgreSQL<br/>markdown_cache table)]
end
subgraph "External Content"
R1[Remote URL 1]
R2[Remote URL 2]
R3[Remote URL 3]
end
LB --> A1
LB --> A2
LB --> A3
A1 <-->|Read/Write Cache| PG
A2 <-->|Read/Write Cache| PG
A3 <-->|Read/Write Cache| PG
A1 -.->|Fetch if cache miss| R1
A2 -.->|Fetch if cache miss| R2
A3 -.->|Fetch if cache miss| R3
All servers share the same cache. When Server 1 fetches a README, Servers 2 and 3 immediately benefit from that cached content.
Getting started is straightforward. First, install the base package:
dotnet add package mostlylucid.Markdig.FetchExtension
Then choose your storage provider:
# For in-memory (demos/testing)
# Already included in base package
# For file-based storage
# Already included in base package
# For PostgreSQL
dotnet add package mostlylucid.Markdig.FetchExtension.Postgres
# For SQLite
dotnet add package mostlylucid.Markdig.FetchExtension.Sqlite
# For SQL Server
dotnet add package mostlylucid.Markdig.FetchExtension.SqlServer
In your Program.cs:
using Mostlylucid.Markdig.FetchExtension;
var builder = WebApplication.CreateBuilder(args);
// Option 1: In-Memory (simplest)
builder.Services.AddInMemoryMarkdownFetch();
// Option 2: File-Based (persists across restarts)
builder.Services.AddFileBasedMarkdownFetch("./markdown-cache");
// Option 3: PostgreSQL (multi-server)
builder.Services.AddPostgresMarkdownFetch(
builder.Configuration.GetConnectionString("MarkdownCache"));
// Option 4: SQLite (single server with DB)
builder.Services.AddSqliteMarkdownFetch("Data Source=markdown-cache.db");
// Option 5: SQL Server (enterprise)
builder.Services.AddSqlServerMarkdownFetch(
builder.Configuration.GetConnectionString("MarkdownCache"));
var app = builder.Build();
// If using database storage, ensure schema exists
if (app.Environment.IsDevelopment())
{
app.Services.EnsureMarkdownCacheDatabase();
}
// Configure the extension with your service provider
FetchMarkdownExtension.ConfigureServiceProvider(app.Services);
app.Run();
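If none of the built-in providers fit, the storage diagram's "YourCustomService" path is just a matter of implementing `IMarkdownFetchService` and registering it. The skeleton and the plain `AddSingleton` registration below are my assumptions about how you'd wire it up, not a documented package API:

```csharp
// Skeleton for a custom backend - fill in your own cache lookup, HTTP fetch
// and stale-fallback logic.
public class MyCustomMarkdownFetchService : IMarkdownFetchService
{
    public Task<MarkdownFetchResult> FetchMarkdownAsync(
        string url, int pollFrequencyHours, int blogPostId = 0)
    {
        // 1. Check your store; return the entry if it's younger than pollFrequencyHours.
        // 2. Otherwise fetch over HTTP, update the store, and return the fresh content.
        // 3. On failure, fall back to whatever cached copy you have.
        throw new NotImplementedException();
    }

    public Task<bool> RemoveCachedMarkdownAsync(string url, int blogPostId = 0)
        => Task.FromResult(false);
}

// In Program.cs, instead of one of the Add*MarkdownFetch calls:
// builder.Services.AddSingleton<IMarkdownFetchService, MyCustomMarkdownFetchService>();
```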
The key is the preprocessing step. Here's how I integrate it in my blog:
public class MarkdownRenderingService
{
private readonly IServiceProvider _serviceProvider;
private readonly MarkdownFetchPreprocessor _preprocessor;
private readonly MarkdownPipeline _pipeline;
public MarkdownRenderingService(IServiceProvider serviceProvider)
{
_serviceProvider = serviceProvider;
_preprocessor = new MarkdownFetchPreprocessor(serviceProvider);
_pipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.UseSyntaxHighlighting()
.UseToc() // Add TOC support for [TOC] markers
.UseYourCustomExtensions()
.Build();
}
public string RenderMarkdown(string markdown)
{
// Step 1: Preprocess to handle fetch tags
var processed = _preprocessor.Preprocess(markdown);
// Step 2: Run through your normal Markdig pipeline
return Markdown.ToHtml(processed, _pipeline);
}
}
The flow is:
1. Preprocess the raw markdown to resolve `<fetch>` tags
2. Run the processed markdown through your normal Markdig pipeline

The package now includes a separate Table of Contents (TOC) extension! While it's packaged alongside the fetch extension, it's completely independent and can be used on its own. You can automatically generate a clickable table of contents from your document's headings.
Basic Usage:
Simply add [TOC] anywhere in your markdown:
# My Document
[TOC]
# Introduction
Content here...
# Getting Started
More content...
## Installation
Details...
This generates a nested list of all headings with anchor links:
<nav class="ml_toc" aria-label="Table of Contents">
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#getting-started">Getting Started</a>
<ul>
<li><a href="#installation">Installation</a></li>
</ul>
</li>
</ul>
</nav>
Custom CSS Classes:
You can specify a custom CSS class for styling:
[TOC cssclass="my-custom-toc"]
This renders with your custom class:
<nav class="my-custom-toc" aria-label="Table of Contents">
<!-- TOC content -->
</nav>
How It Works:
Auto-Detection: The TOC automatically detects the minimum heading level in your document and adjusts accordingly. If your document starts with H2, the TOC treats H2 as the top level.
ID Generation: Headings are automatically given IDs for anchor linking (see the sketch below):
- "Getting Started" becomes `id="getting-started"`
- "API Reference" becomes `id="api-reference"`

Nested Structure: The renderer builds a properly nested ul/li structure that reflects your document hierarchy.
Enabling TOC Support:
When configuring your Markdig pipeline, add the TOC extension:
var pipeline = new MarkdownPipelineBuilder()
.UseAdvancedExtensions()
.UseToc() // Add TOC support - position in pipeline doesn't matter!
.Use<YourOtherExtensions>()
.Build();
Pipeline Position: Unlike some Markdig extensions, the TOC extension doesn't care where you add it in the pipeline. It works on the fully parsed document rather than hooking into a particular parsing stage, so you can add .UseToc() anywhere - beginning, middle, or end of your pipeline configuration.
Important: The TOC extension is completely independent of the fetch extension. They're just packaged together for convenience. You can use the TOC extension without the fetch extension, use the fetch extension without the TOC, or use both together.
Note: The TOC marker works in both your main markdown files and in fetched remote content. If you fetch a README from GitHub that contains [TOC], it will automatically generate a table of contents from that document's headings (assuming you've added .UseToc() to your pipeline).
When fetching remote markdown (especially from GitHub), relative links break. The extension can automatically rewrite them:
<fetch url="https://raw.githubusercontent.com/user/repo/main/docs/README.md" />
This transforms:
- `./CONTRIBUTING.md` → `https://github.com/user/repo/blob/main/docs/CONTRIBUTING.md`
- `../images/logo.png` → `https://github.com/user/repo/blob/main/images/logo.png`

The implementation uses the Markdig AST to rewrite links:
public class MarkdownLinkRewriter
{
public static string RewriteLinks(string markdown, string sourceUrl)
{
var document = Markdown.Parse(markdown);
var baseUri = GetBaseUri(sourceUrl);
foreach (var link in document.Descendants<LinkInline>())
{
if (IsRelativeLink(link.Url))
{
link.Url = ResolveRelativeLink(baseUri, link.Url);
}
}
using var writer = new StringWriter();
var renderer = new NormalizeRenderer(writer);
renderer.Render(document);
return writer.ToString();
}
private static bool IsRelativeLink(string url)
{
if (string.IsNullOrEmpty(url)) return false;
if (url.StartsWith("http://") || url.StartsWith("https://")) return false;
if (url.StartsWith("#")) return false; // Anchor
if (url.StartsWith("mailto:")) return false;
return true;
}
}
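The snippet references two helpers it doesn't show. A plausible implementation leans on System.Uri's relative-resolution rules; note that the real extension also maps raw.githubusercontent.com URLs to their github.com/blob equivalents, which this sketch doesn't attempt:

```csharp
// Possible implementations of the helpers used by MarkdownLinkRewriter above.
public static class LinkResolutionSketch
{
    public static Uri GetBaseUri(string sourceUrl)
    {
        var source = new Uri(sourceUrl);
        // Resolving "." against a file URL strips the file name, leaving its directory.
        return new Uri(source, ".");
    }

    public static string ResolveRelativeLink(Uri baseUri, string relativeUrl)
        => new Uri(baseUri, relativeUrl).ToString();
}
```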
You can show readers when content was last fetched:
<!-- Failed to fetch content from https://api.example.com/status.md: HTTP request failed: Name or service not known (api.example.com:443) -->
This renders with a footer:
Content fetched from https://api.example.com/status.md on 06 Jan 2025 (2 hours ago)
Or customize the template:
<!-- Failed to fetch content from https://example.com/docs.md: HTTP 404: Not Found -->
Output:
Last updated: 06 January 2025 14:30 | Status: cached | Next refresh: in 22 hours
Available placeholders:
- `{retrieved:format}` - Last fetch date/time
- `{age}` - Human-readable time since fetch
- `{url}` - Source URL
- `{nextrefresh:format}` - When content will be refreshed
- `{pollfrequency}` - Cache duration in hours
- `{status}` - Cache status (fresh/cached/stale)

When writing documentation about the fetch extension (like this article!), you need a way to show the tags without them being processed. Use the disable="true" attribute:
<!-- This will be processed and fetch content -->
<fetch url="https://example.com/README.md" />
<!-- This will NOT be processed - useful for documentation -->
<fetch url="https://example.com/README.md" disable="true" />
The disabled tag remains in the markdown as-is - perfect for documentation, tutorials, and blog posts about the extension (like this one).
This works for both <fetch> and <fetch-summary> tags:
<fetch-summary url="https://example.com/api/status.md" disable="true" />
The extension publishes events for all fetch operations:
public class Startup
{
public void ConfigureServices(IServiceCollection services)
{
services.AddPostgresMarkdownFetch(connectionString);
var sp = services.BuildServiceProvider();
var eventPublisher = sp.GetRequiredService<IMarkdownFetchEventPublisher>();
// Subscribe to events
eventPublisher.FetchBeginning += (sender, args) =>
{
Console.WriteLine($"Fetching {args.Url}...");
};
eventPublisher.FetchCompleted += (sender, args) =>
{
var source = args.WasCached ? "cache" : "remote";
Console.WriteLine($"Fetched {args.Url} from {source} in {args.Duration.TotalMilliseconds}ms");
};
eventPublisher.FetchFailed += (sender, args) =>
{
Console.WriteLine($"Failed to fetch {args.Url}: {args.ErrorMessage}");
};
}
}
This makes it easy to integrate with Application Insights, Prometheus, or your logging infrastructure:
sequenceDiagram
participant MD as Markdown Processor
participant EP as Event Publisher
participant FS as Fetch Service
participant ST as Storage Backend
participant L as Your Listeners
MD->>EP: FetchBeginning
EP->>L: Notify FetchBeginning
EP->>FS: FetchMarkdownAsync(url)
FS->>ST: Check Cache
alt Cache Fresh
ST-->>FS: Cached Content
FS->>EP: FetchCompleted (cached=true)
else Cache Stale/Missing
FS->>FS: HTTP GET
alt Success
FS->>ST: Update Cache
ST-->>FS: OK
FS->>EP: FetchCompleted (cached=false)
else Failure
FS->>EP: FetchFailed
EP->>L: Notify FetchFailed
end
end
EP->>L: Notify FetchCompleted
EP->>L: Notify ContentUpdated
FS-->>MD: MarkdownFetchResult
Note over L: Listeners can be: Logging, Metrics, Telemetry, Webhooks
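As a concrete example, here's one way to surface those events as counters and timings with .NET's built-in System.Diagnostics.Metrics API. The meter and instrument names are my own, and the event argument properties (`WasCached`, `Duration`) are taken from the subscription example above:

```csharp
using System.Diagnostics.Metrics;

// Illustrative wiring of the fetch events into OpenTelemetry-friendly metrics.
public class FetchMetrics
{
    private static readonly Meter Meter = new("Mostlylucid.MarkdownFetch");
    private static readonly Counter<long> Fetches =
        Meter.CreateCounter<long>("markdown_fetches");
    private static readonly Counter<long> Failures =
        Meter.CreateCounter<long>("markdown_fetch_failures");
    private static readonly Histogram<double> DurationMs =
        Meter.CreateHistogram<double>("markdown_fetch_duration_ms");

    public FetchMetrics(IMarkdownFetchEventPublisher publisher)
    {
        publisher.FetchCompleted += (_, args) =>
        {
            Fetches.Add(1, new KeyValuePair<string, object?>("cached", args.WasCached));
            DurationMs.Record(args.Duration.TotalMilliseconds);
        };

        publisher.FetchFailed += (_, args) => Failures.Add(1);
    }
}
```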
The caching behavior follows a state machine pattern:
stateDiagram-v2
[*] --> CheckCache: Fetch Request
CheckCache --> Fresh: Cache exists & age < pollFrequency
CheckCache --> Stale: Cache exists & age >= pollFrequency
CheckCache --> Missing: No cache entry
Fresh --> ReturnCached: Return cached content
ReturnCached --> [*]
Stale --> FetchRemote: Attempt HTTP GET
Missing --> FetchRemote: Attempt HTTP GET
FetchRemote --> UpdateCache: Success
FetchRemote --> HasStale: Failure
UpdateCache --> ReturnFresh: Return new content
ReturnFresh --> [*]
HasStale --> ReturnStale: Return stale cache
HasStale --> ReturnError: No cache available
ReturnStale --> [*]
ReturnError --> [*]
note right of Fresh
pollFrequency = 0
means always stale
end note
note right of HasStale
Stale-while-revalidate
pattern ensures uptime
end note
The key insights here: a pollFrequency of 0 means the content is treated as always stale (so it's refetched on every request), and a failed fetch only surfaces an error when there's no cached copy at all to fall back on.
This pattern is called stale-while-revalidate and it's excellent for reliability. Even if your source goes down, your site keeps serving cached content.
Sometimes you need to manually invalidate cache:
public class MarkdownController : Controller
{
private readonly IMarkdownFetchService _fetchService;
public async Task<IActionResult> InvalidateCache(string url)
{
var removed = await _fetchService.RemoveCachedMarkdownAsync(url);
if (removed)
{
return Ok(new { message = "Cache invalidated" });
}
return NotFound(new { message = "No cache entry found" });
}
}
Or via webhooks when content changes:
// GitHub webhook notifies of README update
app.MapPost("/webhooks/github", async (
GitHubWebhookPayload payload,
IMarkdownFetchService fetchService) =>
{
if (payload.Repository?.FullName == "user/repo" &&
payload.Commits?.Any(c => c.Modified?.Contains("README.md") == true) == true)
{
var url = "https://raw.githubusercontent.com/user/repo/main/README.md";
await fetchService.RemoveCachedMarkdownAsync(url);
return Results.Ok(new { message = "Cache invalidated" });
}
return Results.Ok(new { message = "No action needed" });
});
The extension includes comprehensive tests. Here's how I structure them:
public class MarkdownFetchServiceTests
{
[Fact]
public async Task FetchMarkdownAsync_CachesContent()
{
// Arrange
var services = new ServiceCollection();
services.AddLogging();
services.AddInMemoryMarkdownFetch();
var sp = services.BuildServiceProvider();
var fetchService = sp.GetRequiredService<IMarkdownFetchService>();
var url = "https://raw.githubusercontent.com/user/repo/main/README.md";
// Act - First fetch (from network)
var result1 = await fetchService.FetchMarkdownAsync(url, 24, 0);
// Act - Second fetch (from cache)
var result2 = await fetchService.FetchMarkdownAsync(url, 24, 0);
// Assert
Assert.True(result1.Success);
Assert.True(result2.Success);
Assert.Equal(result1.Content, result2.Content);
}
[Fact]
public async Task FetchMarkdownAsync_ReturnsStaleOnFailure()
{
// Arrange
var services = new ServiceCollection();
services.AddLogging();
services.AddInMemoryMarkdownFetch();
var sp = services.BuildServiceProvider();
var fetchService = sp.GetRequiredService<IMarkdownFetchService>();
var url = "https://httpstat.us/200?sleep=100";
// Act - First fetch succeeds
var result1 = await fetchService.FetchMarkdownAsync(url, 0, 0);
// Change URL to fail
var badUrl = "https://httpstat.us/500";
// Act - Second fetch fails, should return stale
var result2 = await fetchService.FetchMarkdownAsync(badUrl, 0, 0);
// Assert
Assert.True(result1.Success);
// Even though fetch failed, we return success with stale content
Assert.True(result2.Success);
}
}
The extension is designed for performance:
- ConcurrentDictionary for thread-safe access
- IHttpClientFactory for efficient connection reuse
I publish all four packages to NuGet using GitHub Actions with OIDC authentication:
name: Publish Markdig.FetchExtension
on:
push:
tags:
- 'fetchextension-v*.*.*'
permissions:
id-token: write
contents: read
jobs:
build-and-publish:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Setup .NET
uses: actions/setup-dotnet@v4
with:
dotnet-version: '9.0.x'
- name: Extract version from tag
id: get_version
run: |
TAG=${GITHUB_REF#refs/tags/fetchextension-v}
echo "VERSION=$TAG" >> $GITHUB_OUTPUT
- name: Build
run: dotnet build Mostlylucid.Markdig.FetchExtension/Mostlylucid.Markdig.FetchExtension.csproj --configuration Release -p:Version=${{ steps.get_version.outputs.VERSION }}
- name: Pack
run: dotnet pack Mostlylucid.Markdig.FetchExtension/Mostlylucid.Markdig.FetchExtension.csproj --configuration Release --no-build -p:PackageVersion=${{ steps.get_version.outputs.VERSION }} --output ./artifacts
- name: Login to NuGet (OIDC)
id: nuget_login
uses: NuGet/login@v1
with:
user: 'mostlylucid'
- name: Publish to NuGet
run: dotnet nuget push ./artifacts/*.nupkg --api-key ${{ steps.nuget_login.outputs.NUGET_API_KEY }} --source https://api.nuget.org/v3/index.json --skip-duplicate
The OIDC approach is more secure than storing API keys - GitHub generates short-lived tokens automatically.
Here's how I use it in my blog posts:
# My NuGet Package Documentation
Here's the official README from GitHub:
# Umami.Net
## UmamiClient
This is a .NET Core client for the Umami tracking API.
It's based on the Umami Node client, which can be found [here](https://github.com/umami-software/node).
You can see how to set up Umami as a docker
container [here](https://www.mostlylucid.net/blog/usingumamiforlocalanalytics).
You can read more detail about its creation on my
blog [here](https://www.mostlylucid.net/blog/addingumamitrackingclientfollowup).
To use this client you need the following appsettings.json configuration:
```json
{
  "Analytics": {
    "UmamiPath": "https://umamilocal.mostlylucid.net",
    "WebsiteId": "32c2aa31-b1ac-44c0-b8f3-ff1f50403bee"
  }
}
```
Where UmamiPath is the path to your Umami instance and WebsiteId is the id of the website you want to track.
To use the client you need to add the following to your Program.cs:
using Umami.Net;
services.SetupUmamiClient(builder.Configuration);
This will add the Umami client to the services collection.
You can then use the client in two ways:
1. Inject the UmamiClient into your class and call the Track method:

   // Inject UmamiClient umamiClient
   await umamiClient.Track("Search", new UmamiEventData(){{"query", encodedQuery}});

2. Use the UmamiBackgroundSender to track events in the background (this uses an IHostedService to send events in the background):

   // Inject UmamiBackgroundSender umamiBackgroundSender
   await umamiBackgroundSender.Track("Search", new UmamiEventData(){{"query", encodedQuery}});
The client will send the event to the Umami API and it will be stored.
The UmamiEventData is a dictionary of key value pairs that will be sent to the Umami API as the event data.
There are additionally more low level methods that can be used to send events to the Umami API.
There's also a convenience method to track a page view. This will send an event to the Umami API with the url set (which counts as a pageview).
await umamiBackgroundSender.TrackPageView("api/search/" + encodedQuery, "searchEvent", eventData: new UmamiEventData(){{"query", encodedQuery}});
await umamiClient.TrackPageView("api/search/" + encodedQuery, "searchEvent", eventData: new UmamiEventData(){{"query", encodedQuery}});
Here we're setting the url to "api/search/" + encodedQuery and the event type to "searchEvent". We're also passing in a dictionary of key value pairs as the event data.
On both the UmamiClient and UmamiBackgroundSender you can call the following method.
Send(UmamiPayload? payload = null, UmamiEventData? eventData = null,
string eventType = "event")
If you don't pass in a UmamiPayload object, the client will create one for you using the WebsiteId from the
appsettings.json.
public UmamiPayload GetPayload(string? url = null, UmamiEventData? data = null)
{
var httpContext = httpContextAccessor.HttpContext;
var request = httpContext?.Request;
var payload = new UmamiPayload
{
Website = settings.WebsiteId,
Data = data,
Url = url ?? httpContext?.Request?.Path.Value,
IpAddress = httpContext?.Connection?.RemoteIpAddress?.ToString(),
UserAgent = request?.Headers["User-Agent"].FirstOrDefault(),
Referrer = request?.Headers["Referer"].FirstOrDefault(),
Hostname = request?.Host.Host,
};
return payload;
}
You can see that this populates the UmamiPayload object with the WebsiteId from the appsettings.json, the Url,
IpAddress, UserAgent, Referrer and Hostname from the HttpContext.
NOTE: eventType can only be "event" or "identify" as per the Umami API.
There's also a service that can be used to pull data from the Umami API. This is a service that allows me to pull data from my Umami instance to use in stuff like sorting posts by popularity etc...
To set it up you need to add a username and password for your umami instance to the Analytics element in your settings file:
"Analytics":{
"UmamiPath" : "https://umami.mostlylucid.net",
"WebsiteId" : "1e3b7657-9487-4857-a9e9-4e1920aa8c42",
"UserName": "admin",
"Password": ""
}
Then in your Program.cs you set up the UmamiDataService as follows:
services.SetupUmamiData(config);
You can then inject the UmamiDataService into your class and use it to pull data from the Umami API.
Now you have the UmamiDataService in your service collection you can start using it!
The methods are all from the Umami API definition you can read about them here: https://umami.is/docs/api/website-stats
All returns are wrapped in an UmamiResult<T> object, which has a Status property and a Data property. The Data property is the object returned from the Umami API.
public record UmamiResult<T>(HttpStatusCode Status, string Message, T? Data);
All requests apart from ActiveUsers have a base request object with two compulsory properties. I added convenience
DateTimes to the base request object to make it easier to set the start and end dates.
public class BaseRequest
{
[QueryStringParameter("startAt", isRequired: true)]
public long StartAt => StartAtDate.ToMilliseconds(); // Timestamp (in ms) of starting date
[QueryStringParameter("endAt", isRequired: true)]
public long EndAt => EndAtDate.ToMilliseconds(); // Timestamp (in ms) of end date
public DateTime StartAtDate { get; set; }
public DateTime EndAtDate { get; set; }
}
The service has the following methods:
This just gets the total number of *current* active users on the site
public async Task<UmamiResult<ActiveUsersResponse>> GetActiveUsers()
This returns a bunch of statistics about the site, including the number of users, page views, etc.
public async Task<UmamiResult<StatsResponseModels>> GetStats(StatsRequest statsRequest)
You may set a number of parameters to filter the data returned from the API. For instance using url will return the
stats for a specific URL.
public class StatsRequest : BaseRequest
{
[QueryStringParameter("url")]
public string? Url { get; set; } // Name of URL
[QueryStringParameter("referrer")]
public string? Referrer { get; set; } // Name of referrer
[QueryStringParameter("title")]
public string? Title { get; set; } // Name of page title
[QueryStringParameter("query")]
public string? Query { get; set; } // Name of query
[QueryStringParameter("event")]
public string? Event { get; set; } // Name of event
[QueryStringParameter("host")]
public string? Host { get; set; } // Name of hostname
[QueryStringParameter("os")]
public string? Os { get; set; } // Name of operating system
[QueryStringParameter("browser")]
public string? Browser { get; set; } // Name of browser
[QueryStringParameter("device")]
public string? Device { get; set; } // Name of device (e.g., Mobile)
[QueryStringParameter("country")]
public string? Country { get; set; } // Name of country
[QueryStringParameter("region")]
public string? Region { get; set; } // Name of region/state/province
[QueryStringParameter("city")]
public string? City { get; set; } // Name of city
}
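For example, a filtered request for a single URL over the last week might look like the sketch below, reusing only the properties shown above (`websiteDataService` stands in for an injected UmamiDataService):

```csharp
// Stats for the /blog URL over the last 7 days.
public async Task<int?> GetBlogPageViews(UmamiDataService websiteDataService)
{
    var statsRequest = new StatsRequest
    {
        StartAtDate = DateTime.Now.AddDays(-7),
        EndAtDate = DateTime.Now,
        Url = "/blog"
    };

    var stats = await websiteDataService.GetStats(statsRequest);
    return stats.Status == HttpStatusCode.OK ? stats.Data?.pageviews.value : null;
}
```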
The JSON object Umami returns is as follows.
{
"pageviews": { "value": 5, "change": 5 },
"visitors": { "value": 1, "change": 1 },
"visits": { "value": 3, "change": 2 },
"bounces": { "value": 0, "change": 0 },
"totaltime": { "value": 4, "change": 4 }
}
This is wrapped inside my StatsResponseModels object.
namespace Umami.Net.UmamiData.Models.ResponseObjects;
public class StatsResponseModels
{
public Pageviews pageviews { get; set; }
public Visitors visitors { get; set; }
public Visits visits { get; set; }
public Bounces bounces { get; set; }
public Totaltime totaltime { get; set; }
public class Pageviews
{
public int value { get; set; }
public int prev { get; set; }
}
public class Visitors
{
public int value { get; set; }
public int prev { get; set; }
}
public class Visits
{
public int value { get; set; }
public int prev { get; set; }
}
public class Bounces
{
public int value { get; set; }
public int prev { get; set; }
}
public class Totaltime
{
public int value { get; set; }
public int prev { get; set; }
}
}
Metrics in Umami provide you the number of views for specific types of properties.
One example of these is `Events`:
'Events' in Umami are specific items you can track on a site. When tracking events using Umami.Net you can set a number
of properties which are tracked with the event name. For instance here I track Search requests with the URL and the
search term.
await umamiBackgroundSender.Track( "searchEvent", eventData: new UmamiEventData(){{"query", encodedQuery}});
To fetch data about this event you would use the Metrics method:
public async Task<UmamiResult<MetricsResponseModels[]>> GetMetrics(MetricsRequest metricsRequest)
As with the other methods this accepts the MetricsRequest object (with the compulsory BaseRequest properties) and a
number of optional properties to filter the data.
public class MetricsRequest : BaseRequest
{
[QueryStringParameter("type", isRequired: true)]
public MetricType Type { get; set; } // Metrics type
[QueryStringParameter("url")]
public string? Url { get; set; } // Name of URL
[QueryStringParameter("referrer")]
public string? Referrer { get; set; } // Name of referrer
[QueryStringParameter("title")]
public string? Title { get; set; } // Name of page title
[QueryStringParameter("query")]
public string? Query { get; set; } // Name of query
[QueryStringParameter("host")]
public string? Host { get; set; } // Name of hostname
[QueryStringParameter("os")]
public string? Os { get; set; } // Name of operating system
[QueryStringParameter("browser")]
public string? Browser { get; set; } // Name of browser
[QueryStringParameter("device")]
public string? Device { get; set; } // Name of device (e.g., Mobile)
[QueryStringParameter("country")]
public string? Country { get; set; } // Name of country
[QueryStringParameter("region")]
public string? Region { get; set; } // Name of region/state/province
[QueryStringParameter("city")]
public string? City { get; set; } // Name of city
[QueryStringParameter("language")]
public string? Language { get; set; } // Name of language
[QueryStringParameter("event")]
public string? Event { get; set; } // Name of event
[QueryStringParameter("limit")]
public int? Limit { get; set; } = 500; // Number of events returned (default: 500)
}
Here you can see that you can specify a number of properties in the request element to specify what metrics you want to return.
You can also set a Limit property to limit the number of results returned.
For instance, to get the event I mentioned above over the past day, you would use the following request:
var metricsRequest = new MetricsRequest
{
StartAtDate = DateTime.Now.AddDays(-1),
EndAtDate = DateTime.Now,
Type = MetricType.@event,
Event = "searchEvent"
};
The JSON object returned from the API is as follows:
[
{ "x": "searchEvent", "y": 46 }
]
And again I wrap this in my MetricsResponseModels object.
public class MetricsResponseModels
{
public string x { get; set; }
public int y { get; set; }
}
Where x is the event name and y is the number of times it has been triggered.
One of the most useful metrics is the number of page views. This is the number of times a page has been viewed on the
site. Below is the test I use to get the number of page views over the past 30 days. You'll note the Type parameter is
set as MetricType.url however this is also the default value so you don't need to set it.
[Fact]
public async Task Metrics_StartEnd()
{
var setup = new SetupUmamiData();
var serviceProvider = setup.Setup();
var websiteDataService = serviceProvider.GetRequiredService<UmamiDataService>();
var metrics = await websiteDataService.GetMetrics(new MetricsRequest()
{
StartAtDate = DateTime.Now.AddDays(-30),
EndAtDate = DateTime.Now,
Type = MetricType.url,
Limit = 500
});
Assert.NotNull(metrics);
Assert.Equal( HttpStatusCode.OK, metrics.Status);
}
This returns a MetricsResponse object which has the following JSON structure:
[
{
"x": "/",
"y": 1
},
{
"x": "/blog",
"y": 1
},
{
"x": "/blog/usingumamidataforwebsitestats",
"y": 1
}
]
Where x is the URL and y is the number of times it has been viewed.
This returns the number of page views for a specific URL.
Again here is a test I use for this method:
[Fact]
public async Task PageViews_StartEnd_Day_Url()
{
var setup = new SetupUmamiData();
var serviceProvider = setup.Setup();
var websiteDataService = serviceProvider.GetRequiredService<UmamiDataService>();
var pageViews = await websiteDataService.GetPageViews(new PageViewsRequest()
{
StartAtDate = DateTime.Now.AddDays(-7),
EndAtDate = DateTime.Now,
Unit = Unit.day,
Url = "/blog"
});
Assert.NotNull(pageViews);
Assert.Equal( HttpStatusCode.OK, pageViews.Status);
}
This returns a PageViewsResponse object which has the following JSON structure:
[
{
"date": "2024-09-06 00:00",
"value": 1
}
]
Where date is the date and value is the number of page views. This is repeated for each day in the range specified (or hour, month, etc., depending on the Unit property).
As with the other methods this accepts the PageViewsRequest object (with the compulsory BaseRequest properties) and
a number of optional properties to filter the data.
public class PageViewsRequest : BaseRequest
{
// Required properties
[QueryStringParameter("unit", isRequired: true)]
public Unit Unit { get; set; } = Unit.day; // Time unit (year | month | hour | day)
[QueryStringParameter("timezone")]
[TimeZoneValidator]
public string Timezone { get; set; }
// Optional properties
[QueryStringParameter("url")]
public string? Url { get; set; } // Name of URL
[QueryStringParameter("referrer")]
public string? Referrer { get; set; } // Name of referrer
[QueryStringParameter("title")]
public string? Title { get; set; } // Name of page title
[QueryStringParameter("host")]
public string? Host { get; set; } // Name of hostname
[QueryStringParameter("os")]
public string? Os { get; set; } // Name of operating system
[QueryStringParameter("browser")]
public string? Browser { get; set; } // Name of browser
[QueryStringParameter("device")]
public string? Device { get; set; } // Name of device (e.g., Mobile)
[QueryStringParameter("country")]
public string? Country { get; set; } // Name of country
[QueryStringParameter("region")]
public string? Region { get; set; } // Name of region/state/province
[QueryStringParameter("city")]
public string? City { get; set; } // Name of city
}
As with the other methods, you can set a number of properties to filter the data returned from the API; for instance, you could set the Country property to get the number of page views from a specific country.
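For instance, a country-filtered page-view request could look like the sketch below; the country value is just an example, so check what your Umami instance actually reports:

```csharp
// Daily page views from a single country over the last week.
var ukPageViews = await websiteDataService.GetPageViews(new PageViewsRequest
{
    StartAtDate = DateTime.Now.AddDays(-7),
    EndAtDate = DateTime.Now,
    Unit = Unit.day,
    Country = "United Kingdom"
});
```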
On this site I have some code which lets me use this service to get the number of views each blog page has. In the code
below I take a start and end date and a prefix (which is /blog in my case) and get the number of views for each page
in the blog.
I then cache this data for an hour so I don't have to keep hitting the Umami API.
public class UmamiDataSortService(
UmamiDataService dataService,
IMemoryCache cache)
{
public async Task<List<MetricsResponseModels>?> GetMetrics(DateTime startAt, DateTime endAt, string prefix="" )
{
using var activity = Log.Logger.StartActivity("GetMetricsWithPrefix");
try
{
var cacheKey = $"Metrics_{startAt}_{endAt}_{prefix}";
if (cache.TryGetValue(cacheKey, out List<MetricsResponseModels>? metrics))
{
activity?.AddProperty("CacheHit", true);
return metrics;
}
activity?.AddProperty("CacheHit", false);
var metricsRequest = new MetricsRequest()
{
StartAtDate = startAt,
EndAtDate = endAt,
Type = MetricType.url,
Limit = 500
};
var metricRequest = await dataService.GetMetrics(metricsRequest);
if(metricRequest.Status != HttpStatusCode.OK)
{
return null;
}
var filteredMetrics = metricRequest.Data.Where(x => x.x.StartsWith(prefix)).ToList();
cache.Set(cacheKey, filteredMetrics, TimeSpan.FromHours(1));
activity?.AddProperty("MetricsCount", filteredMetrics?.Count()?? 0);
activity?.Complete();
return filteredMetrics;
}
catch (Exception e)
{
activity?.Complete(LogEventLevel.Error, e);
return null;
}
}
}
The package is available on NuGet...
This keeps documentation synchronized automatically. When I update the README on GitHub, the blog post stays in sync (within 24 hours).
# Conclusion
Building this extension taught me several things:
1. **Preprocessing > Parsing** - Handling fetch tags before Markdig ensures consistency
2. **Stale-While-Revalidate** - This pattern is incredibly valuable for reliability
3. **Pluggable Storage** - Different deployments need different solutions
4. **Events for Observability** - Being able to monitor fetch operations is crucial
5. **Thread Safety Matters** - Concurrent access needs careful handling
6. **Bonus Features** - The package now includes a separate TOC extension for generating tables of contents
The extension is open source and available on NuGet:
- [mostlylucid.Markdig.FetchExtension](https://www.nuget.org/packages/mostlylucid.Markdig.FetchExtension)
- [mostlylucid.Markdig.FetchExtension.Postgres](https://www.nuget.org/packages/mostlylucid.Markdig.FetchExtension.Postgres)
- [mostlylucid.Markdig.FetchExtension.Sqlite](https://www.nuget.org/packages/mostlylucid.Markdig.FetchExtension.Sqlite)
- [mostlylucid.Markdig.FetchExtension.SqlServer](https://www.nuget.org/packages/mostlylucid.Markdig.FetchExtension.SqlServer)
Source code is on GitHub: [scottgal/mostlylucidweb](https://github.com/scottgal/mostlylucidweb/tree/main/Mostlylucid.Markdig.FetchExtension)
If you're building a documentation site, blog, or any application that needs to include external markdown content, give it a try. It's designed to be simple to use but powerful enough for production deployments.
© 2025 Scott Galloway — Unlicense — All content and source code on this site is free to use, copy, modify, and sell.