Back to "Building a Remote Markdown Fetcher for Markdig"

This is a viewer only at the moment see the article on how this works.

To update the preview hit Ctrl-Alt-R (or ⌘-Alt-R on Mac) or Enter to refresh. The Save icon lets you save the markdown file to disk

This is a preview from the server running through my markdig pipeline

AI-Article API ASP.NET Core C# FetchExtension MarkDig Markdown Nuget

Building a Remote Markdown Fetcher for Markdig

Friday, 07 November 2025

Introduction

One of the challenges I faced while building this blog was how to efficiently include external markdown content without manually copying and pasting it everywhere. I wanted to fetch README files from my GitHub repositories, include documentation from other projects, and keep everything automatically synchronized. The solution? A custom Markdig extension that fetches remote markdown at render time and caches it intelligently.

In this post, I'll walk you through how I built Mostlylucid.Markdig.FetchExtension - a complete solution for fetching and caching remote markdown content with support for multiple storage backends, automatic polling, and a stale-while-revalidate caching pattern.

NOTE: This is still prerelease, but I wanted to get it out there. Have fun, but it might not work yet.

This article is AI generated using Claude Code, which also helped me build the feature.

UPDATE: Added disable="true" parameter so we can now demo the tags properly without them being processed!

UPDATE (Nov 7, 2025): Added Table of Contents (TOC) generation feature! Use [TOC] in your markdown to automatically generate a clickable table of contents from document headings.

See the source in this site's GitHub repository.


Why Build This?

Before diving into the technical details, let me explain the problem. I have several scenarios where I need to include external markdown content:

  1. Package READMEs: When I write about a NuGet package I've published, I want to include its README directly from GitHub
  2. API Documentation: External API docs that change frequently need to stay in sync
  3. Shared Content: Documentation that lives in one repository but needs to appear in multiple places
  4. Performance: I don't want to fetch this content on every page load - that would be slow and wasteful

The naive approach would be to use an HTTP client to fetch markdown whenever you need it. But that's problematic:

  • Every request hits the remote server
  • Network latency impacts page load times
  • No offline support
  • No handling of transient failures

I needed something smarter: fetch once, cache intelligently, refresh automatically, and handle failures gracefully.

Architecture Overview

The extension follows a preprocessing approach rather than being part of the Markdig parsing pipeline. This is crucial because it means fetched content flows through your entire Markdig pipeline, getting all your custom extensions, syntax highlighting, and styling.

graph TD
    A[Markdown with fetch tags] --> B[MarkdownFetchPreprocessor]
    B --> C{Check Cache}
    C -->|Fresh| D[Return Cached Content]
    C -->|Stale/Missing| E[Fetch from Remote URL]
    E -->|Success| F[Update Cache]
    E -->|Failure| G{Has Cached?}
    G -->|Yes| H[Return Stale Cache]
    G -->|No| I[Return Error Comment]
    F --> J[Replace fetch with Content]
    D --> J
    H --> J
    I --> J
    J --> K[Processed Markdown]
    K --> L[Your Markdig Pipeline]
    L --> M[Final HTML]

The key insight here is preprocessing. Before your markdown hits the Markdig pipeline, we:

  1. Scan for <fetch> tags
  2. Resolve the content (from cache or remote)
  3. Replace the tags with actual markdown
  4. Then let Markdig process everything together

This ensures consistency - all markdown gets the same treatment regardless of its source.
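
To make the idea concrete, here's a deliberately simplified sketch of that kind of preprocessor - not the package's actual MarkdownFetchPreprocessor, and the url attribute shape is an assumption for illustration:

using System.Text.RegularExpressions;
using System.Threading.Tasks;

public class SimpleFetchPreprocessor
{
    // Assumed tag shape for illustration: <fetch url="..." />
    private static readonly Regex FetchTag = new(
        @"<fetch\s+url=""(?<url>[^""]+)""[^>]*/?>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    private readonly IMarkdownFetchService _fetchService;

    public SimpleFetchPreprocessor(IMarkdownFetchService fetchService)
        => _fetchService = fetchService;

    public async Task<string> PreprocessAsync(string markdown, int blogPostId = 0)
    {
        // Resolve each tag and splice the fetched markdown in its place.
        foreach (Match match in FetchTag.Matches(markdown))
        {
            var url = match.Groups["url"].Value;
            var result = await _fetchService.FetchMarkdownAsync(url, 24, blogPostId);

            var replacement = result.Success
                ? result.Content
                : $"<!-- Failed to fetch content from {url} -->";

            markdown = markdown.Replace(match.Value, replacement);
        }

        return markdown;
    }
}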

The Basic Syntax

Using the extension is simple. In your markdown:

# My Documentation

<!-- Failed to fetch content from https://raw.githubusercontent.com/user/repo/main/README.md: HTTP 404: Not Found -->

That's it! The extension will:

  • Fetch the README from GitHub
  • Cache it for 24 hours
  • Return cached content on subsequent requests
  • Auto-refresh when the cache expires

Storage Provider Architecture

One of the design principles I followed was flexibility. Different applications have different needs. A small demo app doesn't need PostgreSQL, but a multi-server production deployment does. So I built a pluggable storage architecture:

graph LR
    A[IMarkdownFetchService Interface] --> B[InMemoryMarkdownFetchService]
    A --> C[FileBasedMarkdownFetchService]
    A --> D[PostgresMarkdownFetchService]
    A --> E[SqliteMarkdownFetchService]
    A --> F[SqlServerMarkdownFetchService]
    A --> G[YourCustomService]

    B --> H[ConcurrentDictionary]
    C --> I[File System + SemaphoreSlim]
    D --> J[PostgreSQL Database]
    E --> K[SQLite Database]
    F --> L[SQL Server Database]
    G --> M[Your Storage Backend]

The Core Interface

Everything implements IMarkdownFetchService:

public interface IMarkdownFetchService
{
    Task<MarkdownFetchResult> FetchMarkdownAsync(
        string url,
        int pollFrequencyHours,
        int blogPostId = 0);

    Task<bool> RemoveCachedMarkdownAsync(
        string url,
        int blogPostId = 0);
}

Simple and clean. Each implementation handles storage its own way, but the interface remains consistent.
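
You can also resolve the service from DI and call it directly, outside the preprocessor. A minimal sketch (the 24 is just an example poll frequency):

using System.Threading.Tasks;

public class ReadmeSection
{
    private readonly IMarkdownFetchService _fetchService;

    public ReadmeSection(IMarkdownFetchService fetchService)
        => _fetchService = fetchService;

    public async Task<string?> GetReadmeMarkdownAsync(string url, int blogPostId = 0)
    {
        // Cached for 24 hours; stale content is returned if the remote is down.
        var result = await _fetchService.FetchMarkdownAsync(url, 24, blogPostId);
        return result.Success ? result.Content : null;
    }
}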

In-Memory Storage: Perfect for Demos

The simplest implementation uses ConcurrentDictionary:

public class InMemoryMarkdownFetchService : IMarkdownFetchService
{
    private readonly ConcurrentDictionary<string, CacheEntry> _cache = new();
    private readonly IHttpClientFactory _httpClientFactory;
    private readonly ILogger<InMemoryMarkdownFetchService> _logger;

    public async Task<MarkdownFetchResult> FetchMarkdownAsync(
        string url,
        int pollFrequencyHours,
        int blogPostId)
    {
        var cacheKey = GetCacheKey(url, blogPostId);

        // Check cache
        if (_cache.TryGetValue(cacheKey, out var cached))
        {
            var age = DateTimeOffset.UtcNow - cached.FetchedAt;
            if (age.TotalHours < pollFrequencyHours)
            {
                _logger.LogDebug("Returning cached content for {Url}", url);
                return new MarkdownFetchResult
                {
                    Success = true,
                    Content = cached.Content
                };
            }
        }

        // Fetch fresh content
        var fetchResult = await FetchFromUrlAsync(url);

        if (fetchResult.Success)
        {
            _cache[cacheKey] = new CacheEntry
            {
                Content = fetchResult.Content,
                FetchedAt = DateTimeOffset.UtcNow
            };
        }
        else if (cached != null)
        {
            // Fetch failed, return stale cache
            _logger.LogWarning("Fetch failed, returning stale cache for {Url}", url);
            return new MarkdownFetchResult
            {
                Success = true,
                Content = cached.Content
            };
        }

        return fetchResult;
    }

    private static string GetCacheKey(string url, int blogPostId)
        => $"{url}_{blogPostId}";
}

As you can see, this does the following:

  1. Creates a cache key from the URL and blog post ID
  2. Checks if we have cached content and if it's fresh
  3. If cache is fresh, returns it immediately
  4. If stale, tries to fetch fresh content
  5. On success, updates the cache
  6. On failure with cached content, returns stale cache (stale-while-revalidate!)
  7. On failure without cache, returns error

This pattern - stale-while-revalidate - is crucial for reliability. Even if GitHub is down, your site keeps working with cached content.
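
For reference, the MarkdownFetchResult and CacheEntry types used above aren't shown here; inferred from the usage, they'd look roughly like this (exact shapes are assumptions):

using System;

// Assumed shapes, inferred from how the services use them.
public class MarkdownFetchResult
{
    public bool Success { get; set; }
    public string? Content { get; set; }
    public string? ErrorMessage { get; set; }  // hypothetical: populated when a fetch fails
}

internal class CacheEntry
{
    public string Content { get; set; } = string.Empty;
    public DateTimeOffset FetchedAt { get; set; }
}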

File-Based Storage: Simple Persistence

For single-server deployments, file-based storage works great:

public class FileBasedMarkdownFetchService : IMarkdownFetchService
{
    private readonly string _cacheDirectory;
    private readonly IHttpClientFactory _httpClientFactory;
    private readonly ILogger<FileBasedMarkdownFetchService> _logger;
    private readonly SemaphoreSlim _fileLock = new(1, 1);

    public async Task<MarkdownFetchResult> FetchMarkdownAsync(
        string url,
        int pollFrequencyHours,
        int blogPostId)
    {
        var cacheKey = ComputeCacheKey(url, blogPostId);
        var cacheFile = GetCacheFilePath(cacheKey);

        await _fileLock.WaitAsync();
        try
        {
            // Check if file exists and is fresh
            if (File.Exists(cacheFile))
            {
                var fileInfo = new FileInfo(cacheFile);
                var age = DateTimeOffset.UtcNow - fileInfo.LastWriteTimeUtc;

                if (age.TotalHours < pollFrequencyHours)
                {
                    var cached = await File.ReadAllTextAsync(cacheFile);
                    return new MarkdownFetchResult
                    {
                        Success = true,
                        Content = cached
                    };
                }
            }

            // Fetch fresh
            var fetchResult = await FetchFromUrlAsync(url);

            if (fetchResult.Success)
            {
                await File.WriteAllTextAsync(cacheFile, fetchResult.Content);
            }
            else if (File.Exists(cacheFile))
            {
                // Return stale on fetch failure
                var stale = await File.ReadAllTextAsync(cacheFile);
                return new MarkdownFetchResult
                {
                    Success = true,
                    Content = stale
                };
            }

            return fetchResult;
        }
        finally
        {
            _fileLock.Release();
        }
    }

    private string GetCacheFilePath(string cacheKey)
        => Path.Combine(_cacheDirectory, $"{cacheKey}.md");

    private static string ComputeCacheKey(string url, int blogPostId)
    {
        var combined = $"{url}_{blogPostId}";
        using var sha256 = SHA256.Create();
        var bytes = Encoding.UTF8.GetBytes(combined);
        var hash = sha256.ComputeHash(bytes);
        return Convert.ToHexString(hash);
    }
}

Key points here:

  1. Uses SemaphoreSlim for thread-safe file access
  2. Hashes the URL + blog post ID to create safe filenames
  3. Uses file modification time to determine freshness
  4. Persists across application restarts
  5. Same stale-while-revalidate pattern
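
Both implementations delegate the actual HTTP call to a FetchFromUrlAsync helper that I haven't shown. A minimal version using IHttpClientFactory might look like this (a sketch assuming the result shape sketched earlier, not the package's exact code):

using System.Net.Http;
using System.Threading.Tasks;

internal static class MarkdownHttpFetcher
{
    // Sketch of the HTTP fetch the cache services delegate to.
    public static async Task<MarkdownFetchResult> FetchFromUrlAsync(
        IHttpClientFactory httpClientFactory, string url)
    {
        try
        {
            var client = httpClientFactory.CreateClient();
            using var response = await client.GetAsync(url);

            if (!response.IsSuccessStatusCode)
            {
                return new MarkdownFetchResult
                {
                    Success = false,
                    ErrorMessage = $"HTTP {(int)response.StatusCode}: {response.ReasonPhrase}"
                };
            }

            var content = await response.Content.ReadAsStringAsync();
            return new MarkdownFetchResult { Success = true, Content = content };
        }
        catch (HttpRequestException ex)
        {
            return new MarkdownFetchResult
            {
                Success = false,
                ErrorMessage = $"HTTP request failed: {ex.Message}"
            };
        }
    }
}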

Database Storage: Production-Ready

For production deployments, especially multi-server setups, you want a shared cache. That's where the database providers come in:

public class PostgresMarkdownFetchService : IMarkdownFetchService
{
    private readonly MarkdownCacheDbContext _dbContext;
    private readonly IHttpClientFactory _httpClientFactory;
    private readonly ILogger<PostgresMarkdownFetchService> _logger;

    public async Task<MarkdownFetchResult> FetchMarkdownAsync(
        string url,
        int pollFrequencyHours,
        int blogPostId)
    {
        var cacheKey = GetCacheKey(url, blogPostId);

        // Query cache
        var cached = await _dbContext.MarkdownCache
            .FirstOrDefaultAsync(c => c.CacheKey == cacheKey);

        if (cached != null)
        {
            var age = DateTimeOffset.UtcNow - cached.LastFetchedAt;
            if (age.TotalHours < pollFrequencyHours)
            {
                return new MarkdownFetchResult
                {
                    Success = true,
                    Content = cached.Content
                };
            }
        }

        // Fetch fresh
        var fetchResult = await FetchFromUrlAsync(url);

        if (fetchResult.Success)
        {
            if (cached == null)
            {
                cached = new MarkdownCacheEntry
                {
                    CacheKey = cacheKey,
                    Url = url,
                    BlogPostId = blogPostId
                };
                _dbContext.MarkdownCache.Add(cached);
            }

            cached.Content = fetchResult.Content;
            cached.LastFetchedAt = DateTimeOffset.UtcNow;
            await _dbContext.SaveChangesAsync();
        }
        else if (cached != null)
        {
            // Return stale
            return new MarkdownFetchResult
            {
                Success = true,
                Content = cached.Content
            };
        }

        return fetchResult;
    }
}

The database schema is simple:

CREATE TABLE markdown_cache (
    id SERIAL PRIMARY KEY,
    cache_key VARCHAR(128) NOT NULL UNIQUE,
    url VARCHAR(2048) NOT NULL,
    blog_post_id INTEGER NOT NULL,
    content TEXT NOT NULL,
    last_fetched_at TIMESTAMP WITH TIME ZONE NOT NULL,
    CONSTRAINT ix_markdown_cache_cache_key UNIQUE (cache_key)
);

CREATE INDEX ix_markdown_cache_url_blog_post_id
    ON markdown_cache(url, blog_post_id);

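The Postgres service uses MarkdownCacheDbContext and MarkdownCacheEntry, which aren't shown above. Mapped to that schema, they would look roughly like this (a sketch with EF Core; the package's own mapping, including the snake_case column names, may differ):

using System;
using Microsoft.EntityFrameworkCore;

public class MarkdownCacheEntry
{
    public int Id { get; set; }
    public string CacheKey { get; set; } = string.Empty;
    public string Url { get; set; } = string.Empty;
    public int BlogPostId { get; set; }
    public string Content { get; set; } = string.Empty;
    public DateTimeOffset LastFetchedAt { get; set; }
}

public class MarkdownCacheDbContext : DbContext
{
    public MarkdownCacheDbContext(DbContextOptions<MarkdownCacheDbContext> options)
        : base(options) { }

    public DbSet<MarkdownCacheEntry> MarkdownCache => Set<MarkdownCacheEntry>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        var entity = modelBuilder.Entity<MarkdownCacheEntry>();
        entity.ToTable("markdown_cache");
        entity.HasIndex(e => e.CacheKey).IsUnique();
        entity.HasIndex(e => new { e.Url, e.BlogPostId });
        // Column-name mapping (cache_key, blog_post_id, etc.) omitted for brevity.
    }
}
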
In a multi-server deployment, this gives you cache consistency across all instances:

graph TB
    subgraph "Load Balancer"
        LB[Load Balancer]
    end

    subgraph "Application Servers"
        A1[App Server 1<br/>FetchExtension]
        A2[App Server 2<br/>FetchExtension]
        A3[App Server 3<br/>FetchExtension]
    end

    subgraph "Shared Cache"
        PG[(PostgreSQL<br/>markdown_cache table)]
    end

    subgraph "External Content"
        R1[Remote URL 1]
        R2[Remote URL 2]
        R3[Remote URL 3]
    end

    LB --> A1
    LB --> A2
    LB --> A3

    A1 <-->|Read/Write Cache| PG
    A2 <-->|Read/Write Cache| PG
    A3 <-->|Read/Write Cache| PG

    A1 -.->|Fetch if cache miss| R1
    A2 -.->|Fetch if cache miss| R2
    A3 -.->|Fetch if cache miss| R3

All servers share the same cache. When Server 1 fetches a README, Servers 2 and 3 immediately benefit from that cached content.

Setting Up the Extension

Getting started is straightforward. First, install the base package:

dotnet add package mostlylucid.Markdig.FetchExtension

Then choose your storage provider:

# For in-memory (demos/testing)
# Already included in base package

# For file-based storage
# Already included in base package

# For PostgreSQL
dotnet add package mostlylucid.Markdig.FetchExtension.Postgres

# For SQLite
dotnet add package mostlylucid.Markdig.FetchExtension.Sqlite

# For SQL Server
dotnet add package mostlylucid.Markdig.FetchExtension.SqlServer

Configuration in ASP.NET Core

In your Program.cs:

using Mostlylucid.Markdig.FetchExtension;

var builder = WebApplication.CreateBuilder(args);

// Option 1: In-Memory (simplest)
builder.Services.AddInMemoryMarkdownFetch();

// Option 2: File-Based (persists across restarts)
builder.Services.AddFileBasedMarkdownFetch("./markdown-cache");

// Option 3: PostgreSQL (multi-server)
builder.Services.AddPostgresMarkdownFetch(
    builder.Configuration.GetConnectionString("MarkdownCache"));

// Option 4: SQLite (single server with DB)
builder.Services.AddSqliteMarkdownFetch("Data Source=markdown-cache.db");

// Option 5: SQL Server (enterprise)
builder.Services.AddSqlServerMarkdownFetch(
    builder.Configuration.GetConnectionString("MarkdownCache"));

var app = builder.Build();

// If using database storage, ensure schema exists
if (app.Environment.IsDevelopment())
{
    app.Services.EnsureMarkdownCacheDatabase();
}

// Configure the extension with your service provider
FetchMarkdownExtension.ConfigureServiceProvider(app.Services);

app.Run();

Integration with Your Markdown Rendering

The key is the preprocessing step. Here's how I integrate it in my blog:

public class MarkdownRenderingService
{
    private readonly IServiceProvider _serviceProvider;
    private readonly MarkdownFetchPreprocessor _preprocessor;
    private readonly MarkdownPipeline _pipeline;

    public MarkdownRenderingService(IServiceProvider serviceProvider)
    {
        _serviceProvider = serviceProvider;
        _preprocessor = new MarkdownFetchPreprocessor(serviceProvider);

        _pipeline = new MarkdownPipelineBuilder()
            .UseAdvancedExtensions()
            .UseSyntaxHighlighting()
            .UseToc()  // Add TOC support for [TOC] markers
            .UseYourCustomExtensions()
            .Build();
    }

    public string RenderMarkdown(string markdown)
    {
        // Step 1: Preprocess to handle fetch tags
        var processed = _preprocessor.Preprocess(markdown);

        // Step 2: Run through your normal Markdig pipeline
        return Markdown.ToHtml(processed, _pipeline);
    }
}

The flow is:

  1. Your markdown contains <fetch> tags
  2. Preprocessor resolves them to actual markdown
  3. The combined markdown goes through Markdig
  4. Everything gets your custom extensions, styling, etc.
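
Wiring the rendering service up is then ordinary DI. A sketch based on the class above (LoadMarkdownForSlug is a hypothetical helper standing in for however you load your post source):

// In Program.cs
builder.Services.AddSingleton<MarkdownRenderingService>();

// In a controller
public class BlogController : Controller
{
    private readonly MarkdownRenderingService _renderer;

    public BlogController(MarkdownRenderingService renderer) => _renderer = renderer;

    public IActionResult Post(string slug)
    {
        var markdown = LoadMarkdownForSlug(slug);
        var html = _renderer.RenderMarkdown(markdown);
        return Content(html, "text/html");
    }

    // Hypothetical source for the post markdown.
    private static string LoadMarkdownForSlug(string slug)
        => System.IO.File.ReadAllText($"./posts/{slug}.md");
}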

Advanced Features

Table of Contents Generation

The package now includes a separate Table of Contents (TOC) extension! While it's packaged alongside the fetch extension, it's completely independent and can be used on its own. You can automatically generate a clickable table of contents from your document's headings.

Basic Usage:

Simply add [TOC] anywhere in your markdown:

# My Document

[TOC]

# Introduction
Content here...

# Getting Started
More content...

## Installation
Details...

This generates a nested list of all headings with anchor links:

<nav class="ml_toc" aria-label="Table of Contents">
  <ul>
    <li><a href="#introduction">Introduction</a></li>
    <li><a href="#getting-started">Getting Started</a>
      <ul>
        <li><a href="#installation">Installation</a></li>
      </ul>
    </li>
  </ul>
</nav>

Custom CSS Classes:

You can specify a custom CSS class for styling:

[TOC cssclass="my-custom-toc"]

This renders with your custom class:

<nav class="my-custom-toc" aria-label="Table of Contents">
  <!-- TOC content -->
</nav>

How It Works:

  1. Auto-Detection: The TOC automatically detects the minimum heading level in your document and adjusts accordingly. If your document starts with H2, the TOC treats H2 as the top level.

  2. ID Generation: Headings are automatically given IDs for anchor linking (a sketch of this kind of slug generation follows this list):

    • "Getting Started" → id="getting-started"
    • "API Reference" → id="api-reference"
  3. Nested Structure: The renderer builds a properly nested ul/li structure that reflects your document hierarchy.
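
A minimal sketch of that style of slug generation, assuming behaviour like the examples above (not the extension's exact algorithm):

using System.Text;

public static class HeadingSlugger
{
    // "Getting Started" -> "getting-started", "API Reference" -> "api-reference"
    public static string ToSlug(string headingText)
    {
        var sb = new StringBuilder(headingText.Length);
        foreach (var c in headingText.ToLowerInvariant())
        {
            if (char.IsLetterOrDigit(c)) sb.Append(c);
            else if (char.IsWhiteSpace(c) || c == '-' || c == '_') sb.Append('-');
            // other punctuation is dropped
        }
        return sb.ToString().Trim('-');
    }
}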

Enabling TOC Support:

When configuring your Markdig pipeline, add the TOC extension:

var pipeline = new MarkdownPipelineBuilder()
    .UseAdvancedExtensions()
    .UseToc()  // Add TOC support - position in pipeline doesn't matter!
    .Use<YourOtherExtensions>()
    .Build();

Pipeline Position: Unlike some Markdig extensions, the TOC extension doesn't care where you add it in the pipeline. The extension automatically:

  • Inserts its parser at the beginning of the parser list (position 0)
  • Renders after all parsing is complete, collecting headings from the entire document

So you can add .UseToc() anywhere - beginning, middle, or end of your pipeline configuration.

Important: The TOC extension is completely independent of the fetch extension. They're just packaged together for convenience. You can:

  • Use TOC without fetch
  • Use fetch without TOC
  • Use both together

Note: The TOC marker works in both your main markdown files and in fetched remote content. If you fetch a README from GitHub that contains [TOC], it will automatically generate a table of contents from that document's headings (assuming you've added .UseToc() to your pipeline).

Relative Link Rewriting

When fetching remote markdown (especially from GitHub), relative links break. The extension can automatically rewrite them:

<!-- Failed to fetch content from https://raw.githubusercontent.com/user/repo/main/docs/README.md: HTTP 404: Not Found -->

This transforms:

  • ./CONTRIBUTING.md → https://github.com/user/repo/blob/main/docs/CONTRIBUTING.md
  • ../images/logo.png → https://github.com/user/repo/blob/main/images/logo.png
  • Preserves absolute URLs and anchors

The implementation uses the Markdig AST to rewrite links:

public class MarkdownLinkRewriter
{
    public static string RewriteLinks(string markdown, string sourceUrl)
    {
        var document = Markdown.Parse(markdown);
        var baseUri = GetBaseUri(sourceUrl);

        foreach (var link in document.Descendants<LinkInline>())
        {
            if (IsRelativeLink(link.Url))
            {
                link.Url = ResolveRelativeLink(baseUri, link.Url);
            }
        }

        using var writer = new StringWriter();
        var renderer = new NormalizeRenderer(writer);
        renderer.Render(document);
        return writer.ToString();
    }

    private static bool IsRelativeLink(string url)
    {
        if (string.IsNullOrEmpty(url)) return false;
        if (url.StartsWith("http://") || url.StartsWith("https://")) return false;
        if (url.StartsWith("#")) return false;  // Anchor
        if (url.StartsWith("mailto:")) return false;
        return true;
    }
}
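
The GetBaseUri and ResolveRelativeLink helpers aren't shown; a minimal version could be as simple as the following (this sketch resolves against the raw source URL and skips the raw-to-blob URL mapping shown in the examples above):

using System;

internal static class LinkRewriteHelpers
{
    // Base URI is the folder containing the fetched document.
    public static Uri GetBaseUri(string sourceUrl)
    {
        var directory = sourceUrl.Substring(0, sourceUrl.LastIndexOf('/') + 1);
        return new Uri(directory);
    }

    // Uri resolves "./" and "../" segments for us.
    public static string ResolveRelativeLink(Uri baseUri, string relativeUrl)
        => new Uri(baseUri, relativeUrl).ToString();
}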

Fetch Metadata Summary

You can show readers when content was last fetched:

<!-- Failed to fetch content from https://api.example.com/status.md: HTTP request failed: Name or service not known (api.example.com:443) -->

This renders with a footer:

Content fetched from https://api.example.com/status.md on 06 Jan 2025 (2 hours ago)

Or customize the template:

<!-- Failed to fetch content from https://example.com/docs.md: HTTP 404: Not Found -->

Output:

Last updated: 06 January 2025 14:30 | Status: cached | Next refresh: in 22 hours

Available placeholders:

  • {retrieved:format} - Last fetch date/time
  • {age} - Human-readable time since fetch
  • {url} - Source URL
  • {nextrefresh:format} - When content will be refreshed
  • {pollfrequency} - Cache duration in hours
  • {status} - Cache status (fresh/cached/stale)

Disabling Processing for Documentation

When writing documentation about the fetch extension (like this article!), you need a way to show the tags without them being processed. Use the disable="true" attribute:

<!-- This will be processed and fetch content -->
<!-- Failed to fetch content from https://example.com/README.md: HTTP 404: Not Found -->

<!-- This will NOT be processed - useful for documentation -->
<!-- Failed to fetch content from https://example.com/README.md: HTTP 404: Not Found -->

The disabled tag remains in the markdown as-is, perfect for:

  • Writing documentation about the extension itself
  • Creating examples in tutorials
  • Showing tag syntax without triggering fetches

This works for both <fetch> and <fetch-summary> tags:

<!-- No fetch data available for https://example.com/api/status.md -->

Event System for Monitoring

The extension publishes events for all fetch operations:

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddPostgresMarkdownFetch(connectionString);

        var sp = services.BuildServiceProvider();
        var eventPublisher = sp.GetRequiredService<IMarkdownFetchEventPublisher>();

        // Subscribe to events
        eventPublisher.FetchBeginning += (sender, args) =>
        {
            Console.WriteLine($"Fetching {args.Url}...");
        };

        eventPublisher.FetchCompleted += (sender, args) =>
        {
            var source = args.WasCached ? "cache" : "remote";
            Console.WriteLine($"Fetched {args.Url} from {source} in {args.Duration.TotalMilliseconds}ms");
        };

        eventPublisher.FetchFailed += (sender, args) =>
        {
            Console.WriteLine($"Failed to fetch {args.Url}: {args.ErrorMessage}");
        };
    }
}

This makes it easy to integrate with Application Insights, Prometheus, or your logging infrastructure:

sequenceDiagram
    participant MD as Markdown Processor
    participant EP as Event Publisher
    participant FS as Fetch Service
    participant ST as Storage Backend
    participant L as Your Listeners

    MD->>EP: FetchBeginning
    EP->>L: Notify FetchBeginning
    EP->>FS: FetchMarkdownAsync(url)
    FS->>ST: Check Cache
    alt Cache Fresh
        ST-->>FS: Cached Content
        FS->>EP: FetchCompleted (cached=true)
    else Cache Stale/Missing
        FS->>FS: HTTP GET
        alt Success
            FS->>ST: Update Cache
            ST-->>FS: OK
            FS->>EP: FetchCompleted (cached=false)
        else Failure
            FS->>EP: FetchFailed
            EP->>L: Notify FetchFailed
        end
    end
    EP->>L: Notify FetchCompleted
    EP->>L: Notify ContentUpdated
    FS-->>MD: MarkdownFetchResult

    Note over L: Listeners can be: Logging, Metrics, Telemetry, Webhooks
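
As a concrete example, the events could be forwarded to a structured logger (a sketch using the event args shown above; swap in your metrics library of choice):

using Microsoft.Extensions.Logging;

public static class MarkdownFetchTelemetry
{
    public static void Wire(IMarkdownFetchEventPublisher publisher, ILogger logger)
    {
        publisher.FetchCompleted += (_, args) =>
            logger.LogInformation(
                "Markdown fetch for {Url} served from {Source} in {ElapsedMs}ms",
                args.Url,
                args.WasCached ? "cache" : "remote",
                args.Duration.TotalMilliseconds);

        publisher.FetchFailed += (_, args) =>
            logger.LogWarning("Markdown fetch failed for {Url}: {Error}",
                args.Url, args.ErrorMessage);
    }
}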

Caching Strategy in Detail

The caching behavior follows a state machine pattern:

stateDiagram-v2
    [*] --> CheckCache: Fetch Request

    CheckCache --> Fresh: Cache exists & age < pollFrequency
    CheckCache --> Stale: Cache exists & age >= pollFrequency
    CheckCache --> Missing: No cache entry

    Fresh --> ReturnCached: Return cached content
    ReturnCached --> [*]

    Stale --> FetchRemote: Attempt HTTP GET
    Missing --> FetchRemote: Attempt HTTP GET

    FetchRemote --> UpdateCache: Success
    FetchRemote --> HasStale: Failure

    UpdateCache --> ReturnFresh: Return new content
    ReturnFresh --> [*]

    HasStale --> ReturnStale: Return stale cache
    HasStale --> ReturnError: No cache available

    ReturnStale --> [*]
    ReturnError --> [*]

    note right of Fresh
        pollFrequency = 0
        means always stale
    end note

    note right of HasStale
        Stale-while-revalidate
        pattern ensures uptime
    end note

The key insights here:

  1. Fresh Cache - Return immediately, no network hit
  2. Stale Cache - Try to fetch fresh, but fall back to stale if fetch fails
  3. Missing Cache - Must fetch or return error
  4. Zero Poll Frequency - Always fetch fresh (useful for testing)

This pattern is called stale-while-revalidate and it's excellent for reliability. Even if your source goes down, your site keeps serving cached content.

Cache Removal and Management

Sometimes you need to manually invalidate cache:

public class MarkdownController : Controller
{
    private readonly IMarkdownFetchService _fetchService;

    public async Task<IActionResult> InvalidateCache(string url)
    {
        var removed = await _fetchService.RemoveCachedMarkdownAsync(url);

        if (removed)
        {
            return Ok(new { message = "Cache invalidated" });
        }

        return NotFound(new { message = "No cache entry found" });
    }
}

Or via webhooks when content changes:

// GitHub webhook notifies of README update
app.MapPost("/webhooks/github", async (
    GitHubWebhookPayload payload,
    IMarkdownFetchService fetchService) =>
{
    if (payload.Repository?.FullName == "user/repo" &&
        payload.Commits?.Any(c => c.Modified?.Contains("README.md") == true) == true)
    {
        var url = "https://raw.githubusercontent.com/user/repo/main/README.md";
        await fetchService.RemoveCachedMarkdownAsync(url);

        return Results.Ok(new { message = "Cache invalidated" });
    }

    return Results.Ok(new { message = "No action needed" });
});

Testing the Extension

The extension includes comprehensive tests. Here's how I structure them:

public class MarkdownFetchServiceTests
{
    [Fact]
    public async Task FetchMarkdownAsync_CachesContent()
    {
        // Arrange
        var services = new ServiceCollection();
        services.AddLogging();
        services.AddInMemoryMarkdownFetch();
        var sp = services.BuildServiceProvider();

        var fetchService = sp.GetRequiredService<IMarkdownFetchService>();
        var url = "https://raw.githubusercontent.com/user/repo/main/README.md";

        // Act - First fetch (from network)
        var result1 = await fetchService.FetchMarkdownAsync(url, 24, 0);

        // Act - Second fetch (from cache)
        var result2 = await fetchService.FetchMarkdownAsync(url, 24, 0);

        // Assert
        Assert.True(result1.Success);
        Assert.True(result2.Success);
        Assert.Equal(result1.Content, result2.Content);
    }

    [Fact]
    public async Task FetchMarkdownAsync_ReturnsStaleOnFailure()
    {
        // Arrange
        var services = new ServiceCollection();
        services.AddLogging();
        services.AddInMemoryMarkdownFetch();
        var sp = services.BuildServiceProvider();

        var fetchService = sp.GetRequiredService<IMarkdownFetchService>();
        var url = "https://httpstat.us/200?sleep=100";

        // Act - First fetch succeeds
        var result1 = await fetchService.FetchMarkdownAsync(url, 0, 0);

        // Change URL to fail
        var badUrl = "https://httpstat.us/500";

        // Act - Second fetch fails, should return stale
        var result2 = await fetchService.FetchMarkdownAsync(badUrl, 0, 0);

        // Assert
        Assert.True(result1.Success);
        // Even though fetch failed, we return success with stale content
        Assert.True(result2.Success);
    }
}

Performance Considerations

The extension is designed for performance:

  1. Concurrent Dictionary - In-memory implementation uses ConcurrentDictionary for thread-safe access
  2. SemaphoreSlim - File-based uses async locking to prevent race conditions
  3. Database Indexes - All database providers have proper indexes on cache keys
  4. HTTP Client Pooling - Uses IHttpClientFactory for efficient connection reuse
  5. Async All The Way - No blocking calls, everything is async

Typical performance numbers on my home server:

  • Cache hit: < 1ms
  • Cache miss with network: 200-500ms (depends on remote server)
  • Database lookup: 2-5ms (PostgreSQL with proper indexes)
  • File system lookup: 1-3ms

Deployment with GitHub Actions

I publish all four packages to NuGet using GitHub Actions with OIDC authentication:

name: Publish Markdig.FetchExtension

on:
  push:
    tags:
      - 'fetchextension-v*.*.*'

permissions:
  id-token: write
  contents: read

jobs:
  build-and-publish:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@v4

    - name: Setup .NET
      uses: actions/setup-dotnet@v4
      with:
        dotnet-version: '9.0.x'

    - name: Extract version from tag
      id: get_version
      run: |
        TAG=${GITHUB_REF#refs/tags/fetchextension-v}
        echo "VERSION=$TAG" >> $GITHUB_OUTPUT

    - name: Build
      run: dotnet build Mostlylucid.Markdig.FetchExtension/Mostlylucid.Markdig.FetchExtension.csproj --configuration Release -p:Version=${{ steps.get_version.outputs.VERSION }}

    - name: Pack
      run: dotnet pack Mostlylucid.Markdig.FetchExtension/Mostlylucid.Markdig.FetchExtension.csproj --configuration Release --no-build -p:PackageVersion=${{ steps.get_version.outputs.VERSION }} --output ./artifacts

    - name: Login to NuGet (OIDC)
      id: nuget_login
      uses: NuGet/login@v1
      with:
        user: 'mostlylucid'

    - name: Publish to NuGet
      run: dotnet nuget push ./artifacts/*.nupkg --api-key ${{ steps.nuget_login.outputs.NUGET_API_KEY }} --source https://api.nuget.org/v3/index.json --skip-duplicate

The OIDC approach is more secure than storing API keys - GitHub generates short-lived tokens automatically.

Real-World Usage

Here's how I use it in my blog posts:

# My NuGet Package Documentation

Here's the official README from GitHub:

# Umami.Net

## UmamiClient

This is a .NET Core client for the Umami tracking API.
It's based on the Umami Node client, which can be found [here](https://github.com/umami-software/node).

You can see how to set up Umami as a docker
container [here](https://www.mostlylucid.net/blog/usingumamiforlocalanalytics).
You can read more detail about its creation on my
blog [here](https://www.mostlylucid.net/blog/addingumamitrackingclientfollowup).

To use this client you need the following appsettings.json configuration:

```json
{
  "Analytics": {
    "UmamiPath": "https://umamilocal.mostlylucid.net",
    "WebsiteId": "32c2aa31-b1ac-44c0-b8f3-ff1f50403bee"
  }
}
```

Where UmamiPath is the path to your Umami instance and WebsiteId is the id of the website you want to track.

To use the client you need to add the following to your Program.cs:

using Umami.Net;

services.SetupUmamiClient(builder.Configuration);

This will add the Umami client to the services collection.

You can then use the client in two ways:

Track

  1. Inject the UmamiClient into your class and call the Track method:
 // Inject UmamiClient umamiClient
 await umamiClient.Track("Search", new UmamiEventData(){{"query", encodedQuery}});
  2. Use the UmamiBackgroundSender to track events in the background (this uses an IHostedService to send events in the background):
 // Inject UmamiBackgroundSender umamiBackgroundSender
await umamiBackgroundSender.Track("Search", new UmamiEventData(){{"query", encodedQuery}});

The client will send the event to the Umami API and it will be stored.

The UmamiEventData is a dictionary of key value pairs that will be sent to the Umami API as the event data.

There are additionally more low level methods that can be used to send events to the Umami API.

Track PageView

There's also a convenience method to track a page view. This will send an event to the Umami API with the url set (which counts as a pageview).

  await  umamiBackgroundSender.TrackPageView("api/search/" + encodedQuery, "searchEvent", eventData: new UmamiEventData(){{"query", encodedQuery}});
  
   await umamiClient.TrackPageView("api/search/" + encodedQuery, "searchEvent", eventData: new UmamiEventData(){{"query", encodedQuery}});

Here we're setting the url to "api/search/" + encodedQuery and the event type to "searchEvent". We're also passing in a dictionary of key value pairs as the event data.

Raw 'Send' method

On both the UmamiClient and UmamiBackgroundSender you can call the following method.



 Send(UmamiPayload? payload = null, UmamiEventData? eventData = null,
        string eventType = "event")

If you don't pass in a UmamiPayload object, the client will create one for you using the WebsiteId from the appsettings.json.

    public  UmamiPayload GetPayload(string? url = null, UmamiEventData? data = null)
    {
        var httpContext = httpContextAccessor.HttpContext;
        var request = httpContext?.Request;

        var payload = new UmamiPayload
        {
            Website = settings.WebsiteId,
            Data = data,
            Url = url ?? httpContext?.Request?.Path.Value,
            IpAddress = httpContext?.Connection?.RemoteIpAddress?.ToString(),
            UserAgent = request?.Headers["User-Agent"].FirstOrDefault(),
            Referrer = request?.Headers["Referer"].FirstOrDefault(),
           Hostname = request?.Host.Host,
        };
        
        return payload;
    }

You can see that this populates the UmamiPayload object with the WebsiteId from the appsettings.json, the Url, IpAddress, UserAgent, Referrer and Hostname from the HttpContext.

NOTE: eventType can only be "event" or "identify" as per the Umami API.

UmamiData

There's also a service that can be used to pull data from the Umami API. This is a service that allows me to pull data from my Umami instance to use in stuff like sorting posts by popularity etc...

To set it up you need to add a username and password for your umami instance to the Analytics element in your settings file:

    "Analytics":{
        "UmamiPath" : "https://umami.mostlylucid.net",
        "WebsiteId" : "1e3b7657-9487-4857-a9e9-4e1920aa8c42",
        "UserName": "admin",
        "Password": ""
     
    }

Then in your Program.cs you set up the UmamiDataService as follows:

    services.SetupUmamiData(config);

You can then inject the UmamiDataService into your class and use it to pull data from the Umami API.

Usage

Now you have the UmamiDataService in your service collection you can start using it!

Methods

The methods all follow the Umami API definition; you can read about them here: https://umami.is/docs/api/website-stats

All returns are wrapped in an UmamiResult<T> object which has a Status property and a Data property. The Data property is the object returned from the Umami API.

public record UmamiResult<T>(HttpStatusCode Status, string Message, T? Data);

All requests apart from ActiveUsers have a base request object with two compulsory properties. I added convenience DateTimes to the base request object to make it easier to set the start and end dates.

public class BaseRequest
{
    [QueryStringParameter("startAt", isRequired: true)]
    public long StartAt => StartAtDate.ToMilliseconds(); // Timestamp (in ms) of starting date
    [QueryStringParameter("endAt", isRequired: true)]
    public long EndAt => EndAtDate.ToMilliseconds(); // Timestamp (in ms) of end date
    public DateTime StartAtDate { get; set; }
    public DateTime EndAtDate { get; set; }
}

The service has the following methods:

Active Users

This just gets the total number of CURRENT active users on the site

public async Task<UmamiResult<ActiveUsersResponse>> GetActiveUsers()

Stats

This returns a bunch of statistics about the site, including the number of users, page views, etc.

public async Task<UmamiResult<StatsResponseModels>> GetStats(StatsRequest statsRequest)    

You may set a number of parameters to filter the data returned from the API. For instance using url will return the stats for a specific URL.
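
For example, a stats call filtered to a single URL might look like this (a sketch assuming a resolved UmamiDataService, mirroring the other examples in this README):

var stats = await websiteDataService.GetStats(new StatsRequest
{
    StartAtDate = DateTime.Now.AddDays(-7),
    EndAtDate = DateTime.Now,
    Url = "/blog"   // stats for this URL only
});

if (stats.Status == HttpStatusCode.OK)
{
    Console.WriteLine($"Page views: {stats.Data.pageviews.value}");
}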

StatsRequest object
public class StatsRequest : BaseRequest
{
    [QueryStringParameter("url")]
    public string? Url { get; set; } // Name of URL
    
    [QueryStringParameter("referrer")]
    public string? Referrer { get; set; } // Name of referrer
    
    [QueryStringParameter("title")]
    public string? Title { get; set; } // Name of page title
    
    [QueryStringParameter("query")]
    public string? Query { get; set; } // Name of query
    
    [QueryStringParameter("event")]
    public string? Event { get; set; } // Name of event
    
    [QueryStringParameter("host")]
    public string? Host { get; set; } // Name of hostname
    
    [QueryStringParameter("os")]
    public string? Os { get; set; } // Name of operating system
    
    [QueryStringParameter("browser")]
    public string? Browser { get; set; } // Name of browser
    
    [QueryStringParameter("device")]
    public string? Device { get; set; } // Name of device (e.g., Mobile)
    
    [QueryStringParameter("country")]
    public string? Country { get; set; } // Name of country
    
    [QueryStringParameter("region")]
    public string? Region { get; set; } // Name of region/state/province
    
    [QueryStringParameter("city")]
    public string? City { get; set; } // Name of city
}

The JSON object Umami returns is as follows.

{
  "pageviews": { "value": 5, "change": 5 },
  "visitors": { "value": 1, "change": 1 },
  "visits": { "value": 3, "change": 2 },
  "bounces": { "value": 0, "change": 0 },
  "totaltime": { "value": 4, "change": 4 }
}

This is wrapped inside my StatsResponseModel object.

namespace Umami.Net.UmamiData.Models.ResponseObjects;

public class StatsResponseModels
{
    public Pageviews pageviews { get; set; }
    public Visitors visitors { get; set; }
    public Visits visits { get; set; }
    public Bounces bounces { get; set; }
    public Totaltime totaltime { get; set; }


    public class Pageviews
    {
        public int value { get; set; }
        public int prev { get; set; }
    }

    public class Visitors
    {
        public int value { get; set; }
        public int prev { get; set; }
    }

    public class Visits
    {
        public int value { get; set; }
        public int prev { get; set; }
    }

    public class Bounces
    {
        public int value { get; set; }
        public int prev { get; set; }
    }

    public class Totaltime
    {
        public int value { get; set; }
        public int prev { get; set; }
    }
}

Metrics

Metrics in Umami provide you the number of views for specific types of properties.

Events

One example of these is Events:

'Events' in Umami are specific items you can track on a site. When tracking events using Umami.Net you can set a number of properties which are tracked with the event name. For instance here I track Search requests with the URL and the search term.

       await  umamiBackgroundSender.Track( "searchEvent", eventData: new UmamiEventData(){{"query", encodedQuery}});

To fetch data about this event you would use the Metrics method:

public async Task<UmamiResult<MetricsResponseModels[]>> GetMetrics(MetricsRequest metricsRequest)

As with the other methods this accepts the MetricsRequest object (with the compulsory BaseRequest properties) and a number of optional properties to filter the data.

MetricsRequest object
public class MetricsRequest : BaseRequest
{
    [QueryStringParameter("type", isRequired: true)]
    public MetricType Type { get; set; } // Metrics type

    [QueryStringParameter("url")]
    public string? Url { get; set; } // Name of URL
    
    [QueryStringParameter("referrer")]
    public string? Referrer { get; set; } // Name of referrer
    
    [QueryStringParameter("title")]
    public string? Title { get; set; } // Name of page title
    
    [QueryStringParameter("query")]
    public string? Query { get; set; } // Name of query
    
    [QueryStringParameter("host")]
    public string? Host { get; set; } // Name of hostname
    
    [QueryStringParameter("os")]
    public string? Os { get; set; } // Name of operating system
    
    [QueryStringParameter("browser")]
    public string? Browser { get; set; } // Name of browser
    
    [QueryStringParameter("device")]
    public string? Device { get; set; } // Name of device (e.g., Mobile)
    
    [QueryStringParameter("country")]
    public string? Country { get; set; } // Name of country
    
    [QueryStringParameter("region")]
    public string? Region { get; set; } // Name of region/state/province
    
    [QueryStringParameter("city")]
    public string? City { get; set; } // Name of city
    
    [QueryStringParameter("language")]
    public string? Language { get; set; } // Name of language
    
    [QueryStringParameter("event")]
    public string? Event { get; set; } // Name of event
    
    [QueryStringParameter("limit")]
    public int? Limit { get; set; } = 500; // Number of events returned (default: 500)
}

Here you can see that you can specify a number of properties in the request element to specify what metrics you want to return.

You can also set a Limit property to limit the number of results returned.

For instance to get the event over the past day I mentioned above you would use the following request:

var metricsRequest = new MetricsRequest
{
    StartAtDate = DateTime.Now.AddDays(-1),
    EndAtDate = DateTime.Now,
    Type = MetricType.@event,
    Event = "searchEvent"
};

The JSON object returned from the API is as follows:

[
  { "x": "searchEvent", "y": 46 }
]

And again I wrap this in my MetricsResponseModels object.

public class MetricsResponseModels
{
    public string x { get; set; }
    public int y { get; set; }
}

Where x is the event name and y is the number of times it has been triggered.

Page Views

One of the most useful metrics is the number of page views. This is the number of times a page has been viewed on the site. Below is the test I use to get the number of page views over the past 30 days. You'll note the Type parameter is set to MetricType.url; however, this is also the default value, so you don't need to set it.

  [Fact]
    public async Task Metrics_StartEnd()
    {
        var setup = new SetupUmamiData();
        var serviceProvider = setup.Setup();
        var websiteDataService = serviceProvider.GetRequiredService<UmamiDataService>();
        
        var metrics = await websiteDataService.GetMetrics(new MetricsRequest()
        {
            StartAtDate = DateTime.Now.AddDays(-30),
            EndAtDate = DateTime.Now,
            Type = MetricType.url,
            Limit = 500
        });
        Assert.NotNull(metrics);
        Assert.Equal( HttpStatusCode.OK, metrics.Status);

    }

This returns a MetricsResponse object which has the following JSON structure:

[
  {
    "x": "/",
    "y": 1
  },
  {
    "x": "/blog",
    "y": 1
  },
  {
    "x": "/blog/usingumamidataforwebsitestats",
    "y": 1
  }
]

Where x is the URL and y is the number of times it has been viewed.

PageViews

This returns the number of page views for a specific URL.

Again here is a test I use for this method:

    [Fact]
    public async Task PageViews_StartEnd_Day_Url()
    {
        var setup = new SetupUmamiData();
        var serviceProvider = setup.Setup();
        var websiteDataService = serviceProvider.GetRequiredService<UmamiDataService>();
    
        var pageViews = await websiteDataService.GetPageViews(new PageViewsRequest()
        {
            StartAtDate = DateTime.Now.AddDays(-7),
            EndAtDate = DateTime.Now,
            Unit = Unit.day,
            Url = "/blog"
        });
        Assert.NotNull(pageViews);
        Assert.Equal( HttpStatusCode.OK, pageViews.Status);

    }

This returns a PageViewsResponse object which has the following JSON structure:

[
  {
    "date": "2024-09-06 00:00",
    "value": 1
  }
]

Where date is the date and value is the number of page views; this is repeated for each day in the range specified (or hour, month, etc., depending on the Unit property).

As with the other methods this accepts the PageViewsRequest object (with the compulsory BaseRequest properties) and a number of optional properties to filter the data.

PageViewsRequest object
public class PageViewsRequest : BaseRequest
{
    // Required properties

    [QueryStringParameter("unit", isRequired: true)]
    public Unit Unit { get; set; } = Unit.day; // Time unit (year | month | hour | day)
    
    [QueryStringParameter("timezone")]
    [TimeZoneValidator]
    public string Timezone { get; set; }

    // Optional properties
    [QueryStringParameter("url")]
    public string? Url { get; set; } // Name of URL
    [QueryStringParameter("referrer")]
    public string? Referrer { get; set; } // Name of referrer
    [QueryStringParameter("title")]
    public string? Title { get; set; } // Name of page title
    [QueryStringParameter("host")]
    public string? Host { get; set; } // Name of hostname
    [QueryStringParameter("os")]
    public string? Os { get; set; } // Name of operating system
    [QueryStringParameter("browser")]
    public string? Browser { get; set; } // Name of browser
    [QueryStringParameter("device")]
    public string? Device { get; set; } // Name of device (e.g., Mobile)
    [QueryStringParameter("country")]
    public string? Country { get; set; } // Name of country
    [QueryStringParameter("region")]
    public string? Region { get; set; } // Name of region/state/province
    [QueryStringParameter("city")]
    public string? City { get; set; } // Name of city
}

As with the other methods you can set a number of properties to filter the data returned from the API, for instance you could set the Country property to get the number of page views from a specific country.

Using the Service

In this site I have some code which lets me use this service to get the number of views each blog page has. In the code below I take a start and end date and a prefix (which is /blog in my case) and get the number of views for each page in the blog.

I then cache this data for an hour so I don't have to keep hitting the Umami API.

public class UmamiDataSortService(
    UmamiDataService dataService,
    IMemoryCache cache)
{
    public async Task<List<MetricsResponseModels>?> GetMetrics(DateTime startAt, DateTime endAt, string prefix="" )
    {
        using var activity = Log.Logger.StartActivity("GetMetricsWithPrefix");
        try
        {
            var cacheKey = $"Metrics_{startAt}_{endAt}_{prefix}";
            if (cache.TryGetValue(cacheKey, out List<MetricsResponseModels>? metrics))
            {
                activity?.AddProperty("CacheHit", true);
                return metrics;
            }
            activity?.AddProperty("CacheHit", false);
            var metricsRequest = new MetricsRequest()
            {
                StartAtDate = startAt,
                EndAtDate = endAt,
                Type = MetricType.url,
                Limit = 500
            };
            var metricRequest = await dataService.GetMetrics(metricsRequest);

            if(metricRequest.Status != HttpStatusCode.OK)
            {
                return null;
            }
            var filteredMetrics = metricRequest.Data.Where(x => x.x.StartsWith(prefix)).ToList();
            cache.Set(cacheKey, filteredMetrics, TimeSpan.FromHours(1));
            activity?.AddProperty("MetricsCount", filteredMetrics?.Count()?? 0);
            activity?.Complete();
            return filteredMetrics;
        }
        catch (Exception e)
        {
            activity?.Complete(LogEventLevel.Error, e);
         
            return null;
        }
    }
}

Installation

The package is available on NuGet...


This keeps documentation synchronized automatically. When I update the README on GitHub, the blog post stays in sync (within 24 hours).

# Conclusion

Building this extension taught me several things:

1. **Preprocessing > Parsing** - Handling fetch tags before Markdig ensures consistency
2. **Stale-While-Revalidate** - This pattern is incredibly valuable for reliability
3. **Pluggable Storage** - Different deployments need different solutions
4. **Events for Observability** - Being able to monitor fetch operations is crucial
5. **Thread Safety Matters** - Concurrent access needs careful handling
6. **Bonus Features** - The package now includes a separate TOC extension for generating tables of contents

The extension is open source and available on NuGet:

- [mostlylucid.Markdig.FetchExtension](https://www.nuget.org/packages/mostlylucid.Markdig.FetchExtension)
- [mostlylucid.Markdig.FetchExtension.Postgres](https://www.nuget.org/packages/mostlylucid.Markdig.FetchExtension.Postgres)
- [mostlylucid.Markdig.FetchExtension.Sqlite](https://www.nuget.org/packages/mostlylucid.Markdig.FetchExtension.Sqlite)
- [mostlylucid.Markdig.FetchExtension.SqlServer](https://www.nuget.org/packages/mostlylucid.Markdig.FetchExtension.SqlServer)

Source code is on GitHub: [scottgal/mostlylucidweb](https://github.com/scottgal/mostlylucidweb/tree/main/Mostlylucid.Markdig.FetchExtension)

If you're building a documentation site, blog, or any application that needs to include external markdown content, give it a try. It's designed to be simple to use but powerful enough for production deployments.
