Accessibility AI ASP.NET Florence-2 Image Processing NuGet

AI-Powered Alt Text Generation with mostlylucid.llmalttext

Tuesday, 25 November 2025

Want nice, descriptive alt text for the images on your sites, or just want to extract text from them? mostlylucid.llmalttext uses Microsoft's Florence-2 vision language model to generate high-quality alt text automatically - running entirely locally on your machine, no API keys required.

Note: I need to update this doc now that the NuGet package is out. If you look here you'll find a nifty demo site you can download and use. I'll update this with details in the coming days.

Introduction

Alt text matters. Screen readers depend on it, SEO rankings factor it in, and it's simply the right thing to do for accessibility. But writing good alt text for hundreds of images? That's where most of us fall short.

This package solves that problem using Microsoft's Florence-2 vision language model - running entirely locally on your machine, no API keys required.

Source code: github.com/scottgal/mostlylucid.nugetpackages

The Problem

Every <img> tag should have meaningful alt text. But in practice:

  • Manual writing is tedious - hundreds of images means hours of work
  • AI APIs cost money - OpenAI Vision, Claude, etc. add up quickly
  • Privacy concerns - you might not want to send images to external APIs
  • Inconsistent quality - different people write alt text differently

What if you could generate high-quality alt text automatically, running entirely on your own hardware?

How It Works

The package runs Microsoft's Florence-2 model through ONNX Runtime. Here's the processing pipeline:

flowchart TB
    subgraph Input[Image Sources]
        A[File Path]
        B[URL]
        C[Stream]
        D[Byte Array]
    end

    subgraph Processing[Florence-2 Pipeline]
        E[Image Preprocessing]
        F[Vision Encoder]
        G[Language Decoder]
    end

    subgraph Output[Results]
        H[Alt Text]
        I[OCR Text]
        J[Content Type]
    end

    A --> E
    B --> E
    C --> E
    D --> E
    E --> F
    F --> G
    G --> H
    G --> I
    G --> J

    style A stroke:#10b981,stroke-width:2px
    style B stroke:#10b981,stroke-width:2px
    style C stroke:#10b981,stroke-width:2px
    style D stroke:#10b981,stroke-width:2px
    style F stroke:#6366f1,stroke-width:2px
    style G stroke:#6366f1,stroke-width:2px
    style H stroke:#ec4899,stroke-width:2px
    style I stroke:#ec4899,stroke-width:2px
    style J stroke:#ec4899,stroke-width:2px

Key features:

  • Local execution - no API calls, no costs, no privacy concerns
  • ~800MB model - downloads once, cached forever
  • Multiple task types - brief captions, detailed descriptions, OCR
  • Content classification - knows if it's a photo, chart, screenshot, etc.

Quick Start

Installation

dotnet add package Mostlylucid.LlmAltText

Register Services

// Program.cs
builder.Services.AddAltTextGeneration();

That's it. The first run downloads the Florence-2 model (~800MB), then you're ready to go.

Generate Alt Text

public class ImageController : ControllerBase
{
    private readonly IImageAnalysisService _imageAnalysis;

    public ImageController(IImageAnalysisService imageAnalysis)
    {
        _imageAnalysis = imageAnalysis;
    }

    [HttpPost("analyze")]
    public async Task<IActionResult> Analyze(IFormFile image)
    {
        using var stream = image.OpenReadStream();
        var altText = await _imageAnalysis.GenerateAltTextAsync(stream);

        return Ok(new { altText });
    }
}

Multiple Input Sources

The service accepts images from anywhere - files, URLs, streams, or byte arrays.

From File Path

var altText = await _imageAnalysis.GenerateAltTextFromFileAsync("/images/photo.jpg");

From URL

var altText = await _imageAnalysis.GenerateAltTextFromUrlAsync(
    "https://example.com/image.png");

From Stream

using var stream = file.OpenReadStream();
var altText = await _imageAnalysis.GenerateAltTextAsync(stream);

From Byte Array

var bytes = await httpClient.GetByteArrayAsync(imageUrl);
var altText = await _imageAnalysis.GenerateAltTextAsync(bytes);

Task Types: Controlling Detail Level

Florence-2 supports three caption modes. Choose based on your needs:

// Brief - "A dog sitting on grass"
var brief = await _imageAnalysis.GenerateAltTextAsync(stream, "CAPTION");

// Detailed - "A golden retriever sitting on green grass in a park"
stream.Position = 0;
var detailed = await _imageAnalysis.GenerateAltTextAsync(stream, "DETAILED_CAPTION");

// Most detailed (default) - Full accessibility description
stream.Position = 0;
var full = await _imageAnalysis.GenerateAltTextAsync(stream, "MORE_DETAILED_CAPTION");
// "A happy golden retriever with light fur sitting on lush green grass
//  in a sunny park, with trees visible in the background."

When to use each:

| Task Type | Best For |
| --- | --- |
| CAPTION | Thumbnails, decorative images, quick tooltips |
| DETAILED_CAPTION | Social media, basic accessibility |
| MORE_DETAILED_CAPTION | Full accessibility, screen readers (recommended) |

OCR Text Extraction

Florence-2 can also extract text from images - useful for screenshots, documents, and charts.

// Extract text only
var extractedText = await _imageAnalysis.ExtractTextAsync(stream);

// Get both alt text and extracted text
var (altText, ocrText) = await _imageAnalysis.AnalyzeImageAsync(stream);

Console.WriteLine($"Alt: {altText}");
Console.WriteLine($"OCR: {ocrText}");

Content Type Classification

Not all images are the same. A photograph needs descriptive alt text; a document needs its text content. The classification feature helps you handle each appropriately:

var result = await _imageAnalysis.AnalyzeWithClassificationAsync(stream);

Console.WriteLine($"Type: {result.ContentType}");        // e.g., "Photograph"
Console.WriteLine($"Confidence: {result.ContentTypeConfidence:P0}"); // e.g., "87%"
Console.WriteLine($"Has Text: {result.HasSignificantText}");

Handling Different Content Types

var result = await _imageAnalysis.AnalyzeWithClassificationAsync(stream);

switch (result.ContentType)
{
    case ImageContentType.Document:
        // Documents - prioritize extracted text
        return result.ExtractedText;

    case ImageContentType.Screenshot:
        // Screenshots - combine description with UI text
        return result.HasSignificantText
            ? $"{result.AltText}. Text visible: {result.ExtractedText}"
            : result.AltText;

    case ImageContentType.Chart:
        // Charts - describe the visualization plus data
        return $"{result.AltText}. Data: {result.ExtractedText}";

    case ImageContentType.Photograph:
    default:
        // Photos - just the description
        return result.AltText;
}

Content Type Reference

| Type | Description | Example |
| --- | --- | --- |
| Photograph | Real-world photos | People, landscapes, products |
| Document | Text-heavy content | PDFs, forms, articles |
| Screenshot | Software captures | UI, websites, apps |
| Chart | Data visualizations | Graphs, pie charts, tables |
| Illustration | Drawn content | Artwork, cartoons, icons |
| Diagram | Technical drawings | Flowcharts, UML, schematics |
| Unknown | Unclassified | Edge cases |

The Auto Alt Text TagHelper

Here's where it gets interesting. The TagHelper automatically generates alt text for any <img> tag missing one - at render time.

Setup

// Program.cs
builder.Services.AddAltTextGeneration(options =>
{
    options.EnableTagHelper = true;
    options.EnableDatabase = true;  // Cache results
    options.DbProvider = AltTextDbProvider.Sqlite;
    options.SqliteDbPath = "./alttext.db";
});

var app = builder.Build();
await app.Services.MigrateAltTextDatabaseAsync();

Register the TagHelper in _ViewImports.cshtml:

@addTagHelper *, Mostlylucid.LlmAltText

How It Works

flowchart LR
    subgraph Razor[Razor View Rendering]
        A[img tag found]
        B{Has alt attribute?}
        C[Skip - use existing]
        D{In cache?}
        E[Return cached]
        F[Fetch image]
        G[Generate alt text]
        H[Cache result]
        I[Render with alt]
    end

    A --> B
    B -->|Yes| C
    B -->|No| D
    D -->|Yes| E
    D -->|No| F
    F --> G
    G --> H
    H --> I
    E --> I

    style A stroke:#10b981,stroke-width:2px
    style B stroke:#6366f1,stroke-width:2px
    style G stroke:#ec4899,stroke-width:2px
    style I stroke:#8b5cf6,stroke-width:2px

What Gets Processed

<!-- NO ALT - Will be processed -->
<img src="https://example.com/photo.jpg" />

<!-- HAS ALT - Skipped (respects your text) -->
<img src="https://example.com/photo.jpg" alt="My custom description" />

<!-- EMPTY ALT - Skipped (decorative image per a11y standards) -->
<img src="https://example.com/decorative.jpg" alt="" />

<!-- EXPLICIT SKIP - Skipped -->
<img src="https://example.com/photo.jpg" data-skip-alt="true" />

<!-- DATA URI - Skipped (can't fetch) -->
<img src="data:image/png;base64,..." />

<!-- RELATIVE PATH - Skipped (needs absolute URL) -->
<img src="/images/photo.jpg" />
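
The skip rules above can be sketched as a single predicate. This is a minimal illustration only, not the TagHelper's actual code: `AltTextRules` and `ShouldGenerate` are hypothetical names, and the `data:` / `blob:` prefixes are assumed from the `SkipSrcPrefixes` example in the configuration reference.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Mostlylucid.LlmAltText) that mirrors the skip rules listed above.
public static class AltTextRules
{
    // Assumed prefixes, taken from the SkipSrcPrefixes example in this article.
    private static readonly string[] SkipPrefixes = { "data:", "blob:" };

    // src: value of the src attribute.
    // alt: value of the alt attribute, or null when the attribute is absent.
    // skipRequested: true when data-skip-alt="true" is present.
    public static bool ShouldGenerate(string? src, string? alt, bool skipRequested)
    {
        if (alt is not null) return false;                 // existing or empty alt is respected
        if (skipRequested) return false;                   // explicit opt-out
        if (string.IsNullOrWhiteSpace(src)) return false;  // nothing to fetch
        if (SkipPrefixes.Any(p => src.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            return false;                                  // data URIs and similar

        // Only absolute http(s) URLs can be fetched; relative paths fail this check.
        return Uri.TryCreate(src, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

Checking the scheme explicitly matters because on Linux and macOS a path like `/images/photo.jpg` parses as an absolute `file:` URI, so `Uri.TryCreate` alone would not reject it.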

Domain Restrictions

For security, you can restrict which domains the TagHelper will fetch from:

options.AllowedImageDomains = new List<string>
{
    "mycdn.example.com",
    "images.mysite.org",
    "cdn.githubusercontent.com"
};
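
To show what a domain restriction like this typically involves, here is a rough sketch of an allow-list check. `DomainAllowList` and `IsAllowed` are hypothetical names, and exact host matching is an assumption: whether the package also accepts subdomains of a listed domain isn't covered here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the package's implementation): exact-host allow-list check.
public static class DomainAllowList
{
    // Returns true only when the URL parses and its host equals one of the allowed domains.
    public static bool IsAllowed(string url, IReadOnlyCollection<string> allowedDomains)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri)) return false;

        foreach (var domain in allowedDomains)
        {
            // Compare the parsed host, never a substring of the raw URL.
            if (string.Equals(uri.Host, domain, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

Comparing the parsed host rather than searching the URL string means a lookalike such as `mycdn.example.com.evil.net` is rejected.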

Database Caching

Without caching, every page render would regenerate alt text. That's slow and wasteful. The database cache stores results keyed by image URL.
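
As a rough picture of what "keyed by image URL" with an expiry means, here is an in-memory stand-in. It is a sketch under assumptions, not the package's database code: `AltTextCache`, `GetOrAddAsync`, and the injectable clock are all hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for the database cache; illustrative only.
public sealed class AltTextCache
{
    private readonly ConcurrentDictionary<string, (string AltText, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _lifetime;
    private readonly Func<DateTimeOffset> _now;

    // The clock is injectable so expiry can be exercised without waiting.
    public AltTextCache(TimeSpan lifetime, Func<DateTimeOffset>? now = null)
    {
        _lifetime = lifetime;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    // Returns the cached text for the URL, or runs the generator and stores its result.
    public async Task<string> GetOrAddAsync(string imageUrl, Func<string, Task<string>> generate)
    {
        if (_entries.TryGetValue(imageUrl, out var entry) && entry.ExpiresAt > _now())
            return entry.AltText;

        var altText = await generate(imageUrl);
        _entries[imageUrl] = (altText, _now() + _lifetime);
        return altText;
    }
}
```

The expensive model call only happens on a miss or after the entry expires, which is the same trade-off the `CacheDurationMinutes` setting controls.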

SQLite (Development)

builder.Services.AddAltTextGeneration(options =>
{
    options.EnableDatabase = true;
    options.DbProvider = AltTextDbProvider.Sqlite;
    options.SqliteDbPath = "./alttext.db";
    options.CacheDurationMinutes = 60;
});

PostgreSQL (Production)

builder.Services.AddAltTextGeneration(options =>
{
    options.EnableDatabase = true;
    options.DbProvider = AltTextDbProvider.PostgreSql;
    options.ConnectionString = builder.Configuration.GetConnectionString("AltTextDb");
});

Configuration Reference

builder.Services.AddAltTextGeneration(options =>
{
    // Model location (~800MB downloaded here)
    options.ModelPath = "./models";

    // Default task type for alt text generation
    options.DefaultTaskType = "MORE_DETAILED_CAPTION";

    // Maximum word count for alt text
    options.MaxWords = 90;

    // Enable detailed logging
    options.EnableDiagnosticLogging = true;

    // TagHelper settings
    options.EnableTagHelper = true;
    options.EnableDatabase = true;
    options.AutoMigrateDatabase = true;

    // Database provider (choose one)
    options.DbProvider = AltTextDbProvider.Sqlite;
    options.SqliteDbPath = "alttext.db";
    // or, for PostgreSQL:
    // options.DbProvider = AltTextDbProvider.PostgreSql;
    // options.ConnectionString = "Host=localhost;Database=alttext;...";

    // Security
    options.AllowedImageDomains = new List<string> { "cdn.example.com" };
    options.SkipSrcPrefixes = new List<string> { "data:", "blob:" };

    // Caching
    options.CacheDurationMinutes = 60;
});

Real-World Example: Batch Processing

Here's how I use it to process images when importing blog posts:

public class ImageProcessor
{
    private readonly IImageAnalysisService _imageAnalysis;
    private readonly ILogger<ImageProcessor> _logger;

    public ImageProcessor(
        IImageAnalysisService imageAnalysis,
        ILogger<ImageProcessor> logger)
    {
        _imageAnalysis = imageAnalysis;
        _logger = logger;
    }

    public async Task ProcessMarkdownImagesAsync(string markdownPath)
    {
        var imageDir = Path.Combine(Path.GetDirectoryName(markdownPath)!, "images");
        if (!Directory.Exists(imageDir)) return;

        var images = Directory.GetFiles(imageDir, "*.*")
            .Where(f => IsImageFile(f));

        foreach (var imagePath in images)
        {
            try
            {
                var result = await _imageAnalysis
                    .AnalyzeWithClassificationFromFileAsync(imagePath);

                _logger.LogInformation(
                    "Processed {File}: {Type} ({Confidence:P0})",
                    Path.GetFileName(imagePath),
                    result.ContentType,
                    result.ContentTypeConfidence);

                // Store alt text for later use (SaveAltTextAsync is your own persistence method, not shown)
                await SaveAltTextAsync(imagePath, result.AltText);
            }
            catch (Exception ex)
            {
                _logger.LogWarning(ex, "Failed to process {File}", imagePath);
            }
        }
    }

    private static bool IsImageFile(string path)
    {
        var ext = Path.GetExtension(path).ToLowerInvariant();
        return ext is ".jpg" or ".jpeg" or ".png" or ".gif" or ".webp" or ".bmp";
    }
}

Performance Considerations

What to Expect

| Metric | Typical Value |
| --- | --- |
| First run | Slower (~800MB model download) |
| Model load | 1-3 seconds |
| Per-image processing | 500-2000ms |
| Memory usage | 2GB+ recommended |
| Disk space | ~800MB for models |

Tips for Production

// 1. Register as Singleton (model load is expensive)
builder.Services.AddAltTextGeneration(); // Already singleton internally

// 2. Check readiness before processing
if (!_imageAnalysis.IsReady)
{
    return StatusCode(503, "AI model still initializing");
}

// 3. Use cancellation tokens for timeouts
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
var altText = await _imageAnalysis.GenerateAltTextFromUrlAsync(url, cts.Token);

// 4. Process images one at a time, not in parallel (memory constraints)
foreach (var image in images)
{
    await ProcessImageAsync(image); // Sequential is safer
}
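
In a web app, concurrent requests can still trigger several inferences at once even if each request processes its own images sequentially. One way to combine tips 3 and 4 is a small gate that lets one operation run at a time under a timeout. This is a sketch only: `InferenceGate` and `RunAsync` are hypothetical names, and the package may already serialize work internally.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs one piece of work at a time, each with its own timeout.
public sealed class InferenceGate
{
    private readonly SemaphoreSlim _gate = new(1, 1);

    // The timeout covers both waiting for the gate and the work itself.
    public async Task<T> RunAsync<T>(Func<CancellationToken, Task<T>> work, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);

        // Wait outside the try block so the gate is only released if it was acquired.
        await _gate.WaitAsync(cts.Token);
        try
        {
            return await work(cts.Token);
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Registered as a singleton, a gate like this would wrap each call to the analysis service, so memory use stays close to that of a single inference.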

OpenTelemetry Integration

The package includes built-in tracing:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing =>
    {
        tracing.AddSource("Mostlylucid.LlmAltText");
    });

Traced activities:

  • llmalttext.generate_alt_text
  • llmalttext.extract_text
  • llmalttext.analyze_image
  • llmalttext.classify_content_type

Health Checks

Add a health check to monitor model status:

public class AltTextHealthCheck : IHealthCheck
{
    private readonly IImageAnalysisService _service;

    public AltTextHealthCheck(IImageAnalysisService service)
        => _service = service;

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        return Task.FromResult(_service.IsReady
            ? HealthCheckResult.Healthy("Florence-2 model ready")
            : HealthCheckResult.Unhealthy("Model not initialized"));
    }
}

// Registration
builder.Services.AddHealthChecks()
    .AddCheck<AltTextHealthCheck>("alttext");

Troubleshooting

Model Download Fails

Error: Failed to download model files

Solutions:

  • Check internet connectivity
  • Verify firewall allows Hugging Face downloads
  • Ensure ~800MB disk space available
  • Check write permissions on ModelPath

Service Not Ready

_imageAnalysis.IsReady // Returns false

Solutions:

  • Wait for model initialization (1-3 seconds)
  • Check logs for initialization errors
  • Verify sufficient memory (2GB+)

Poor Quality Alt Text

Solutions:

  • Use MORE_DETAILED_CAPTION (default)
  • Ensure input images are clear
  • Check image isn't too small or blurry

TagHelper Not Working

Solutions:

  • Verify EnableTagHelper = true
  • Check @addTagHelper in _ViewImports.cshtml
  • Use absolute URLs (relative paths are skipped)
  • Check AllowedImageDomains configuration

Accessibility Best Practices

Generated alt text is a starting point. For best results:

  1. Review the output - AI isn't perfect, verify accuracy
  2. Keep it concise - 90-100 words maximum
  3. Be descriptive - include subjects, actions, context
  4. Avoid redundancy - don't start with "Image of..."
  5. Consider purpose - alt text should serve the image's role on the page
  6. Use empty alt for decorative - set alt="" for purely decorative images
  7. Include visible text - if image contains text, include it

Conclusion

Mostlylucid.LlmAltText brings AI-powered accessibility to your .NET applications without the cost or privacy concerns of external APIs. The TagHelper makes it particularly easy - just enable it and your <img> tags gain automatic alt text.

The package is released under the Unlicense (public domain), so do whatever you want with it.

© 2025 Scott Galloway — Unlicense — All content and source code on this site is free to use, copy, modify, and sell.