Sunday, 02 November 2025
NOTE: These are the live release notes for this NuGet package, pulled straight from GitHub, so they will update frequently.
Focus: Enhanced reliability, comprehensive testing, and improved developer experience
This release focuses on improving chunking reliability, providing comprehensive validation tooling, and streamlining configuration management. All features from v2.0 remain fully compatible.
Highlights:
PromptBuilder.cs: Array responses now reliably start with an array - the prompt explicitly instructs the model "Your FIRST character MUST be: ["
ChunkingCoordinator.cs: Array formatting added to the chunk context, so chunked arrays come back as [{...},{...}] instead of {...},{...}
appsettings.json: Removed verbose model comments, cleaner structure (model guidance moved to docs/OLLAMA_MODELS.md)
docs/OLLAMA_MODELS.md: New comprehensive reference guide (285 lines)
docs/BACKEND_API_REFERENCE.md: New management API reference (600+ lines)
LLMApi.http: Expanded validation suite, including edge cases with special characters (:, /, {, })
Code Changes:
mostlylucid.mockllmapi/Services/PromptBuilder.cs: Enhanced array formatting instructions (lines 82-94)
mostlylucid.mockllmapi/Services/ChunkingCoordinator.cs: Added array formatting to chunk context (line 447)
LLMApi/appsettings.json: Streamlined configuration
New Documentation:
docs/OLLAMA_MODELS.md: Comprehensive model configuration guide
docs/BACKEND_API_REFERENCE.md: Complete management API reference
Updated Files:
LLMApi/LLMApi.http: Expanded from 448 to 847 lines (70+ new validation tests)
Chunking at High Temperature:
If chunked array output becomes unreliable at high temperatures, disable auto-chunking per request with ?autoChunk=false. See docs/OLLAMA_MODELS.md for detailed troubleshooting.
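For example, a request that would normally be split into chunks can be forced through in a single pass (the endpoint and shape here are illustrative):
GET /api/mock/users?autoChunk=false&shape={"users":[{"id":"string","name":"string"}]}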
NO BREAKING CHANGES - Despite the major version bump, all existing code continues to work!
This is a major milestone release that transforms LLMock API into a comprehensive, production-ready mocking platform. Version 2.0 adds realistic SSE streaming modes, multi-backend load balancing, comprehensive backend selection, and extensive documentation.
Three distinct SSE streaming modes for testing different real-world API patterns:
LlmTokens Mode (Default - Backward Compatible)
{"chunk":"text","accumulated":"fulltext","done":false}CompleteObjects Mode (NEW)
{"data":{object},"index":0,"total":10,"done":false}ArrayItems Mode (NEW)
{"item":{object},"index":0,"total":100,"arrayName":"users","hasMore":true,"done":false}Configuration:
{
"MockLlmApi": {
"SseMode": "CompleteObjects" // LlmTokens | CompleteObjects | ArrayItems
}
}
Per-Request Override:
GET /api/mock/stream/users?sseMode=CompleteObjects
GET /api/mock/stream/data?sseMode=ArrayItems
GET /api/mock/stream/chat?sseMode=LlmTokens
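For a quick check from the command line, curl with buffering disabled shows the raw SSE frames (host and port here are just assumed local demo defaults):
curl -N "http://localhost:5000/api/mock/stream/users?sseMode=CompleteObjects"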
Client Example (CompleteObjects):
const eventSource = new EventSource('/api/mock/stream/users?sseMode=CompleteObjects');
eventSource.onmessage = (event) => {
const response = JSON.parse(event.data);
if (response.done) {
console.log('Complete!');
eventSource.close();
} else {
console.log(`User ${response.index + 1}/${response.total}:`, response.data);
// response.data contains the complete user object
}
};
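For comparison, a minimal client sketch for ArrayItems mode, assuming the per-item payload shown above (item, index, total, arrayName, hasMore, done) and the export-customers endpoint used later in the examples:
const arraySource = new EventSource('/api/mock/stream/export-customers?sseMode=ArrayItems');
arraySource.onmessage = (event) => {
  const response = JSON.parse(event.data);
  if (response.done) {
    console.log('Export complete');
    arraySource.close();
  } else {
    // Each message carries one array element plus its position and the array name
    console.log(`${response.arrayName} ${response.index + 1}/${response.total}:`, response.item);
  }
};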
Distribute requests across multiple LLM backends for high throughput:
Configuration:
{
"MockLlmApi": {
"Backends": [
{
"Name": "ollama-llama3",
"Provider": "ollama",
"Weight": 3,
"Enabled": true
},
{
"Name": "ollama-mistral",
"Provider": "ollama",
"Weight": 2,
"Enabled": true
},
{
"Name": "lmstudio-default",
"Provider": "lmstudio",
"Weight": 1,
"Enabled": true
}
]
}
}
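With the weights above, traffic is split roughly 3:2:1 - about half of requests go to ollama-llama3, a third to ollama-mistral and the rest to lmstudio-default (assuming selection is proportional to the configured Weight values).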
SignalR Hub with Load Balancing:
{
"HubContexts": [
{
"Name": "high-throughput-data",
"BackendNames": ["ollama-llama3", "ollama-mistral", "lmstudio-default"]
}
]
}
Features:
Per-Request Selection (Multiple Methods):
# Via query parameter
GET /api/mock/users?backend=openai-gpt4
# Via header
GET /api/mock/users
X-LLM-Backend: openai-gpt4
# SignalR hub context
{
"HubContexts": [
{
"Name": "analytics",
"BackendName": "openai-gpt4-turbo"
}
]
}
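As a client-side sketch of the header method (same endpoint and backend name as above; the fetch usage is illustrative):
// Pin one request to a specific backend via the X-LLM-Backend header
fetch('/api/mock/users', {
  headers: { 'X-LLM-Backend': 'openai-gpt4' }
})
  .then(res => res.json())
  .then(users => console.log(users));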
Multiple Providers Simultaneously:
Configuration for Mistral-Nemo with massive 128k context window:
{
"Backends": [
{
"Name": "ollama-mistral-nemo",
"Provider": "ollama",
"ModelName": "mistral-nemo",
"MaxTokens": 128000,
"Enabled": true
}
]
}
Use Cases:
Massive dataset generation (MaxItems=10000+)
SignalR Example:
{
"HubContexts": [
{
"Name": "massive-dataset-128k",
"Description": "Massive dataset generation with 128k context",
"BackendName": "ollama-mistral-nemo"
}
]
}
Interactive API documentation with Swagger UI:
/swagger
Enable in Program.cs:
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();
app.UseSwagger();   // serve the OpenAPI document
app.UseSwaggerUI(); // serve the interactive UI at /swagger
docs/SSE_STREAMING_MODES.md (2,500+ lines) - Complete SSE guide
LLMApi/SSE_Streaming.http - 30+ HTTP examples for SSE modes
docs/CONFIGURATION_REFERENCE.md - Added SSE modes section
docs/MULTIPLE_LLM_BACKENDS.md - Enhanced with load balancing
appsettings.Full.json - Added SSE mode examples and Mistral-Nemo
New Test Coverage:
Total Test Suite:
New Configuration Options:
{
"MockLlmApi": {
// SSE Streaming Modes (NEW)
"SseMode": "LlmTokens", // LlmTokens | CompleteObjects | ArrayItems
// Multiple LLM Backends with Load Balancing (v1.8.0)
"Backends": [
{
"Name": "backend-name",
"Provider": "ollama", // ollama | openai | lmstudio
"BaseUrl": "http://localhost:11434/v1/",
"ModelName": "llama3",
"ApiKey": null,
"MaxTokens": 8192,
"Enabled": true,
"Weight": 3, // For load balancing
"Priority": 10
}
],
// Legacy Single Backend (Still Supported)
"BaseUrl": "http://localhost:11434/v1/",
"ModelName": "llama3",
// Auto-Chunking (v1.8.0)
"EnableAutoChunking": true,
"MaxInputTokens": 4096,
"MaxOutputTokens": 2048,
"MaxItems": 1000,
// Streaming Configuration
"StreamingChunkDelayMinMs": 0,
"StreamingChunkDelayMaxMs": 0,
// Cache Configuration
"CacheSlidingExpirationMinutes": 15,
"CacheAbsoluteExpirationMinutes": 60,
"MaxCachePerKey": 5
}
}
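Note on the cache settings: the names suggest standard sliding/absolute expiration semantics, i.e. a cached response stays warm while it keeps being requested (15 minutes sliding here) but is evicted after at most 60 minutes, with up to MaxCachePerKey cached variations per key (this reading is inferred from the option names).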
Comprehensive environment variable support with full documentation:
# SSE Mode
export MockLlmApi__SseMode="CompleteObjects"
# Backend Selection
export MockLlmApi__Backends__0__Name="ollama-llama3"
export MockLlmApi__Backends__0__Provider="ollama"
export MockLlmApi__Backends__0__BaseUrl="http://localhost:11434/v1/"
export MockLlmApi__Backends__0__ModelName="llama3"
export MockLlmApi__Backends__0__Enabled="true"
export MockLlmApi__Backends__0__Weight="3"
# SignalR Hub Contexts with Backend Selection
export MockLlmApi__HubContexts__0__Name="analytics"
export MockLlmApi__HubContexts__0__BackendName="openai-gpt4-turbo"
# Or load balancing
export MockLlmApi__HubContexts__1__BackendNames__0="ollama-llama3"
export MockLlmApi__HubContexts__1__BackendNames__1="ollama-mistral"
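The double underscore is the standard ASP.NET Core separator for nested configuration keys, so each export above maps directly onto the JSON structure shown earlier - for example, MockLlmApi__Backends__0__Weight sets Weight on the first entry of the Backends array.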
Core Implementation:
mostlylucid.mockllmapi/Models/SseMode.cs - SSE mode enum
mostlylucid.mockllmapi/Services/Providers/ILlmProvider.cs - Provider interface
mostlylucid.mockllmapi/Services/Providers/OllamaProvider.cs - Ollama provider
mostlylucid.mockllmapi/Services/Providers/OpenAIProvider.cs - OpenAI provider
mostlylucid.mockllmapi/Services/Providers/LMStudioProvider.cs - LM Studio provider
mostlylucid.mockllmapi/Services/Providers/LlmProviderFactory.cs - Provider factory
mostlylucid.mockllmapi/Services/LlmBackendSelector.cs - Backend selection logic
Testing:
LLMApi.Tests/SseModeTests.cs - 22 SSE mode tests
Documentation:
docs/SSE_STREAMING_MODES.md - Complete SSE guide (2,500+ lines)
LLMApi/SSE_Streaming.http - 30+ SSE examples
Core Services:
mostlylucid.mockllmapi/LLMockApiOptions.cs - Added SseMode property, LlmBackends array
mostlylucid.mockllmapi/RequestHandlers/StreamingRequestHandler.cs - Added SSE mode routing
mostlylucid.mockllmapi/Services/LlmClient.cs - Added backend selection overloads
mostlylucid.mockllmapi/Services/MockDataBackgroundService.cs - Added SignalR backend selection
mostlylucid.mockllmapi/Models/HubContextConfig.cs - Added BackendName and BackendNames
Documentation:
README.md - Updated to v2.0, comprehensive feature list
docs/CONFIGURATION_REFERENCE.md - Added SSE modes and backend selection
docs/MULTIPLE_LLM_BACKENDS.md - Enhanced with load balancing examples
appsettings.Full.json - Added comprehensive examples
Demo Application:
LLMApi/Program.cs - Added Swagger configuration
LLMApi/Pages/_Layout.cshtml - Added Swagger UI link
Stock Market Feed (CompleteObjects):
GET /api/mock/stream/stocks?sseMode=CompleteObjects&shape={"ticker":"AAPL","price":150.25,"change":2.5}
Bulk Customer Export (ArrayItems):
GET /api/mock/stream/export-customers?sseMode=ArrayItems&shape={"customers":[{"id":"string","name":"string"}]}
AI Chat Interface (LlmTokens):
GET /api/mock/stream/chat?sseMode=LlmTokens&shape={"message":"Hello!"}
High-Throughput IoT Sensors (Load Balanced):
{
"HubContexts": [
{
"Name": "iot-sensors",
"BackendNames": ["ollama-llama3", "ollama-mistral", "lmstudio-default"]
}
]
}
Massive Dataset with 128k Context:
GET /api/mock/stream/bulk-data?sseMode=ArrayItems&backend=ollama-mistral-nemo
No Code Changes Required!
Version 2.0 is 100% backward compatible with v1.x:
// v1.x code - still works exactly the same
builder.Services.AddLLMockApi(builder.Configuration);
app.MapLLMockApi("/api/mock", includeStreaming: true);
// SSE streaming defaults to LlmTokens mode (original behavior)
// Legacy single backend config (BaseUrl/ModelName) still works
Opt-In to New Features:
// Same setup, just update appsettings.json
{
"MockLlmApi": {
"SseMode": "CompleteObjects", // Switch to realistic streaming
"Backends": [...] // Add multiple backends
}
}
NONE!
Despite the major version bump to 2.0, there are zero breaking changes:
SSE streaming still defaults to LlmTokens (original behavior)
Legacy BaseUrl/ModelName config still supported
This release represents a fundamental transformation:
v1.x: Mock API with LLM-powered generation
v2.0: Production-Ready Mock Platform
Version 2.0 positions LLMock API as a comprehensive mocking platform capable of handling production-scale testing requirements across diverse use cases.
Thank you to all users and contributors who have helped shape LLMock API into a comprehensive mocking platform. Your feedback and use cases have driven these improvements!
See previous release notes for v1.8.0 features (Multiple LLM Backend Support, Automatic Request Chunking, Enhanced Cache Configuration).
See full release history below for v1.7.x, v1.6.x, v1.5.x, and earlier versions.