Tags: ACP, AI, Architecture, LLM, MCP, Patterns

MCP Is a Transport, Not an Architecture

Tuesday, 20 January 2026

"How do I build an MCP server?" is the wrong question.

SDKs exist. An LLM can scaffold one in seconds. The real question is:

What role should an MCP server play inside a system that must not lie?

The scarce skill now is designing the system around MCP: authority, authorisation, auditability, and deterministic control. MCP handles transport; those concerns live elsewhere.

This article explains how I use MCP without agents, autonomy, or vibes, and why most MCP examples recreate the same problems we've seen with unconstrained tool calling.

This is Part 2 of "LLMs as Components". Part 1: Why LLMs Fail as Sensors covered the category error of using LLMs for perception. This article covers the category error of using MCP as architecture.


What MCP Actually Is (And Isn't)

MCP (Model Context Protocol) is a wire protocol. It moves:

  • Inputs and outputs
  • Schemas
  • Tool metadata
  • Capability discovery

It does not:

  • Decide when tools should run
  • Validate truth
  • Enforce trust
  • Manage side effects
flowchart LR
    subgraph MCP["MCP: What It Does"]
        Schema[Schema Discovery] --> Transport[Transport Layer]
        Transport --> Invoke[Tool Invocation]
        Invoke --> Response[Structured Response]
    end

    subgraph NotMCP["Not MCP's Job"]
        When[When to run?]
        Trust[Is this true?]
        Should[Should we act?]
        Safe[Is this safe?]
    end

    style MCP fill:none,stroke:#16a34a,stroke-width:2px
    style NotMCP fill:none,stroke:#dc2626,stroke-width:2px
    style Schema fill:none,stroke:#059669,stroke-width:2px
    style Transport fill:none,stroke:#059669,stroke-width:2px
    style Invoke fill:none,stroke:#059669,stroke-width:2px
    style Response fill:none,stroke:#059669,stroke-width:2px
    style When fill:none,stroke:#dc2626,stroke-width:2px
    style Trust fill:none,stroke:#dc2626,stroke-width:2px
    style Should fill:none,stroke:#dc2626,stroke-width:2px
    style Safe fill:none,stroke:#dc2626,stroke-width:2px

MCP is closer to OpenAPI than it is to an agent framework.
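
For concreteness, here is roughly what MCP carries over the wire, sketched as Python dicts. The field names (`tools`, `inputSchema`, `tools/call`) follow the protocol's tool listing and invocation messages as I understand them; the endpoint name and values are invented for illustration.

```python
# Illustrative only: simplified shapes of the two MCP messages that matter here.
# Field names follow the spec's tools/list and tools/call; everything else is made up.

tool_listing = {
    "tools": [
        {
            "name": "extract_text",                # hypothetical signal endpoint
            "description": "OCR a single frame",   # kept deliberately minimal
            "inputSchema": {                        # plain JSON Schema
                "type": "object",
                "properties": {"frame_id": {"type": "string"}},
                "required": ["frame_id"],
            },
        }
    ]
}

tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "extract_text", "arguments": {"frame_id": "f-0042"}},
}

# Note what is absent: nothing above says *when* to call the tool, whether the
# result is true, or what is allowed to happen next. That is the layer above MCP.
```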

If you're using MCP as your reasoning layer, you've already made the same category error described in Part 1.

If MCP is transport, then the architecture lives above it: orchestration, policy, validation, persistence. The rest of this article is that layer.


The Core Principle: Propose vs Decide

This is the foundational rule from Reduced RAG and Constrained Fuzziness:

  • Probabilistic components propose
  • Deterministic systems decide

In my systems:

  • LLMs are never authorities
  • Tools are never autonomous
  • Every MCP response is a proposal, not a fact

"Deterministic" here doesn't mean simplistic. It means rule-governed, reproducible, and auditable.

flowchart TD
    subgraph Proposers["Proposers (Probabilistic)"]
        LLM[LLM Synthesis]
        MCP1[MCP Tool Response]
        MCP2[MCP Tool Response]
    end

    subgraph Constrainer["Constrainer (Deterministic)"]
        Validate[Validate Proposals]
        Compare[Compare Confidence]
        Policy[Apply Policy Rules]
        Decide[Accept / Reject / Escalate]
    end

    subgraph Persistence["Persistence (Facts)"]
        Facts[(Verified Facts<br/>With Provenance)]
    end

    LLM --> Validate
    MCP1 --> Validate
    MCP2 --> Validate
    Validate --> Compare
    Compare --> Policy
    Policy --> Decide
    Decide --> Facts

    style Proposers fill:none,stroke:#d97706,stroke-width:2px
    style Constrainer fill:none,stroke:#2563eb,stroke-width:2px
    style Persistence fill:none,stroke:#16a34a,stroke-width:2px
    style LLM fill:none,stroke:#d97706,stroke-width:2px
    style MCP1 fill:none,stroke:#d97706,stroke-width:2px
    style MCP2 fill:none,stroke:#d97706,stroke-width:2px
    style Validate fill:none,stroke:#2563eb,stroke-width:2px
    style Compare fill:none,stroke:#2563eb,stroke-width:2px
    style Policy fill:none,stroke:#2563eb,stroke-width:2px
    style Decide fill:none,stroke:#2563eb,stroke-width:2px
    style Facts fill:none,stroke:#16a34a,stroke-width:3px

The MCP server doesn't decide. The constrainer decides.
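
A minimal sketch of what that means in practice: every MCP result is wrapped as a proposal before it goes anywhere near persistence. The wrapper and the `call_tool` callable are illustrative, not any particular SDK.

```python
# Sketch: wrapping an MCP tool result so it enters the system as a proposal,
# never as a fact (call_tool is a stand-in for whatever MCP client you use).

def propose_from_tool(call_tool, name: str, arguments: dict) -> dict:
    result = call_tool(name, arguments)   # probabilistic component: it proposes
    return {
        "kind": "proposal",               # explicitly not a fact yet
        "claim": result,
        "source": f"mcp:{name}",
    }

# Only the deterministic constrainer (below) may turn a proposal into a persisted
# fact; nothing writes to the facts store directly from a tool result.
```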


Why "Tools" Are the Wrong Mental Model

The default MCP framing is:

  • "Expose methods as tools"
  • "Let the model choose which tool to call"
  • "Ask permission before execution"

This fails for predictable reasons:

  1. Tool descriptions become prompts: The LLM infers intent from wording, not from schema semantics
  2. Tool selection drifts: The model optimises for plausibility, not correctness, and picks tools based on description similarity
  3. Permission prompts are UX, not safety: A modal dialog doesn't prevent the wrong tool from being selected
  4. No replayability: You can't reproduce a session because tool selection was probabilistic

Toolformer (2023) established the theoretical basis for LLM tool use: models can learn when to call tools by measuring actual outcomes. This was foundational work. But Toolformer operated at training time with curated tool sets - not runtime agent systems with arbitrary MCP endpoints. The lesson from Toolformer isn't "let models choose freely" - it's measure outcomes objectively. See my comparison of Voyager, Toolformer, and structured approaches for why structure beats brilliance.

Reframe: MCP endpoints publish signals, not actions.

  • Signals are typed, bounded, and attributable
  • Every signal has confidence, provenance, and evidence pointers
  • Natural language is a presentation layer, not the substrate
flowchart LR
    subgraph Wrong["❌ Tools Mental Model"]
        Desc[Tool Description<br/>'Gets weather for city'] --> LLM1[LLM Chooses]
        LLM1 --> Execute[Execute Action]
        Execute --> Trust1[Trust Result?]
    end

    subgraph Right["✓ Signals Mental Model"]
        Schema2[Typed Schema<br/>city: string, units: enum] --> Invoke2[Deterministic Invoke]
        Invoke2 --> Signal[Signal Response<br/>+ confidence + provenance]
        Signal --> Validate2[Constrainer Validates]
    end

    style Wrong fill:none,stroke:#dc2626,stroke-width:2px
    style Right fill:none,stroke:#16a34a,stroke-width:2px
    style Desc fill:none,stroke:#dc2626,stroke-width:2px
    style LLM1 fill:none,stroke:#dc2626,stroke-width:2px
    style Execute fill:none,stroke:#dc2626,stroke-width:2px
    style Trust1 fill:none,stroke:#dc2626,stroke-width:2px
    style Schema2 fill:none,stroke:#16a34a,stroke-width:2px
    style Invoke2 fill:none,stroke:#16a34a,stroke-width:2px
    style Signal fill:none,stroke:#16a34a,stroke-width:2px
    style Validate2 fill:none,stroke:#16a34a,stroke-width:2px

Signal Contracts Over Chat

Every MCP endpoint in my systems has:

  • A schema: Typed inputs and outputs, not free-form text
  • Confidence: How certain is this result? (0.0–1.0)
  • Provenance: Where did this come from? (frame ID, bounding box, waveform region)
  • Evidence pointers: What can be verified independently?

No free-text authority. No "best effort" answers.

Examples from production systems:

| Signal | Confidence Source | Evidence Pointer |
|---|---|---|
| OCR text extraction | Florence-2 confidence score | Bounding box coordinates |
| Image classification | CLIP similarity score | Embedding + similarity score + reference image ID |
| Audio transcription | Whisper word-level confidence | Timestamp range |
| Speaker identification | Diarization cluster distance | Segment boundaries |

The rule: If a result can't be validated, it isn't accepted.

This is the same principle as Design Rule #5: facts need provenance.
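
As a sketch, assuming my own field names rather than any published schema, the contract looks like this:

```python
# A sketch of the signal contract. Field names are my choice; the requirement --
# typed payload, confidence, provenance, evidence -- is the point.
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    source: str                                       # e.g. "florence2-ocr"
    frame_id: str | None = None                       # for image/video signals
    bbox: tuple[int, int, int, int] | None = None     # bounding box, if applicable
    time_range: tuple[float, float] | None = None     # waveform region, if applicable

@dataclass(frozen=True)
class Signal:
    payload: dict          # typed result, validated against the endpoint's schema
    confidence: float      # 0.0-1.0, supplied by the producing model or heuristic
    provenance: Provenance
    evidence_uri: str      # pointer to something that can be checked independently

def is_acceptable(signal: Signal) -> bool:
    """The rule above: if a result can't be validated, it isn't accepted."""
    return 0.0 <= signal.confidence <= 1.0 and bool(signal.evidence_uri)
```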


The Constrainer: The Part Everyone Skips

Most MCP tutorials show:

  1. Define tools
  2. Connect to LLM
  3. Let it run

The missing component is the constrainer - the deterministic logic that:

  • Evaluates competing proposals
  • Enforces policy
  • Decides what persists and what gets rejected
  • Routes to escalation when confidence is low
flowchart TD
    subgraph Sources["Signal Sources"]
        Heuristics[Heuristics<br/>Text-likeliness: 0.3]
        LocalModel[Local Model<br/>Florence-2 OCR: 0.85]
        LLMCall[LLM Escalation<br/>GPT-4V: 0.92]
    end

    subgraph Constrainer["Constrainer Logic"]
        Receive[Receive All Signals]
        Check{Confidence<br/>≥ 0.7?}
        Cross[Cross-Validate<br/>Signals Agree?]
        Accept[Accept as Fact]
        Reject[Reject / Log]
        Escalate[Escalate to<br/>Higher Tier]
    end

    Heuristics --> Receive
    LocalModel --> Receive
    LLMCall --> Receive

    Receive --> Check
    Check -->|Yes| Cross
    Check -->|No| Escalate
    Cross -->|Yes| Accept
    Cross -->|No| Reject

    style Sources fill:none,stroke:#d97706,stroke-width:2px
    style Constrainer fill:none,stroke:#2563eb,stroke-width:2px
    style Heuristics fill:none,stroke:#d97706,stroke-width:2px
    style LocalModel fill:none,stroke:#d97706,stroke-width:2px
    style LLMCall fill:none,stroke:#d97706,stroke-width:2px
    style Receive fill:none,stroke:#2563eb,stroke-width:2px
    style Check fill:none,stroke:#2563eb,stroke-width:2px
    style Cross fill:none,stroke:#2563eb,stroke-width:2px
    style Accept fill:none,stroke:#16a34a,stroke-width:2px
    style Reject fill:none,stroke:#dc2626,stroke-width:2px
    style Escalate fill:none,stroke:#7c3aed,stroke-width:2px

Real constrainer decisions:

  • Reject LLM caption when heuristics disagree on text-likeliness
  • Prefer lower-confidence but evidenced OCR over high-confidence hallucinated text
  • Escalate to vision LLM only when local model confidence < 0.7

MCP connects components. The constrainer governs them.
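
A compressed sketch of that decision step, reusing the `Signal` shape from the previous section. The 0.7 threshold and the agreement check are assumptions taken from the flow above, not a library API.

```python
# Sketch of the constrainer's decision step: threshold, cross-validation, escalation.
import json
from enum import Enum

class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    ESCALATE = "escalate"

def constrain(signals: list["Signal"], threshold: float = 0.7) -> Decision:
    # No signal clears the bar -> ask a higher tier (e.g. a vision LLM) rather than guess.
    confident = [s for s in signals if s.confidence >= threshold]
    if not confident:
        return Decision.ESCALATE

    # Cross-validate: independent sources must agree before anything persists.
    claims = {json.dumps(s.payload, sort_keys=True) for s in confident}
    if len(claims) > 1:
        return Decision.REJECT     # disagreement is logged, not papered over

    return Decision.ACCEPT         # accepted facts are written with full provenance
```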


Turning the LLM Off (And Why I Do)

My MCP servers still function with the LLM disabled.

This isn't a degraded mode. It's the primary mode.

If your system stops working when the LLM is unavailable, the LLM was doing a job it shouldn't have had.

This is the same principle behind why structure beats brilliance: don't expect one model to do everything. Distribute cognition across a system where deterministic components handle what they're good at.

The core functions:

  • Deterministic summarisation from extracted facts
  • Evidence-first retrieval via embeddings and filters
  • Structured queries against the fact database

The LLM becomes:

  • A synthesis layer (optional)
  • An explanation engine (when requested)
  • An enricher (for natural language output)

The LLM is not:

  • A decision maker
  • Memory
  • A truth engine
flowchart TD
    subgraph AlwaysOn["Always On (Deterministic)"]
        Sensors[Sensors + Heuristics]
        Local[Local Models<br/>Florence-2, Whisper, CLIP]
        Facts[(Facts Database)]
        Query[Query Engine]
    end

    subgraph Optional["Optional (LLM)"]
        Synthesis[Natural Language Synthesis]
        Explain[Explanation Generation]
    end

    Sensors --> Local
    Local --> Facts
    Facts --> Query
    Query --> Synthesis
    Query --> Explain

    style AlwaysOn fill:none,stroke:#16a34a,stroke-width:2px
    style Optional fill:none,stroke:#6b7280,stroke-width:2px,stroke-dasharray: 5 5
    style Sensors fill:none,stroke:#16a34a,stroke-width:2px
    style Local fill:none,stroke:#16a34a,stroke-width:2px
    style Facts fill:none,stroke:#16a34a,stroke-width:3px
    style Query fill:none,stroke:#16a34a,stroke-width:2px
    style Synthesis fill:none,stroke:#6b7280,stroke-width:2px
    style Explain fill:none,stroke:#6b7280,stroke-width:2px

Why this matters:

| Benefit | LLM-Required Systems | LLM-Optional Systems |
|---|---|---|
| Cost | Per-query API costs | Fixed infrastructure |
| Reliability | API outage = system down | Core functions continue |
| Testability | Mock LLM responses | Deterministic assertions |
| Trust | "The model said so" | Evidence chain |

MCP as a Boundary, Not an Integration

I treat MCP as a hard boundary:

  • Process isolation: MCP servers run in separate processes
  • Explicit inputs/outputs: No shared mutable state
  • No ambient context: Each call is evaluated on explicit inputs, not conversation residue
  • Typed contracts: Schema violations are errors, not warnings
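
A sketch of that last point, using the `jsonschema` package; the tool schema and handler are illustrative:

```python
# Sketch of "schema violations are errors, not warnings" at the MCP boundary.
from jsonschema import ValidationError, validate

FRAME_OCR_INPUT_SCHEMA = {
    "type": "object",
    "properties": {"frame_id": {"type": "string"}},
    "required": ["frame_id"],
    "additionalProperties": False,   # no ambient or extra context sneaks in
}

def handle_call(arguments: dict) -> dict:
    try:
        validate(instance=arguments, schema=FRAME_OCR_INPUT_SCHEMA)
    except ValidationError as err:
        # Hard failure: the caller gets an error; the call is not "best effort" repaired.
        raise ValueError(f"schema violation: {err.message}") from err
    # ... invoke the local model and return a typed signal ...
    return {"text": "", "confidence": 0.0}
```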

This contrasts with:

  • In-process agents with shared memory
  • Prompt-chaining with hidden context accumulation
  • "Conversational" tool use where prior turns affect behaviour

Why boundaries matter:

  1. Replay: Reproduce any session by replaying inputs
  2. Auditing: Every signal has a traceable origin
  3. Deterministic testing: Same inputs → same outputs
  4. Compliance: Explainable decision chains
flowchart LR
    subgraph Process1["Process: Orchestrator"]
        Orch[Orchestrator<br/>Constrainer Logic]
    end

    subgraph Process2["Process: MCP Server 1"]
        MCP1[Image Analysis<br/>Signals]
    end

    subgraph Process3["Process: MCP Server 2"]
        MCP2[Audio Analysis<br/>Signals]
    end

    subgraph Process4["Process: MCP Server 3"]
        MCP3[Video Analysis<br/>Signals]
    end

    Orch <-->|MCP Protocol| MCP1
    Orch <-->|MCP Protocol| MCP2
    Orch <-->|MCP Protocol| MCP3

    style Process1 fill:none,stroke:#2563eb,stroke-width:2px
    style Process2 fill:none,stroke:#16a34a,stroke-width:2px
    style Process3 fill:none,stroke:#16a34a,stroke-width:2px
    style Process4 fill:none,stroke:#16a34a,stroke-width:2px
    style Orch fill:none,stroke:#2563eb,stroke-width:2px
    style MCP1 fill:none,stroke:#16a34a,stroke-width:2px
    style MCP2 fill:none,stroke:#16a34a,stroke-width:2px
    style MCP3 fill:none,stroke:#16a34a,stroke-width:2px

Failure Modes MCP Examples Don't Talk About

Real failure cases from unconstrained MCP usage:

| Failure Mode | Cause | Mitigation |
|---|---|---|
| Tool hallucination | Description leakage into LLM reasoning | Minimal descriptions, schema-first design |
| Over-eager execution | Model calls tools "just to check" | Constrainer gates all invocations |
| Schema drift | Tool behaviour changes, schema doesn't | Versioned schemas, contract tests |
| Silent partial failures | Tool returns partial data, model proceeds | Confidence thresholds, completeness checks |
| LLM overconfidence | Model treats tool output as ground truth | Cross-validation, evidence requirements |
| Capability escalation | Model "tries" progressively more powerful tools | Capability tiers + policy gating + budgets |

The common thread: these failures happen because the LLM is trusted as an authority.

The fix: LLMs propose, constrainers decide, facts persist.
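
One mitigation from the table, sketched as a contract test: pin the published schema and fail loudly when it drifts. The fetch callable and snapshot path are hypothetical.

```python
# Sketch of a contract test for the "schema drift" row above: the schema a server
# publishes today must match a pinned snapshot, or the test fails.
import json
import pathlib

def assert_schema_unchanged(fetch_published_schema, tool_name: str,
                            snapshot_dir: str = "contracts") -> None:
    published = fetch_published_schema(tool_name)   # whatever the MCP server advertises now
    pinned = json.loads(pathlib.Path(snapshot_dir, f"{tool_name}.v1.json").read_text())
    # Any behavioural change must arrive as a deliberate new schema version,
    # not as silent drift behind an unchanged contract.
    assert published == pinned, f"{tool_name}: published schema no longer matches pinned v1 contract"
```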


What MCP Is Good At (When Used Properly)

MCP is excellent for:

  • Capability discovery: Runtime enumeration of available signals
  • Inter-process composition: Clean boundaries between components
  • Tooling interoperability: Works with any MCP-compatible client
  • Model-agnostic integration: Swap LLMs without changing signal contracts

MCP is not:

  • An agent framework
  • A reasoning system
  • A safety layer
  • A trust boundary

Use MCP for what it is: a wire protocol for structured communication between components.


Where MCP Stops

MCP standardises how models and tools exchange context. It does not define who is allowed to act, under what conditions, or how decisions are audited.

Other approaches exist to formalise those constraints (sometimes described as Action Capability Protocols, ACP): authorisation, attestation, audit trails.

This article isn't about ACP. It's about the design principle: transport and governance are separate concerns.

MCP tells you how to call a tool. Something else must decide whether you should. In my systems, that's the constrainer. In enterprise systems, it might be a policy engine.


Demo MCP vs Production MCP

| Aspect | Demo MCP | Production MCP |
|---|---|---|
| Tool descriptions | Natural language, detailed | Minimal, schema-first |
| Who decides to call | LLM | Orchestrator/Constrainer |
| Response format | Free-form text | Typed signals with confidence |
| Validation | None | Cross-validation, thresholds |
| LLM dependency | Required | Optional enrichment |
| Replay | Non-deterministic | Fully reproducible |
| Audit trail | Conversation logs | Signal provenance chain |

Closing

MCP enables composition. Determinism enables trust. LLMs enable fluency.

But only if:

Probability proposes; determinism persists.

MCP without a constrainer is just prompt injection with extra steps.

Make synthesis the last step. Make the LLM optional. Make every fact traceable.


Key Terms

  • MCP (Model Context Protocol): Wire protocol for tool discovery and invocation between processes
  • Constrainer: Deterministic logic that evaluates proposals and decides what persists
  • Signal: Typed response with confidence, provenance, and evidence pointers
  • Proposer: Any component (LLM, local model, heuristic) that suggests facts without authority
  • Evidence pointer: Reference to verifiable source (bounding box, timestamp, embedding)

Previous in series: Part 1: Why LLMs Fail as Sensors


