"How do I build an MCP server?" is the wrong question.
SDKs exist. An LLM can scaffold one in seconds. The real question is:
What role should an MCP server play inside a system that must not lie?
The scarce skill now is designing the system around MCP: authority, authorisation, auditability, and deterministic control. MCP handles transport; those concerns live elsewhere.
This article explains how I use MCP without agents, autonomy, or vibes, and why most MCP examples recreate the same problems we've seen with unconstrained tool calling.
This is Part 2 of "LLMs as Components". Part 1: Why LLMs Fail as Sensors covered the category error of using LLMs for perception. This article covers the category error of using MCP as architecture.
MCP (Model Context Protocol) is a wire protocol. It moves tool schemas, invocation requests, and structured responses between a client and a tool server.
It does not decide when to run a tool, whether a result is true, whether the system should act on it, or whether acting is safe.
```mermaid
flowchart LR
subgraph MCP["MCP: What It Does"]
Schema[Schema Discovery] --> Transport[Transport Layer]
Transport --> Invoke[Tool Invocation]
Invoke --> Response[Structured Response]
end
subgraph NotMCP["Not MCP's Job"]
When[When to run?]
Trust[Is this true?]
Should[Should we act?]
Safe[Is this safe?]
end
style MCP fill:none,stroke:#16a34a,stroke-width:2px
style NotMCP fill:none,stroke:#dc2626,stroke-width:2px
style Schema fill:none,stroke:#059669,stroke-width:2px
style Transport fill:none,stroke:#059669,stroke-width:2px
style Invoke fill:none,stroke:#059669,stroke-width:2px
style Response fill:none,stroke:#059669,stroke-width:2px
style When fill:none,stroke:#dc2626,stroke-width:2px
style Trust fill:none,stroke:#dc2626,stroke-width:2px
style Should fill:none,stroke:#dc2626,stroke-width:2px
style Safe fill:none,stroke:#dc2626,stroke-width:2px
```
MCP is closer to OpenAPI than it is to an agent framework.
If you're using MCP as your reasoning layer, you've already made the same category error described in Part 1.
If MCP is transport, then the architecture lives above it: orchestration, policy, validation, persistence. The rest of this article is that layer.
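For concreteness, here is a minimal sketch of the kind of record an MCP server publishes for one tool. The field names follow the protocol's tool-listing shape (name, description, input schema); the tool itself is invented for illustration.

```python
# Illustrative MCP tool entry: roughly what travels over the wire when a
# client lists tools. It is metadata and a contract, nothing more.
weather_tool = {
    "name": "get_weather",
    "description": "Gets weather for a city",  # transport metadata, not policy
    "inputSchema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string", "enum": ["metric", "imperial"]},
        },
        "required": ["city"],
    },
}
# Nothing here says when to call the tool, whether its answer is true,
# or whether acting on it is safe. Those decisions live above MCP.
```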
This is the foundational rule from Reduced RAG and Constrained Fuzziness: probabilistic components propose; deterministic components decide.
In my systems, LLM output and MCP tool responses are proposals. A deterministic constrainer validates them, applies policy, and decides which of them become persisted facts.
"Deterministic" here doesn't mean simplistic. It means rule-governed, reproducible, and auditable.
```mermaid
flowchart TD
subgraph Proposers["Proposers (Probabilistic)"]
LLM[LLM Synthesis]
MCP1[MCP Tool Response]
MCP2[MCP Tool Response]
end
subgraph Constrainer["Constrainer (Deterministic)"]
Validate[Validate Proposals]
Compare[Compare Confidence]
Policy[Apply Policy Rules]
Decide[Accept / Reject / Escalate]
end
subgraph Persistence["Persistence (Facts)"]
Facts[(Verified Facts<br/>With Provenance)]
end
LLM --> Validate
MCP1 --> Validate
MCP2 --> Validate
Validate --> Compare
Compare --> Policy
Policy --> Decide
Decide --> Facts
style Proposers fill:none,stroke:#d97706,stroke-width:2px
style Constrainer fill:none,stroke:#2563eb,stroke-width:2px
style Persistence fill:none,stroke:#16a34a,stroke-width:2px
style LLM fill:none,stroke:#d97706,stroke-width:2px
style MCP1 fill:none,stroke:#d97706,stroke-width:2px
style MCP2 fill:none,stroke:#d97706,stroke-width:2px
style Validate fill:none,stroke:#2563eb,stroke-width:2px
style Compare fill:none,stroke:#2563eb,stroke-width:2px
style Policy fill:none,stroke:#2563eb,stroke-width:2px
style Decide fill:none,stroke:#2563eb,stroke-width:2px
style Facts fill:none,stroke:#16a34a,stroke-width:3px
```
The MCP server doesn't decide. The constrainer decides.
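As a rough sketch of that split (all names here are illustrative, not a real API): proposers return proposals, the constrainer applies deterministic rules, and only accepted proposals are persisted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Proposal:
    """What a probabilistic component (LLM or MCP tool) is allowed to return."""
    claim: str
    confidence: float  # 0.0-1.0, reported by the component that produced it
    evidence: str      # pointer back to raw evidence (file, region, timestamp)

def constrain(p: Proposal, threshold: float = 0.7) -> bool:
    """Deterministic: rule-governed, reproducible, auditable. No model involved."""
    return p.confidence >= threshold and bool(p.evidence)

def persist(p: Proposal, facts: list) -> None:
    """Only proposals that pass the constrainer become facts, with provenance."""
    facts.append({"fact": p.claim, "confidence": p.confidence, "provenance": p.evidence})

facts: list = []
p = Proposal("Document contains an invoice", confidence=0.85, evidence="scan_0042.png")
if constrain(p):
    persist(p, facts)
```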
The default MCP framing is: describe your tools in natural language, hand the list to the model, and let it decide what to call and when.
This fails for predictable reasons: descriptions leak into the model's reasoning, tools get called "just to check", and results are trusted as ground truth without validation.
Toolformer (2023) established the theoretical basis for LLM tool use: models can learn when to call tools by measuring actual outcomes. This was foundational work. But Toolformer operated at training time with curated tool sets - not runtime agent systems with arbitrary MCP endpoints. The lesson from Toolformer isn't "let models choose freely" - it's measure outcomes objectively. See my comparison of Voyager, Toolformer, and structured approaches for why structure beats brilliance.
Reframe: MCP endpoints publish signals, not actions.
```mermaid
flowchart LR
subgraph Wrong["❌ Tools Mental Model"]
Desc[Tool Description<br/>'Gets weather for city'] --> LLM1[LLM Chooses]
LLM1 --> Execute[Execute Action]
Execute --> Trust1[Trust Result?]
end
subgraph Right["✓ Signals Mental Model"]
Schema2[Typed Schema<br/>city: string, units: enum] --> Invoke2[Deterministic Invoke]
Invoke2 --> Signal[Signal Response<br/>+ confidence + provenance]
Signal --> Validate2[Constrainer Validates]
end
style Wrong fill:none,stroke:#dc2626,stroke-width:2px
style Right fill:none,stroke:#16a34a,stroke-width:2px
style Desc fill:none,stroke:#dc2626,stroke-width:2px
style LLM1 fill:none,stroke:#dc2626,stroke-width:2px
style Execute fill:none,stroke:#dc2626,stroke-width:2px
style Trust1 fill:none,stroke:#dc2626,stroke-width:2px
style Schema2 fill:none,stroke:#16a34a,stroke-width:2px
style Invoke2 fill:none,stroke:#16a34a,stroke-width:2px
style Signal fill:none,stroke:#16a34a,stroke-width:2px
style Validate2 fill:none,stroke:#16a34a,stroke-width:2px
```
Every MCP endpoint in my systems has a typed schema, a confidence source, and an evidence pointer.
No free-text authority. No "best effort" answers.
Examples from production systems:
| Signal | Confidence Source | Evidence Pointer |
|---|---|---|
| OCR text extraction | Florence-2 confidence score | Bounding box coordinates |
| Image classification | CLIP similarity score | Embedding + similarity score + reference image ID |
| Audio transcription | Whisper word-level confidence | Timestamp range |
| Speaker identification | Diarization cluster distance | Segment boundaries |
The rule: If a result can't be validated, it isn't accepted.
This is the same principle as Design Rule #5: facts need provenance.
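As an illustrative instance of the first row of that table (the field names are mine, not part of MCP): an OCR signal carries its value, the producing model's confidence, and an evidence pointer a validator can follow back to the pixels.

```python
ocr_signal = {
    "signal": "ocr_text",
    "value": "INVOICE 2024-117",
    "confidence": 0.85,                       # Florence-2 confidence score
    "evidence": {
        "image_id": "scan_0042.png",
        "bounding_box": [120, 88, 512, 140],  # x1, y1, x2, y2
    },
}
# A response without confidence and evidence isn't a signal, it's an opinion,
# and the constrainer refuses it.
```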
Most MCP tutorials stop at the happy path: register a tool, let the model call it, and feed the raw response straight back into the conversation.
The missing component is the constrainer - the deterministic logic that receives every signal, checks it against confidence thresholds, cross-validates sources against each other, and decides whether to accept, reject, or escalate.
```mermaid
flowchart TD
subgraph Sources["Signal Sources"]
Heuristics[Heuristics<br/>Text-likeliness: 0.3]
LocalModel[Local Model<br/>Florence-2 OCR: 0.85]
LLMCall[LLM Escalation<br/>GPT-4V: 0.92]
end
subgraph Constrainer["Constrainer Logic"]
Receive[Receive All Signals]
Check{Confidence<br/>≥ 0.7?}
Cross[Cross-Validate<br/>Signals Agree?]
Accept[Accept as Fact]
Reject[Reject / Log]
Escalate[Escalate to<br/>Higher Tier]
end
Heuristics --> Receive
LocalModel --> Receive
LLMCall --> Receive
Receive --> Check
Check -->|Yes| Cross
Check -->|No| Escalate
Cross -->|Yes| Accept
Cross -->|No| Reject
style Sources fill:none,stroke:#d97706,stroke-width:2px
style Constrainer fill:none,stroke:#2563eb,stroke-width:2px
style Heuristics fill:none,stroke:#d97706,stroke-width:2px
style LocalModel fill:none,stroke:#d97706,stroke-width:2px
style LLMCall fill:none,stroke:#d97706,stroke-width:2px
style Receive fill:none,stroke:#2563eb,stroke-width:2px
style Check fill:none,stroke:#2563eb,stroke-width:2px
style Cross fill:none,stroke:#2563eb,stroke-width:2px
style Accept fill:none,stroke:#16a34a,stroke-width:2px
style Reject fill:none,stroke:#dc2626,stroke-width:2px
style Escalate fill:none,stroke:#7c3aed,stroke-width:2px
```
Real constrainer decisions look like the flow above: accept a Florence-2 OCR result at 0.85, reject a 0.3 text-likeliness heuristic, escalate to a more capable tier only when local signals fall below threshold or disagree.
MCP connects components. The constrainer governs them.
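A sketch of that governing logic, mirroring the flow above; the 0.7 threshold comes from the diagram, everything else is illustrative.

```python
def decide(signals: list[dict], threshold: float = 0.7) -> str:
    """Deterministic constrainer: accept, reject, or escalate. No LLM in the loop."""
    confident = [s for s in signals if s["confidence"] >= threshold]

    if not confident:
        # Nothing meets the bar: hand the item to the next, more expensive tier
        # (heuristic -> local model -> hosted model), never guess.
        return "escalate"

    # Cross-validate: confident signals must agree before anything is persisted.
    if len({s["value"] for s in confident}) == 1:
        return "accept"

    # Confident but contradictory signals are rejected and logged, not averaged.
    return "reject"

# e.g. decide([{"value": "INVOICE 2024-117", "confidence": 0.85},
#              {"value": "INVOICE 2024-117", "confidence": 0.92}]) -> "accept"
```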
My MCP servers still function with the LLM disabled.
This isn't a degraded mode. It's the primary mode.
If your system stops working when the LLM is unavailable, the LLM was doing a job it shouldn't have had.
This is the same principle behind why structure beats brilliance: don't expect one model to do everything. Distribute cognition across a system where deterministic components handle what they're good at.
The core functions run without it: sensors and heuristics, local models (Florence-2, Whisper, CLIP), the facts database, and the query engine.
The LLM becomes an optional enrichment layer: natural-language synthesis and explanation over facts that already exist.
The LLM is not a sensor, not a decision-maker, and not something the system needs in order to run.
```mermaid
flowchart TD
subgraph AlwaysOn["Always On (Deterministic)"]
Sensors[Sensors + Heuristics]
Local[Local Models<br/>Florence-2, Whisper, CLIP]
Facts[(Facts Database)]
Query[Query Engine]
end
subgraph Optional["Optional (LLM)"]
Synthesis[Natural Language Synthesis]
Explain[Explanation Generation]
end
Sensors --> Local
Local --> Facts
Facts --> Query
Query --> Synthesis
Query --> Explain
style AlwaysOn fill:none,stroke:#16a34a,stroke-width:2px
style Optional fill:none,stroke:#6b7280,stroke-width:2px,stroke-dasharray: 5 5
style Sensors fill:none,stroke:#16a34a,stroke-width:2px
style Local fill:none,stroke:#16a34a,stroke-width:2px
style Facts fill:none,stroke:#16a34a,stroke-width:3px
style Query fill:none,stroke:#16a34a,stroke-width:2px
style Synthesis fill:none,stroke:#6b7280,stroke-width:2px
style Explain fill:none,stroke:#6b7280,stroke-width:2px
```
Why this matters:
| Benefit | LLM-Required Systems | LLM-Optional Systems |
|---|---|---|
| Cost | Per-query API costs | Fixed infrastructure |
| Reliability | API outage = system down | Core functions continue |
| Testability | Mock LLM responses | Deterministic assertions |
| Trust | "The model said so" | Evidence chain |
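In code, "optional" looks roughly like this (the facts store and LLM interfaces are hypothetical): the deterministic path answers from persisted facts and always works; the LLM, when available, only rephrases an answer that already exists.

```python
def answer(question: str, facts_db, llm=None) -> str:
    # Deterministic core: query the verified facts store. This path always works.
    facts = facts_db.query(question)                 # hypothetical facts-store API
    plain = "; ".join(f["fact"] for f in facts)

    if llm is None:
        return plain                                 # the primary mode, not a fallback

    # Optional enrichment: the LLM rewrites facts it was handed. It adds fluency,
    # not information, and nothing it produces is written back as a fact.
    return llm.summarise(facts)                      # hypothetical LLM wrapper
```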
I treat MCP as a hard boundary: each server runs in its own process, owns its own models, and talks to the orchestrator only over the protocol.
This contrasts with running tools in the same process as the orchestrator or the model host, where tools and orchestration logic share one runtime and one failure domain.
Why boundaries matter: a misbehaving server can't corrupt the constrainer, resource-heavy models are isolated, and each server can be tested, versioned, and replayed on its own.
```mermaid
flowchart LR
subgraph Process1["Process: Orchestrator"]
Orch[Orchestrator<br/>Constrainer Logic]
end
subgraph Process2["Process: MCP Server 1"]
MCP1[Image Analysis<br/>Signals]
end
subgraph Process3["Process: MCP Server 2"]
MCP2[Audio Analysis<br/>Signals]
end
subgraph Process4["Process: MCP Server 3"]
MCP3[Video Analysis<br/>Signals]
end
Orch <-->|MCP Protocol| MCP1
Orch <-->|MCP Protocol| MCP2
Orch <-->|MCP Protocol| MCP3
style Process1 fill:none,stroke:#2563eb,stroke-width:2px
style Process2 fill:none,stroke:#16a34a,stroke-width:2px
style Process3 fill:none,stroke:#16a34a,stroke-width:2px
style Process4 fill:none,stroke:#16a34a,stroke-width:2px
style Orch fill:none,stroke:#2563eb,stroke-width:2px
style MCP1 fill:none,stroke:#16a34a,stroke-width:2px
style MCP2 fill:none,stroke:#16a34a,stroke-width:2px
style MCP3 fill:none,stroke:#16a34a,stroke-width:2px
```
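On the orchestrator side, a boundary like this looks roughly as follows, assuming the official MCP Python SDK's stdio client; the server script and tool names are invented.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Each MCP server is a separate process; the only coupling is the protocol.
IMAGE_SERVER = StdioServerParameters(command="python", args=["image_signals_server.py"])

async def collect_image_signals(image_id: str):
    async with stdio_client(IMAGE_SERVER) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("ocr_text", arguments={"image_id": image_id})
            return result.content  # signals go to the constrainer, not straight to an LLM

signals = asyncio.run(collect_image_signals("scan_0042.png"))
```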
Real failure cases from unconstrained MCP usage:
| Failure Mode | Cause | Mitigation |
|---|---|---|
| Tool hallucination | Description leakage into LLM reasoning | Minimal descriptions, schema-first design |
| Over-eager execution | Model calls tools "just to check" | Constrainer gates all invocations |
| Schema drift | Tool behaviour changes, schema doesn't | Versioned schemas, contract tests |
| Silent partial failures | Tool returns partial data, model proceeds | Confidence thresholds, completeness checks |
| LLM overconfidence | Model treats tool output as ground truth | Cross-validation, evidence requirements |
| Capability escalation | Model "tries" progressively more powerful tools | Capability tiers + policy gating + budgets |
The common thread: these failures happen because the LLM is trusted as an authority.
The fix: LLMs propose, constrainers decide, facts persist.
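For the last row of that table, a policy gate in the constrainer might look like this sketch (tier names and budgets are illustrative): the orchestrator, not the model, controls which capability tier runs and how often.

```python
TIER_BUDGETS = {"heuristic": None, "local_model": 200, "hosted_llm": 20}  # calls per run

def may_invoke(tier: str, calls_so_far: dict[str, int], requested_by: str) -> bool:
    budget = TIER_BUDGETS.get(tier)
    if budget is not None and calls_so_far.get(tier, 0) >= budget:
        return False  # budget exhausted: no silent escalation to expensive tiers
    if tier == "hosted_llm" and requested_by != "orchestrator":
        return False  # models don't get to "try" more powerful tools on their own
    return True
```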
MCP is excellent for: schema discovery, typed tool invocation, structured responses, and composing components across process boundaries.
MCP is not: a reasoning layer, an agent framework, or a substitute for governance.
Use MCP for what it is: a wire protocol for structured communication between components.
MCP standardises how models and tools exchange context. It does not define who is allowed to act, under what conditions, or how decisions are audited.
Other approaches exist to formalise those constraints (sometimes described as Action Capability Protocols, ACP): authorisation, attestation, audit trails.
This article isn't about ACP. It's about the design principle: transport and governance are separate concerns.
MCP tells you how to call a tool. Something else must decide whether you should. In my systems, that's the constrainer. In enterprise systems, it might be a policy engine.
| Aspect | Demo MCP | Production MCP |
|---|---|---|
| Tool descriptions | Natural language, detailed | Minimal, schema-first |
| Who decides to call | LLM | Orchestrator/Constrainer |
| Response format | Free-form text | Typed signals with confidence |
| Validation | None | Cross-validation, thresholds |
| LLM dependency | Required | Optional enrichment |
| Replay | Non-deterministic | Fully reproducible |
| Audit trail | Conversation logs | Signal provenance chain |
MCP enables composition. Determinism enables trust. LLMs enable fluency.
But only if the constrainer, not the model, decides; every fact carries provenance; and the LLM stays optional.
Probability proposes; determinism decides; facts persist.
MCP without a constrainer is just prompt injection with extra steps.
Make synthesis the last step. Make the LLM optional. Make every fact traceable.
Previous in series: Part 1: Why LLMs Fail as Sensors
Theoretical foundations:
Pattern implementations:
© 2026 Scott Galloway — Unlicense — All content and source code on this site is free to use, copy, modify, and sell.