One of the easiest ways to make an AI system fail is to let it remember too much.
Most "context handling" approaches boil down to one of two mistakes:
Both approaches are expensive, opaque, and unstable over long horizons.
There is a better pattern - one that mirrors how biological systems actually scale cognition.
I call it Constrained Fuzzy Context Dragging (CFCD).
Constrained Fuzzy Context Dragging is a pattern where probabilistic models propose what is salient, but only deterministic rules are allowed to carry that salience forward to influence future reasoning or generation.
Or more bluntly:
Models may notice. Engineering decides what persists.
This is the same philosophical split as Constrained Fuzziness and Constrained Fuzzy MoM - just applied along the time axis instead of the decision axis.
LLMs do not "remember". They pattern-match over tokens.
When we treat context as "just more text", we get:
- cost that grows with every request
- attention diluted across everything the system has ever seen
- drift and contradictions as summaries of summaries pile up
Longer context windows do not solve this. They just delay the failure.
The real question is not how much context to keep, but:
What deserves to survive?
CFCD enforces a hard separation:
| Role | Allowed to be fuzzy? | Allowed to become future constraint? |
|---|---|---|
| Salience detection | Yes | No |
| Context promotion | No | Yes |
| Generation | Yes | No |
| Anchor storage (ledger) | No | Yes |
Probability can suggest. Determinism decides what gets dragged forward.
Deterministic means: same inputs + same policy ⇒ same anchors. You can still use probabilistic scores; you just run them through fixed promotion/expiry functions (MMR, RRF, vote rules).
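A minimal sketch of what such a fixed promotion function can look like (all names here are illustrative, not from DocSummarizer): probabilistic scores and votes come in, a fixed threshold-and-tie-break rule decides what becomes an anchor.
public record Candidate(string Id, string Text, double Score, int Votes);

public static class PromotionPolicy
{
    // Deterministic promotion: same candidates + same thresholds => same anchors.
    // Scores may come from embeddings, BM25, or an LLM - the rule itself never varies.
    public static IReadOnlyList<Candidate> Promote(
        IEnumerable<Candidate> candidates,
        double minScore = 0.75,
        int minVotes = 2,
        int maxAnchors = 20)
    {
        return candidates
            .Where(c => c.Score >= minScore && c.Votes >= minVotes)
            .OrderByDescending(c => c.Score)
            .ThenBy(c => c.Id, StringComparer.Ordinal) // stable tie-break keeps output reproducible
            .Take(maxAnchors)
            .ToList();
    }
}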
CFCD is how you get the benefits of context without paying the price of context.
The naive approach - feed the whole document to the LLM and ask it to summarise - fails because:
- attention dilutes across everything at once
- cost scales with document length
- nothing bounds what the model treats as important
The key insight: salience detection doesn't require an LLM. You can use deterministic ML (embeddings, TF-IDF, BM25) to find what matters, then constrain the LLM to only see what survived selection.
DocSummarizer implements this:
The LLM never sees the full document. It generates around the anchors, not through them.
// From DocSummarizer: SegmentExtractor.cs
public class SegmentExtractor
{
private readonly OnnxEmbeddingService _embeddingService; // ML, not LLM
private readonly ExtractorConfig _config; // thresholds, budgets, MMR lambda (type name illustrative)
public async Task<ExtractionResult> ExtractAsync(string docId, string markdown)
{
// 1. Parse document into typed segments
var segments = ParseToSegments(docId, markdown);
var contentType = DetectContentType(segments); // assumed helper - not shown in this excerpt
var targetCount = _config.TargetAnchorCount;   // assumed config value for the anchor budget
var segmentsToEmbed = segments;
float[] centroid;
// 2. For large docs: semantic pre-filtering with multi-anchor approach
// - Guaranteed coverage: first/last sentences per section, constraints
// - BM25 candidate generation with pseudo-query from TF-IDF terms
// - Topic anchors via k-means clustering on sample embeddings
if (segments.Count > _config.MaxSegmentsToEmbed)
{
(segmentsToEmbed, centroid) = await SemanticPreFilterAsync(segments, targetCount);
}
else
{
await GenerateEmbeddingsAsync(segments);
centroid = CalculateCentroid(segments);
}
// 3. Score by salience using MMR (deterministic)
// Salience = lambda * sim(segment, centroid) * position_weight * content_weight
// - (1 - lambda) * max_sim(segment, higher_ranked_segments)
ComputeSalienceScores(segments, centroid, contentType);
// 4. Return top-K as anchors for synthesis
var topBySalience = segments
.OrderByDescending(s => s.SalienceScore)
.Take(Math.Max(_config.FallbackBucketSize, targetCount))
.ToList();
return new ExtractionResult
{
AllSegments = segments,
TopBySalience = topBySalience,
Centroid = centroid,
ContentType = contentType
};
}
/// <summary>
/// Greedy MMR selection: balances relevance with diversity
/// </summary>
private void ComputeSalienceScores(List<Segment> segments, float[] centroid, ContentType contentType)
{
var candidates = new HashSet<Segment>(segments.Where(s => s.Embedding != null));
var ranked = new List<Segment>();
// Pre-compute centroid similarities with content-type adjustments
foreach (var segment in candidates)
{
var baseSim = CosineSimilarity(segment.Embedding!, centroid);
var contentWeight = ComputeContentTypeWeight(segment, contentType);
segment.SalienceScore = baseSim * segment.PositionWeight * contentWeight;
}
// Greedy MMR: pick best, penalise similar, repeat
while (candidates.Count > 0)
{
var best = candidates
.Select(c => new {
Segment = c,
Score = _config.MmrLambda * c.SalienceScore
- (1 - _config.MmrLambda) * MaxSimToRanked(c, ranked)
})
.OrderByDescending(x => x.Score)
.First();
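// Re-score by MMR rank so the later OrderByDescending/Take(K) follows the diversified order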
best.Segment.SalienceScore = 1.0 - ((double)ranked.Count / segments.Count);
ranked.Add(best.Segment);
candidates.Remove(best.Segment);
}
}
}
Because the output is a ranked collection, you can adapt anchor count to the synthesis model at runtime:
public class AdaptiveSynthesizer
{
private readonly ILlmService _llm;
private readonly double _avgTokensPerSegment = 60; // assumed average - measure per corpus
public int ComputeAnchorBudget(ModelProfile model, ExtractionResult extraction)
{
// Larger context windows → more anchors
var baseTokenBudget = model.ContextWindow / 4; // Reserve 25% for output
// More powerful models → can handle more complex synthesis
var complexityMultiplier = model.Tier switch
{
ModelTier.Small => 0.5, // tinyllama: fewer, simpler anchors
ModelTier.Medium => 1.0, // llama3.2:3b: standard
ModelTier.Large => 1.5, // llama3.1:8b+: more context, nuanced synthesis
_ => 1.0
};
var anchorBudget = (int)(baseTokenBudget * complexityMultiplier / _avgTokensPerSegment);
// Never exceed what we have
return Math.Min(anchorBudget, extraction.TopBySalience.Count);
}
public async Task<Summary> SynthesizeAsync(
ExtractionResult extraction,
ModelProfile model)
{
var budget = ComputeAnchorBudget(model, extraction);
// Take top-K from already-ranked segments
var anchors = extraction.TopBySalience.Take(budget).ToList();
// Synthesis model only sees what survived selection + adaptation
return await _llm.SynthesizeAsync(anchors, model);
}
}
This is still deterministic: same inputs + same model profile → same anchors. The ranking happens once; adaptation is just a slice.
This means you can:
- swap synthesis models without re-running extraction
- serve the same ranked anchors to a small model and a large one
- cache the extraction once per document and slice it per request
The key insight: anchors are structure, not prose. The embeddings and MMR are deterministic. The LLM only does the final synthesis, bounded by what survived.
Anchor contract: generation may vary phrasing, but must not contradict anchors. If anchors are insufficient, it must hedge and/or request more evidence.
A concrete anchor representation:
{
"policyVersion": "cfcd-v1",
"segments": [
{"id":"seg-12","text":"Reset requires holding button 10s","salience":0.92},
{"id":"seg-45","text":"Factory reset clears all settings","salience":0.88}
],
"coverage": "3.2% semantic sample",
"hedging": "sampled 3% - avoid definitive conclusions"
}
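A minimal sketch of that contract in code - the record mirrors the JSON above, and the check is a deterministic post-generation gate. The keyword-overlap check is a stand-in; the names and checking strategy are illustrative, not from DocSummarizer.
public record AnchorSegment(string Id, string Text, double Salience);

public record AnchorSet(
    string PolicyVersion,
    IReadOnlyList<AnchorSegment> Segments,
    string Coverage,
    string Hedging);

public static class AnchorContract
{
    // Deterministic gate: the generated text must touch every promoted anchor.
    // Approximated here by keyword overlap - a real checker could use NLI or embeddings.
    public static bool Honours(AnchorSet anchors, string generatedText)
    {
        return anchors.Segments.All(segment =>
            KeyTerms(segment.Text).Any(term =>
                generatedText.Contains(term, StringComparison.OrdinalIgnoreCase)));
    }

    private static IEnumerable<string> KeyTerms(string text) =>
        text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
            .Where(w => w.Length > 4); // crude proxy for content-bearing words
}
If the gate fails, the system can retry generation or fall back to hedged output - the decision is a rule, not a model preference.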
Translation exposes this pattern even more clearly.
Translate sentence-by-sentence and you get:
- the same term rendered three different ways across one document
- product names and key phrases drifting as the model re-decides each time
- no way to tell, afterwards, why any particular rendering was chosen
Even with a huge context window, models still "feel free" to vary phrasing.
What counts as strong local evidence? Structural cues: a glossary entry, an explicit definition, a heading that introduces the term. Detection is deterministic - cues are found by pattern matching or parser classification, not model judgment.
Default policy: a challenger displaces the incumbent only when it beats the incumbent's confidence by Δ=0.08 within a window of W=50 chunks (tune both per domain).
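Read as a pure function, one interpretation of that policy looks like this (a sketch - the ledger below uses a simpler two-vote rule instead):
public static class SwitchPolicy
{
    // Deterministic switch rule: a challenger displaces the incumbent only if it
    // beats the incumbent's confidence by delta AND was seen within the window.
    public static bool ShouldSwitch(
        float incumbentConfidence,
        float challengerConfidence,
        int chunksSinceChallengerSeen,
        float delta = 0.08f,
        int window = 50)
    {
        return challengerConfidence - incumbentConfidence >= delta
            && chunksSinceChallengerSeen <= window;
    }
}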
public class TermLedger
{
private readonly Dictionary<string, TermMapping> _mappings = new();
public record TermMapping(
string SourceTerm,
string TargetTerm,
float Confidence,
string FirstSeenChunkId,
int LastSeenChunkIndex, // Track recency
int VotesForCurrentTarget,
int TotalVotes, // Preserve history
string? ChallengerTarget = null,
int ChallengerVotes = 0,
float ChallengerConfidence = 0);
public void ProposeMapping(string source, string target, float confidence, string chunkId, int chunkIndex)
{
source = source.ToLowerInvariant().Trim();
if (_mappings.TryGetValue(source, out var existing))
{
if (existing.TargetTerm == target)
{
// Same as current: accumulate votes
_mappings[source] = existing with
{
Confidence = Math.Max(existing.Confidence, confidence),
VotesForCurrentTarget = existing.VotesForCurrentTarget + 1,
TotalVotes = existing.TotalVotes + 1,
LastSeenChunkIndex = chunkIndex
};
}
else if (existing.ChallengerTarget == target)
{
// Same as challenger: accumulate challenger votes
var newChallengerVotes = existing.ChallengerVotes + 1;
var newChallengerConfidence = Math.Max(existing.ChallengerConfidence, confidence);
// Switch if challenger has >= 2 votes AND higher confidence
if (newChallengerVotes >= 2 && newChallengerConfidence > existing.Confidence)
{
// Swap: old current becomes challenger, challenger becomes current
_mappings[source] = existing with
{
TargetTerm = target,
Confidence = newChallengerConfidence,
VotesForCurrentTarget = newChallengerVotes,
TotalVotes = existing.TotalVotes + 1,
LastSeenChunkIndex = chunkIndex,
ChallengerTarget = existing.TargetTerm, // Preserve old mapping as challenger
ChallengerVotes = existing.VotesForCurrentTarget,
ChallengerConfidence = existing.Confidence
};
}
else
{
_mappings[source] = existing with
{
ChallengerVotes = newChallengerVotes,
ChallengerConfidence = newChallengerConfidence,
TotalVotes = existing.TotalVotes + 1,
LastSeenChunkIndex = chunkIndex
};
}
}
else if (confidence > existing.ChallengerConfidence)
{
// New challenger replaces old challenger
_mappings[source] = existing with
{
ChallengerTarget = target,
ChallengerVotes = 1,
ChallengerConfidence = confidence,
TotalVotes = existing.TotalVotes + 1,
LastSeenChunkIndex = chunkIndex
};
}
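// (implicit else: a new target with lower confidence than the current challenger is ignored)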
}
else
{
_mappings[source] = new TermMapping(
source, target, confidence, chunkId, chunkIndex,
VotesForCurrentTarget: 1, TotalVotes: 1);
}
}
public string? GetCanonicalTranslation(string source)
{
source = source.ToLowerInvariant().Trim();
return _mappings.TryGetValue(source, out var mapping)
? mapping.TargetTerm
: null;
}
}
In practice you also want vote decay (or a sliding window) so early mistakes don't become permanent. The key is that the decay rule is deterministic, not model-decided.
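A minimal sketch of such a decay rule, operating on the same TermMapping records via LastSeenChunkIndex (the window size and vote penalty are illustrative):
public static class LedgerDecay
{
    // Deterministic decay: mappings not seen within the window lose a vote;
    // mappings that hit zero votes are removed. No model involvement anywhere.
    public static void Apply(
        Dictionary<string, TermMapping> mappings,
        int currentChunkIndex,
        int windowSize = 50)
    {
        foreach (var (source, mapping) in mappings.ToList())
        {
            if (currentChunkIndex - mapping.LastSeenChunkIndex <= windowSize)
                continue; // still fresh - leave it alone

            var decayed = mapping with
            {
                VotesForCurrentTarget = mapping.VotesForCurrentTarget - 1
            };

            if (decayed.VotesForCurrentTarget <= 0)
                mappings.Remove(source); // forgotten by rule, not by chance
            else
                mappings[source] = decayed;
        }
    }
}
TermLedger could call this from ProposeMapping or from a periodic housekeeping pass; either way the rule is fixed and auditable.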
Why this matters for debugging: If the translation flips "factory reset" mid-document, you can diff the ledger and see exactly when and why:
chunk-12: "factory reset" → "Werkseinstellungen" (votes: 3, conf: 0.82)
chunk-47: challenger "Zurücksetzen" reaches 2 votes, conf: 0.91
chunk-47: SWITCH - "factory reset" → "Zurücksetzen" (old mapping preserved as challenger)
No guessing. No "the model just decided". Inspectable, deterministic, auditable.
This gives you:
This is why terminology consistency cannot be solved by "just more context".
CFCD does not give the model memory.
It gives the system memory.
The model detects salience and generates output - nothing else.
The system promotes anchors, stores them in the ledger, enforces them on generation, and expires them when they go stale.
Expiry rules (deterministic, not model-decided):
- anchors that go time T without being referenced expire
- anchors not reinforced within N subsequent chunks decay out

That distinction is everything.
flowchart TB
subgraph Model["Probabilistic Model"]
M1[Detect Salience]
M2[Generate Output]
end
subgraph System["Deterministic System"]
S1[Promote Anchors]
S2[Store in Ledger]
S3[Enforce on Generation]
S4[Expire Stale Context]
end
M1 --> S1 --> S2
S2 --> S3 --> M2
S2 --> S4
style Model stroke:#f59e0b,stroke-width:2px
style System stroke:#22c55e,stroke-width:3px
Here is the pattern with the ledger as a first-class object:
flowchart LR
S[Stream: chunks/sentences] --> M[Model proposes salience]
M --> P[Promotion rules]
P --> L[Ledger / Anchors]
L --> G[Generation constrained by anchors]
G --> O[Output]
L --> E[Expiry rules]
E -.-> L
style L stroke:#22c55e,stroke-width:3px
style P stroke:#22c55e,stroke-width:2px
style E stroke:#22c55e,stroke-width:2px
style M stroke:#f59e0b,stroke-width:2px
style G stroke:#f59e0b,stroke-width:2px
The green boxes are deterministic. The orange boxes are probabilistic. The ledger sits between them, holding what survived.
Three shifts follow:
- Memory moves from the model to the system.
- Forgetting becomes an explicit, deterministic rule instead of an accident of the context window.
- The prompt becomes a view of the ledger, not the ledger itself.
That's the "grok" moment. The ledger is not context. It's a contract.
Anchors are not "important text". They are typed, structured facts that constrain generation:
- Term mappings: factory reset → restore factory settings
- Entities: ProductName, ModelNumber, UserID
- Policy and schema metadata: policyVersion: "cfcd-v1", anchorSchemaVersion: 2

The point is: anchors are not prose. They are structure. The model writes around them.
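To make "the prompt becomes a view of the ledger" concrete, here is a sketch of rendering a constraint block deterministically from the ledger for one chunk (the format and helper names are illustrative):
public static class LedgerPromptView
{
    // Deterministic rendering: same ledger + same chunk => same constraint block.
    public static string RenderConstraints(TermLedger ledger, IEnumerable<string> termsInChunk)
    {
        var sb = new StringBuilder();
        sb.AppendLine("You MUST use these exact translations:");
        foreach (var term in termsInChunk.OrderBy(t => t, StringComparer.Ordinal)) // stable ordering
        {
            var canonical = ledger.GetCanonicalTranslation(term);
            if (canonical != null)
                sb.AppendLine($"- \"{term}\" -> \"{canonical}\"");
        }
        return sb.ToString();
    }
}
Same ledger, same chunk, same constraint block - so a flipped translation can always be traced to a ledger change, never to prompt noise.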
This is an analogy, not a biological claim: the point is selective consolidation, not literal mechanism.
Brains do not replay the entire sensory stream.
They attend to a sliver of the stream, consolidate what matters into longer-term structure, inhibit the rest, and retrieve by cue.
Language, motor skills, and perception all rely on constrained carry-forward, not raw recall.
CFCD is the same principle, implemented mechanically.
| Biological System | CFCD Equivalent |
|---|---|
| Attention (what to notice) | Salience detection (fuzzy) |
| Consolidation (what to remember) | Anchor promotion (deterministic) |
| Inhibition (what to ignore) | Expiration rules (deterministic) |
| Recall (what to retrieve) | Ledger query (deterministic) |
Constrained Fuzziness, Constrained Fuzzy MoM, and CFCD are the same idea viewed from different angles:
| Dimension | Pattern |
|---|---|
| Single proposer | Constrained Fuzziness |
| Multiple proposers | Constrained Fuzzy MoM |
| Long-horizon context | Constrained Fuzzy Context Dragging |
All three obey the same rule:
Probability proposes. Determinism persists.
Once you internalise that rule, most "AI system design" stops being mysterious.
flowchart LR
subgraph CFP["Part 1: Constrained Fuzziness"]
P1[Single Proposer] --> C1[Constrainer] --> O1[Output]
end
subgraph CFMoM["Part 2: Constrained Fuzzy MoM"]
P2A[Proposer A] --> CS[Consensus Space]
P2B[Proposer B] --> CS
P2C[Proposer C] --> CS
CS --> C2[Coordination Constrainer] --> O2[Output]
end
subgraph CFCD["Part 3: Context Dragging"]
T1[Time T] --> A1[Anchors]
T2[Time T+1] --> A1
A1 --> C3[Constrained Generation] --> O3[Output]
end
style CFP stroke:#22c55e,stroke-width:2px
style CFMoM stroke:#3b82f6,stroke-width:2px
style CFCD stroke:#8b5cf6,stroke-width:2px
CFCD wins because it:
- bounds cost instead of letting it grow with history
- keeps behaviour reproducible: same inputs, same anchors
- makes every persisted decision inspectable and auditable
- fails in ways you can diff, not ways you have to guess at
In other words: it optimises for production, not hype.
| Anti-Pattern | Problem | CFCD Solution |
|---|---|---|
| Stuff everything in context | Unbounded cost, attention dilution | Promote only what survives selection |
| Recursive summarisation | Drift, contradictions | Anchors are structure, not prose |
| Hope the model remembers | No guarantees | System owns persistence |
| Per-request full re-processing | Cost scales with history | Ledger persists across requests |
| Natural language "memory" | Semantic drift | Typed anchors, explicit mappings |
| One-size-fits-all context | Wastes capacity or starves model | Adaptive slicing from ranked collection |
Mostlylucid.Ephemeral is the substrate that makes CFCD practical. The background articles on Ephemeral cover the primitives in depth; here is how they map onto CFCD:
| Ephemeral Primitive | CFCD Role |
|---|---|
| Signals | Candidate salience (model proposes) |
| Bounded windows | No unbounded "memory" |
| Eviction rules | Forgetting is explicit |
| Sliding expiration | Stale context expires automatically |
| Signal propagation | Anchors carry forward without mutation |
Ephemeral does not "remember". It decides what is allowed to be remembered, and for how long.
That is exactly what CFCD requires.
public class CFCDCoordinator
{
private readonly SignalSink _sink;
private readonly TermLedger _ledger;
private readonly TimeSpan _anchorLifetime = TimeSpan.FromHours(1);
public void PromoteAnchor(string term, string translation, string chunkId, int chunkIndex)
{
_ledger.ProposeMapping(term, translation, 0.8f, chunkId, chunkIndex);
// Emit signal for observability (key = chunkId for correlation)
_sink.Raise($"anchor.promoted:{term}", key: chunkId);
}
public void ExpireStaleAnchors()
{
var cutoff = DateTimeOffset.UtcNow - _anchorLifetime;
// Ledger entries older than cutoff without recent usage are removed
// This is deterministic, not model-decided
_ledger.ExpireOlderThan(cutoff);
_sink.Raise("anchors.expired");
}
}
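The TermLedger shown earlier does not include ExpireOlderThan; a minimal sketch, assuming the ledger also records when each mapping was last seen:
// Inside TermLedger (sketch): _lastSeenAt is updated alongside ProposeMapping.
private readonly Dictionary<string, DateTimeOffset> _lastSeenAt = new();

// Deterministic expiry: anything not seen since the cutoff is removed.
public void ExpireOlderThan(DateTimeOffset cutoff)
{
    var stale = _lastSeenAt
        .Where(kvp => kvp.Value < cutoff)
        .Select(kvp => kvp.Key)
        .ToList();

    foreach (var source in stale)
    {
        _mappings.Remove(source);
        _lastSeenAt.Remove(source);
    }
}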
Bringing it together for a complete document translation pipeline:
public class CFCDTranslationPipeline
{
private readonly ILlmService _llm;
private readonly TermLedger _ledger;
private readonly SignalSink _sink;
private readonly int _seedChunks = 3; // K: configurable seed window
public async Task<TranslatedDocument> TranslateAsync(
Document source,
string targetLanguage)
{
var chunks = Chunker.Chunk(source, maxTokens: 500).ToList();
var translated = new List<TranslatedChunk>();
// Phase 1: Extract terminology candidates (first pass, fuzzy)
// Seed from first K chunks: front-loads terminology, avoids churn later
for (var i = 0; i < Math.Min(_seedChunks, chunks.Count); i++)
{
var chunk = chunks[i];
var candidates = await _llm.ExtractTerminologyAsync(chunk);
foreach (var term in candidates)
{
_ledger.ProposeMapping(
term.Source,
term.ProposedTranslation,
term.Confidence,
chunkId: chunk.Id,
chunkIndex: i);
}
}
_sink.Raise("terminology.seeded");
// Phase 2: Translate with ledger constraints
for (var i = 0; i < chunks.Count; i++)
{
var chunk = chunks[i];
// Get canonical translations for terms in this chunk
var constraints = ExtractConstraints(chunk, _ledger);
// Translate with constraints (model must respect ledger)
var result = await _llm.TranslateWithConstraintsAsync(
chunk,
targetLanguage,
constraints);
// Update ledger with any new terms (deterministic rules)
foreach (var newTerm in result.NewTerminology)
{
if (ShouldPromote(newTerm, _ledger))
{
_ledger.ProposeMapping(
newTerm.Source,
newTerm.Target,
newTerm.Confidence,
chunkId: chunk.Id,
chunkIndex: i);
_sink.Raise($"anchor.promoted:{newTerm.Source}");
}
}
translated.Add(result.Chunk);
}
return new TranslatedDocument(translated, _ledger.GetAllMappings());
}
private bool ShouldPromote(TermCandidate term, TermLedger ledger)
{
// Deterministic rules, not model judgment:
// - Proper nouns always promoted
// - Technical terms promoted if repeated
// - Common words never promoted
return term.Type == TermType.ProperNoun ||
(term.Type == TermType.Technical && term.OccurrenceCount >= 2);
}
}
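ExtractConstraints is referenced above but not shown; a minimal sketch, assuming chunks expose their raw text and using the ledger's GetAllMappings (the TermConstraint shape is illustrative):
public record TermConstraint(string Source, string Target);

// Inside CFCDTranslationPipeline (sketch): deterministic lookup, no model involved.
private IReadOnlyList<TermConstraint> ExtractConstraints(Chunk chunk, TermLedger ledger)
{
    var constraints = new List<TermConstraint>();
    foreach (var mapping in ledger.GetAllMappings())
    {
        // Emit a constraint for every known source term that appears in this chunk
        if (chunk.Text.Contains(mapping.SourceTerm, StringComparison.OrdinalIgnoreCase))
            constraints.Add(new TermConstraint(mapping.SourceTerm, mapping.TargetTerm));
    }
    return constraints;
}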
The mistake most AI systems make is assuming intelligence improves by remembering more.
In practice, intelligence scales by deciding what to never have to think about again.
CFCD is how you do that without lying to yourself.
| Part | Pattern | Axis |
|---|---|---|
| 1 | Constrained Fuzziness | Single component |
| 2 | Constrained Fuzzy MoM | Multiple components |
| 3 | Constrained Fuzzy Context Dragging | Time / memory |
All three patterns share the same invariant: probabilistic components propose; deterministic systems persist.
The Ten Commandments of LLM Use codify these rules. The DiSE architecture implements them for code evolution. Bot Detection implements them for request classification. DocSummarizer implements them for document retrieval.
Different domains. Same pattern. Same rule.
Part 4 will be a runnable reference implementation: a CLI + sample documents demonstrating a translation pipeline with a real term ledger, vote decay, and Ephemeral integration. Code you can clone and run, not just read.
© 2026 Scott Galloway — Unlicense — All content and source code on this site is free to use, copy, modify, and sell.