Wednesday, 31 December 2025
This series builds a working ecommerce system that proves you can have sophisticated customer intelligence without storing personal information.
| What We Build | How It Works |
|---|---|
| Semantic segmentation | Vector embeddings group similar interests |
| Real-time personalisation | Session signals, not permanent profiles |
| Full transparency | Users see and adjust their interest signatures |
| Zero PII | Behavioural patterns, not identities |
The sample project (Mostlylucid.SegmentCommerce) is a complete, working implementation you can run locally.
The industry presents personalisation as a binary choice:
Option A: Sophisticated but Misaligned
Option B: Privacy-Respecting but Dumb
This is a false dichotomy. There's a missing architecture.
What if customers could:
This isn't theoretical. We'll build it.
You don't need to know who someone is to understand what they're interested in right now.
```mermaid
flowchart LR
    A[Identity] --> B[Permanent profile]
    B --> C[Recommendations]
    S[Session token] --> T[Interest signals]
    T --> U[Session signature]
    U --> C
    U --> D[Fades naturally]
    U --> P[Anonymous profile]
    P --> X[Unmasked identity]
    style B stroke:#c92a2a,stroke-width:4px
    style U stroke:#1971c2,stroke-width:4px
    style P stroke:#2f9e44,stroke-width:4px
    style D stroke:#868e96,stroke-width:2px
    style X stroke:#868e96,stroke-width:2px
```
An interest signature like:
"yoga • sustainability • minimalism • wellness • organic"
...tells you everything you need for personalisation without telling you anything about the person's identity.
It can live for a single session, or persist as an anonymous profile that the user can reset or export.
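To make the idea concrete, here is a minimal sketch of what such a signature could look like as a data structure. The class name `InterestSignature`, its methods, and the weights are assumptions for illustration, not the types used in Mostlylucid.SegmentCommerce. The point it tries to show is that the structure holds interest labels and weights and nothing that identifies a person, so reset and export are trivial.

```python
import json
from dataclasses import dataclass, field


@dataclass
class InterestSignature:
    """Hypothetical container: interest label -> weight in [0, 1]. No user fields."""
    weights: dict[str, float] = field(default_factory=dict)

    def summary(self, top_n: int = 5) -> str:
        """Render the strongest interests as a bullet-separated string."""
        ranked = sorted(self.weights.items(), key=lambda kv: kv[1], reverse=True)
        return " • ".join(label for label, _ in ranked[:top_n])

    def export_json(self) -> str:
        """Give the user a portable copy of exactly what is stored."""
        return json.dumps({"interests": self.weights}, sort_keys=True)

    def reset(self) -> None:
        """Forget everything; the next session starts cold."""
        self.weights.clear()


# Illustrative weights only.
signature = InterestSignature({
    "yoga": 0.9, "sustainability": 0.8, "minimalism": 0.7,
    "wellness": 0.6, "organic": 0.5,
})
print(signature.summary())
```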
Early on, the simplest way to keep that profile stable without logins is client-side fingerprinting—but implemented in a zero-PII / zero-tracking-cookie way.
In the current codebase (Mostlylucid.SegmentCommerce) the browser computes a fingerprint hash (no raw signals sent, no localStorage, no tracking cookie) and POSTs it to /api/fingerprint. The server then HMACs that hash (so it’s useless outside this site) and links it to the current session.
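The server-side half of that flow can be sketched in a few lines. This is not the project's actual code: the secret, the function names `profile_key` and `link_session`, and the in-memory dictionary are assumptions made for illustration. Only the standard-library `hmac` and `hashlib` calls are real.

```python
import hashlib
import hmac

# Assumption: a per-site secret that never leaves the server.
SITE_SECRET = b"replace-with-a-long-random-server-side-secret"


def profile_key(client_fingerprint_hash: str, secret: bytes = SITE_SECRET) -> str:
    """HMAC the browser-supplied hash so the stored key is meaningless off-site."""
    digest = hmac.new(secret, client_fingerprint_hash.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical in-memory link between a session id and its profile key.
session_to_profile: dict[str, str] = {}


def link_session(session_id: str, client_fingerprint_hash: str) -> str:
    """Associate the current session with the keyed profile identifier."""
    key = profile_key(client_fingerprint_hash)
    session_to_profile[session_id] = key
    return key


# The browser would send only a hash, never the raw signals behind it.
browser_hash = hashlib.sha256(b"example-client-side-signals").hexdigest()
key = link_session("session-123", browser_hash)
```

Because the key depends on the site secret, the same browser hash produces a different key on any other site, which is what makes the stored value useless elsewhere.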
We still use an essential session cookie for short-lived session state (views, cart events, etc.), but we do not need a dedicated “follow-you-forever” tracking cookie to get useful continuity. Later in the series we can upgrade to a logged-in identity mode (highest trust) without changing the rest of the segmentation design.
```mermaid
flowchart TB
    A[Session only]
    B[Fingerprint mode]
    C[Cookie mode]
    D[Identity mode]
    A -->|no persistence| A
    B -->|hash in browser| E[/api/fingerprint/]
    E -->|HMAC on server| F[PersistentProfile key]
    A -. upgrade .-> B
    B -. optional upgrade .-> D
    C -. optional upgrade .-> D
    style A stroke:#868e96,stroke-width:2px
    style B stroke:#1971c2,stroke-width:3px
    style C stroke:#fab005,stroke-width:3px
    style D stroke:#2f9e44,stroke-width:3px
    style F stroke:#1971c2,stroke-width:3px
```
Crucially: persistence does not have to mean identity. The profile remains detached unless the user explicitly chooses to “unmask” it.
Compare that to traditional profiling:
Name: John Smith
Email: john@example.com
Age: 34
Location: Seattle
Purchase history: [284 items tracked forever]
Browsing history: [Cross-site tracking across 47 domains]
The first approach gives you better recommendations with zero PII. The second invades privacy and still gets it wrong (remember that one impulse purchase still haunting your feed six months later?).
You've experienced the dysfunction yourself:
What users experience:
What they don't get:
What the industry says:
These claims persist because they align with commercial incentives, not technical reality.
It's not that Google, Amazon, Meta, or TikTok can't explain how their recommendation systems work. They absolutely can. The algorithms aren't magic—they're math, statistics, and machine learning that could be explained in plain English.
They choose not to because transparency would undermine the economic assumptions these systems are built on.
The core issue isn't technical complexity—it's that these systems optimise for different outcomes than users assume.
Collecting signals to make a product more useful is not the problem. The problem is when those same signals are repurposed for targeting and behavioural manipulation.
Engagement maximisation often conflicts with user value. Ad placement drives what you see. Behavioural nudging keeps you scrolling.
Transparency would make these conflicts obvious, so opacity becomes a feature, not a limitation.
There are massive commercial and PR reasons to avoid transparency:
1. Data Collection Scope
2. Optimisation Targets
3. Profile Permanence
When pressed, Big Tech hides behind: "The algorithm is too complex for normal users to understand."
This framing is misleading. Users already understand:
They could understand recommendations too—if companies chose to explain them.
Complexity isn't the barrier. Exposure is.
Building a zero-PII customer intelligence system starts with one fundamental principle: users should understand what's happening and why.
You don't need a whitepaper. You need a plain-English mental model that users can internalise in thirty seconds.
Here's what not to say:
"A cluster derived from embeddings in a high-dimensional vector space, derived from similarity scores across vectors..."
Here's what works:
"We group products and interests into small, overlapping segments based on how people interact with them. You're probably in dozens of segments at once—and they change constantly based on what you actually do."
Three key concepts to communicate:
```mermaid
graph TB
    User[Anonymous user] --> Action1[Views yoga mat]
    Action1 --> Sig1[Signature updated]
    Sig1 --> Action2[Views cookbook]
    Action2 --> Sig2[Signature updated]
    Sig2 --> Action3[Hides product]
    Action3 --> Sig3[Signature updated]
    Sig3 --> Sig4[Decays over time]
    Sig4 --> Sig5[Fades without reinforcement]
    Sig3 --> Segment1[Segment: wellness]
    Sig4 --> Segment2[Segment: general]
    Sig5 --> Segment3[Segment: cold start]
    style Sig3 stroke:#1971c2,stroke-width:4px
    style Segment1 stroke:#2f9e44,stroke-width:3px
    style Segment2 stroke:#fab005,stroke-width:3px
    style Segment3 stroke:#868e96,stroke-width:2px
```
This framing immediately differentiates your system from the creepy "you looked at this once, now we'll show it to you forever" behaviour users have come to expect. This is closer to how people actually behave than static "profiles" ever were.
Transparency means being specific about what actions influence segmentation—and for how long. A simple table does wonders here:
| Action | Signal Strength | Duration | Notes |
|---|---|---|---|
| Single click/view | Weak | Minutes–hours | Curiosity, not commitment |
| Multiple views over time | Medium | Days | Growing interest |
| Explicit "I'm interested" | Strong | Weeks | Clear signal |
| Save/bookmark | Strong | Weeks+ | Intentional signal |
| "Not relevant" / Hide | Suppression | Long | Respect the signal |
| No reinforcement | Decay | Varies | Interest fades naturally |
This table alone transforms the user experience from "mysterious algorithm" to "fair system I can influence."
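One way to turn the table into something a system can act on is a small rule map. The table gives only qualitative strengths and durations, so every number below is an assumption, as are the names `SignalRule`, `RULES`, and `apply_action`. The sketch applies weights only; `lifetime_hours` is carried as data and not enforced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalRule:
    weight: float        # how much the action moves the interest score
    lifetime_hours: int  # roughly how long it matters without reinforcement


# Assumed values standing in for "Weak", "Medium", "Strong" and "Suppression".
RULES = {
    "view": SignalRule(weight=0.1, lifetime_hours=6),
    "repeat_view": SignalRule(weight=0.3, lifetime_hours=24 * 3),
    "interested": SignalRule(weight=0.6, lifetime_hours=24 * 21),
    "save": SignalRule(weight=0.6, lifetime_hours=24 * 28),
    "hide": SignalRule(weight=-1.0, lifetime_hours=24 * 90),
}


def apply_action(scores: dict[str, float], interest: str, action: str) -> dict[str, float]:
    """Return a new score map with the action applied, clamped to [0, 1]."""
    rule = RULES[action]
    updated = dict(scores)
    current = updated.get(interest, 0.0)
    updated[interest] = min(1.0, max(0.0, current + rule.weight))
    return updated


scores: dict[str, float] = {}
scores = apply_action(scores, "yoga", "view")       # weak signal
scores = apply_action(scores, "yoga", "save")       # strong, intentional signal
scores = apply_action(scores, "cookbooks", "view")
scores = apply_action(scores, "cookbooks", "hide")  # suppression wins over the view
```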
This is the single biggest difference between segmentation and profiling.
Here's where zero-PII segmentation shines: interests fade unless reinforced.
"One late-night browse won't follow you for weeks. If you don't keep engaging with something, we assume you've moved on."
Traditional tracking systems build permanent profiles. Every action accumulates forever, creating an increasingly distorted picture of who you are.
A decay-based system is fundamentally different:
You don't need to explain the exponential decay function or half-life calculations. Users need reassurance, not mathematics.
Here's what radically differentiates this approach: customers can see and adjust their own interest signatures.
Imagine a simple interface that shows:
Your Current Interests (this session)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🌱 Sustainable Products ████████░░ 80%
🧘 Yoga & Wellness ███████░░░ 70%
📚 Minimalism █████░░░░░ 50%
🏃 Athletic Gear ████░░░░░░ 40%
🌿 Organic Foods ███░░░░░░░ 30%
[Remove] [Adjust] [Add Interest]
These fade over time unless you keep engaging.
Last updated: 2 minutes ago
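Rendering a view like this needs very little code, because the signature is already a map of labels to scores. A rough sketch, with the function names `render_bar` and `render_dashboard` and the column width chosen for illustration:

```python
def render_bar(score: float, width: int = 10) -> str:
    """Draw a score in [0, 1] as filled and empty blocks."""
    clamped = min(1.0, max(0.0, score))
    filled = round(clamped * width)
    return "█" * filled + "░" * (width - filled)


def render_dashboard(interests: dict[str, float]) -> str:
    """One line per interest, strongest first."""
    ranked = sorted(interests.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(
        f"{label:<22}{render_bar(score)} {score:.0%}" for label, score in ranked
    )


print(render_dashboard({
    "Sustainable Products": 0.8,
    "Yoga & Wellness": 0.7,
    "Minimalism": 0.5,
}))
```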
This level of transparency gives users:
Compare this to traditional systems where you have no idea what profile they've built about you, no way to inspect it, and no control to adjust it.
Even with minimal implementation, you can offer capabilities that almost no ecommerce systems provide:
Basic Controls:
Advanced Controls (for later):
The key insight: These aren't just features—they're trust signals. They communicate: "This system responds to you. You're not being subjected to it."
Traditional systems keep algorithms opaque to avoid exposing the scope of data collection, behavioural inference, and how that information is monetised (targeting, nudges, attribution).
We can be radically transparent because there's nothing invasive to hide:
When users inspect their interest signature, they see clean semantic concepts: "sustainable products • yoga • minimalism" rather than demographic inferences or behavioural predictions.
This transparency isn't just ethical. It's a competitive advantage because you can say what competitors can't:
"Here's exactly how our recommendations work. Inspect it. Control it. Trust it."
When you can explain your algorithm openly, you can build features that targeting-driven systems cannot:
1. Real-Time Interest Dashboard
2. Explicit Controls
3. Recommendation Explanations
4. Algorithmic Auditing
5. Data Portability
Notice what Big Tech can't build without admitting their practices:
They're locked out of building trust features because transparency would expose the surveillance.
Once segmentation is clearly explained, you can layer features that compound trust:
You're not building features in isolation. You're creating a coherent system where each piece reinforces the mental model users already have—and can verify.
Part 2 covers the full implementation. Here's the architecture at a glance:
```mermaid
flowchart LR
    Session[Session Profile<br/>in-memory only] -->|high intent| Persistent[Persistent Profile<br/>database]
    Session -->|expires| Gone[Lost forever]
    style Session stroke:#1971c2,stroke-width:3px
    style Persistent stroke:#2f9e44,stroke-width:3px
    style Gone stroke:#868e96,stroke-width:2px
```
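The two arrows in that diagram can be sketched as a promotion rule plus an expiry check. Part 2 has the real implementation; the timeout, the threshold, and the names `SessionProfile` and `promote` below are assumptions used only to show the shape of the idea.

```python
import time
from dataclasses import dataclass, field

SESSION_TTL_SECONDS = 30 * 60  # assumed idle timeout
HIGH_INTENT_THRESHOLD = 0.6    # assumed cut-off for "high intent"


@dataclass
class SessionProfile:
    """In-memory only: interest label -> score, plus a last-activity timestamp."""
    interests: dict[str, float] = field(default_factory=dict)
    last_seen: float = field(default_factory=time.time)

    def expired(self, now: float) -> bool:
        """True once the session has been idle longer than the timeout."""
        return now - self.last_seen > SESSION_TTL_SECONDS


def promote(session: SessionProfile, persistent: dict[str, float]) -> dict[str, float]:
    """Copy only high-intent interests into the persistent profile."""
    merged = dict(persistent)
    for interest, score in session.interests.items():
        if score >= HIGH_INTENT_THRESHOLD:
            merged[interest] = max(merged.get(interest, 0.0), score)
    return merged


session = SessionProfile({"yoga": 0.8, "cookbooks": 0.1}, last_seen=1_000.0)
persistent = promote(session, {})
```

Anything below the threshold simply disappears with the session, which is the "lost forever" branch.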
Segments aren't "in or out". They're fuzzy memberships with scores (0-1):
| Profile | Tech Enthusiast | Bargain Hunter | Cart Abandoner |
|---|---|---|---|
| A | 0.85 | 0.20 | 0.10 |
| B | 0.30 | 0.75 | 0.60 |
| C | 0.95 | 0.05 | 0.00 |
And every score is explainable—users can see exactly why they're in a segment.
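A hedged sketch of how a fuzzy score and its explanation could come out of the same calculation. The segment definition, signal names, and weights are invented for illustration, and a weighted average is just one simple scoring choice, not necessarily what the sample project does.

```python
def membership(signals: dict[str, float], weights: dict[str, float]) -> tuple[float, list[str]]:
    """Score a segment in [0, 1] and list the signals that contributed."""
    total_weight = sum(weights.values())
    score = 0.0
    reasons = []
    for name, weight in weights.items():
        value = signals.get(name, 0.0)
        if value > 0:
            score += weight * value
            reasons.append(f"{name} contributed {weight * value / total_weight:.2f}")
    return round(score / total_weight, 2), reasons


# Hypothetical segment definition and behavioural signals, each in [0, 1].
TECH_ENTHUSIAST = {"viewed_electronics": 3.0, "read_spec_sheets": 2.0}
signals = {"viewed_electronics": 1.0, "read_spec_sheets": 0.5, "used_coupon": 1.0}

score, reasons = membership(signals, TECH_ENTHUSIAST)
print(score, reasons)
```

Signals that are not part of a segment's definition, such as `used_coupon` here, have no effect on that segment, so the list of reasons is always the complete explanation of the score.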
Interests fade unless reinforced:
Day 0: View product → Signal: 1.0
Day 7: No activity → Signal: 0.5
Day 14: No activity → Signal: 0.25
Day 21: → Effectively gone
That one late-night browse doesn't define you forever.
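The timeline above halves every seven days, which is what an exponential decay with a seven-day half-life would produce. A minimal sketch under that reading; the half-life is inferred from the example numbers and the "effectively gone" threshold is an assumption.

```python
HALF_LIFE_DAYS = 7.0  # assumption read off the timeline: the signal halves weekly
FORGET_BELOW = 0.15   # assumed threshold for "effectively gone"


def decayed(signal: float, days_idle: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: the signal halves once per half-life without reinforcement."""
    return signal * 0.5 ** (days_idle / half_life)


def is_forgotten(signal: float, days_idle: float) -> bool:
    """True once the decayed signal drops below the forget threshold."""
    return decayed(signal, days_idle) < FORGET_BELOW


for day in (0, 7, 14, 21):
    print(day, decayed(1.0, day), is_forgotten(1.0, day))
```

Reinforcement would reset `days_idle` to zero for that interest, so anything the user keeps engaging with stays strong while one-off views fall away on their own.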
Part 1.1: Generating synthetic sample data locally (Ollama + ComfyUI)
Part 2: Session profiles, signals, and segment definitions—the full implementation
Part 3 (coming): Outbox pattern, job queue, and the transparency UI
When personalisation is built from process instead of identity, privacy stops being a constraint—it becomes a property of the system.
© 2026 Scott Galloway — Unlicense — All content and source code on this site is free to use, copy, modify, and sell.