Wednesday, 24 December 2025
There's a widely-held belief in tech that you cannot build sophisticated customer intelligence without:
🎅 MERRY CHRISTMAS 🎅: As you'll most likely have noticed, Santa is NOT imminent as you read this. It's my lazy pinning system for articles: I publish, then add to them until the publication date...so you get to suffer my DRAFTS! In this case I expect to have the working system by publication date! 🎅
Both assumptions are wrong—not philosophically, but architecturally and operationally.
This series builds a working ecommerce system that proves you can have:
We'll build this using vector embeddings, session-based tracking, and aggregate analytics—all open and explainable.
In this first part, we establish the conceptual foundation: why transparency isn't just ethical, it's better product design.
The industry presents personalisation as a binary choice:
Option A: Sophisticated but Misaligned
Option B: Privacy-Respecting but Dumb
This is a false dichotomy. There's a missing architecture.
What if customers could:
This isn't theoretical. We'll build it.
You don't need to know who someone is to understand what they're interested in right now.
flowchart LR
A[Identity] --> B[Permanent profile]
B --> C[Recommendations]
S[Session token] --> T[Interest signals]
T --> U[Session signature]
U --> C
U --> D[Fades naturally]
U --> P[Anonymous profile]
P --> X[Unmasked identity]
style B stroke:#c92a2a,stroke-width:4px
style U stroke:#1971c2,stroke-width:4px
style P stroke:#2f9e44,stroke-width:4px
style D stroke:#868e96,stroke-width:2px
style X stroke:#868e96,stroke-width:2px
An interest signature like:
"yoga • sustainability • minimalism • wellness • organic"
...tells you everything you need for personalisation without telling you anything about the person's identity.
It can live for a single session, or persist as an anonymous profile that the user can reset or export.
Early on, the simplest way to keep that profile stable without logins is first-party fingerprinting (the same idea as the signatures used in my bot detection work: /blog/botdetection-introduction). Later in the series we’ll switch to a first-party identity token issued by the store itself.
Crucially: persistence does not have to mean identity. The profile remains detached unless the user explicitly chooses to “unmask” it.
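To make that concrete, here's a minimal sketch of what an interest signature could look like as a data structure. Everything here (the class name, the GUID session key, the weight values) is a hypothetical illustration, not the series' final implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: an interest signature keyed by a random session token.
// Nothing here identifies a person; the key is opaque and user-resettable.
public class InterestSignature
{
    // Opaque key: a random GUID, not an account or device identifier.
    public string SessionKey { get; } = Guid.NewGuid().ToString("N");

    // Interest concept -> current weight (0..1).
    private readonly Dictionary<string, double> _weights = new();

    public void Reinforce(string interest, double amount = 0.2) =>
        _weights[interest] = Math.Min(1.0, _weights.GetValueOrDefault(interest) + amount);

    // User-controlled "forget me": wipe the whole signature.
    public void Reset() => _weights.Clear();

    // Render as the human-readable "yoga • sustainability • ..." form.
    public override string ToString() =>
        string.Join(" • ", _weights.OrderByDescending(kv => kv.Value)
                                   .Select(kv => kv.Key));
}
```

The point of the sketch: the signature is the entire "profile", and deleting or exporting it is trivial because it's just a small map of concepts to weights.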
Compare that to traditional profiling:
Name: John Smith
Email: john@example.com
Age: 34
Location: Seattle
Purchase history: [284 items tracked forever]
Browsing history: [Cross-site tracking across 47 domains]
The first approach gives you better recommendations with zero PII. The second invades privacy and still gets it wrong (remember that one impulse purchase still haunting your feed six months later?).
You've experienced the dysfunction yourself:
What users experience:
What they don't get:
What the industry says:
These claims persist because they align with commercial incentives, not technical reality.
It's not that Google, Amazon, Meta, or TikTok can't explain how their recommendation systems work. They absolutely can. The algorithms aren't magic—they're math, statistics, and machine learning that could be explained in plain English.
They choose not to because transparency would undermine the economic assumptions these systems are built on.
The core issue isn't technical complexity—it's that these systems optimise for different outcomes than users assume.
Collecting signals to make a product more useful is not the problem. The problem is when those same signals are repurposed for targeting and behavioural manipulation.
Engagement maximisation often conflicts with user value. Ad placement drives what you see. Behavioural nudging keeps you scrolling.
Transparency would make these conflicts obvious, so opacity becomes a feature, not a limitation.
There are massive commercial and PR reasons to avoid transparency:
1. Data Collection Scope
2. Optimisation Targets
3. Profile Permanence
When pressed, Big Tech hides behind: "The algorithm is too complex for normal users to understand."
This framing is misleading. Users already understand:
They could understand recommendations too—if companies chose to explain them.
Complexity isn't the barrier. Exposure is.
Building a zero-PII customer intelligence system starts with one fundamental principle: users should understand what's happening and why.
You don't need a whitepaper. You need a plain-English mental model that users can internalise in thirty seconds.
Here's what not to say:
"A cluster derived from embeddings in a high-dimensional vector space, based on similarity scores across vectors..."
Here's what works:
"We group products and interests into small, overlapping segments based on how people interact with them. You're probably in dozens of segments at once—and they change constantly based on what you actually do."
Three key concepts to communicate:
graph TB
User[Anonymous user] --> Action1[Views yoga mat]
Action1 --> Sig1[Signature updated]
Sig1 --> Action2[Views cookbook]
Action2 --> Sig2[Signature updated]
Sig2 --> Action3[Hides product]
Action3 --> Sig3[Signature updated]
Sig3 --> Sig4[Decays over time]
Sig4 --> Sig5[Fades without reinforcement]
Sig3 --> Segment1[Segment: wellness]
Sig4 --> Segment2[Segment: general]
Sig5 --> Segment3[Segment: cold start]
style Sig3 stroke:#1971c2,stroke-width:4px
style Segment1 stroke:#2f9e44,stroke-width:3px
style Segment2 stroke:#fab005,stroke-width:3px
style Segment3 stroke:#868e96,stroke-width:2px
This framing immediately differentiates your system from the creepy "you looked at this once, now we'll show it to you forever" behaviour users have come to expect. This is closer to how people actually behave than static "profiles" ever were.
Transparency means being specific about what actions influence segmentation—and for how long. A simple table does wonders here:
| Action | Signal Strength | Duration | Notes |
|---|---|---|---|
| Single click/view | Weak | Minutes–hours | Curiosity, not commitment |
| Multiple views over time | Medium | Days | Growing interest |
| Explicit "I'm interested" | Strong | Weeks | Clear signal |
| Save/bookmark | Strong | Weeks+ | Intentional signal |
| "Not relevant" / Hide | Suppression | Long | Respect the signal |
| No reinforcement | Decay | Varies | Interest fades naturally |
This table alone transforms the user experience from "mysterious algorithm" to "fair system I can influence."
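The table above maps almost directly into code. Here's one hedged way to express it: each action type carries an initial strength and a half-life that controls how fast it fades. The enum names and the specific numbers are illustrative placeholders, not tuned values:

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of the signal-strength table as a lookup.
// Action names, strengths, and half-lives are assumptions for this example.
public enum UserAction { View, RepeatView, ExplicitInterest, Save, Hide }

public record SignalPolicy(double InitialStrength, TimeSpan HalfLife);

public static class SignalPolicies
{
    public static readonly IReadOnlyDictionary<UserAction, SignalPolicy> Table =
        new Dictionary<UserAction, SignalPolicy>
        {
            [UserAction.View]             = new(0.2, TimeSpan.FromHours(6)),  // weak: minutes–hours
            [UserAction.RepeatView]       = new(0.5, TimeSpan.FromDays(3)),   // medium: days
            [UserAction.ExplicitInterest] = new(0.9, TimeSpan.FromDays(21)),  // strong: weeks
            [UserAction.Save]             = new(0.9, TimeSpan.FromDays(30)),  // strong: weeks+
            [UserAction.Hide]             = new(-1.0, TimeSpan.FromDays(60)), // suppression: long-lived
        };
}
```

Encoding the policy as data rather than scattered if-statements also makes it easy to publish: the table users see and the table the system runs on can be the same thing.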
This is the single biggest difference between segmentation and profiling.
Here's where zero-PII segmentation shines: interests fade unless reinforced.
"One late-night browse won't follow you for weeks. If you don't keep engaging with something, we assume you've moved on."
Traditional tracking systems build permanent profiles. Every action accumulates forever, creating an increasingly distorted picture of who you are.
A decay-based system is fundamentally different:
You don't need to explain the exponential decay function or half-life calculations. Users need reassurance, not mathematics.
Here's what radically differentiates this approach: customers can see and adjust their own interest signatures.
Imagine a simple interface that shows:
Your Current Interests (this session)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🌱 Sustainable Products ████████░░ 80%
🧘 Yoga & Wellness ███████░░░ 70%
📚 Minimalism █████░░░░░ 50%
🏃 Athletic Gear ████░░░░░░ 40%
🌿 Organic Foods ███░░░░░░░ 30%
[Remove] [Adjust] [Add Interest]
These fade over time unless you keep engaging.
Last updated: 2 minutes ago
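A dashboard like that mock-up is cheap to render because the underlying data is just concept-to-strength pairs. A minimal sketch (the helper names and 10-character bar width are my assumptions for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical renderer for the interest dashboard shown above.
public static class InterestDashboard
{
    // Render a strength in [0,1] as a fixed-width block bar, e.g. 0.8 -> ████████░░
    public static string RenderBar(double strength, int width = 10)
    {
        int filled = (int)Math.Round(strength * width);
        return new string('█', filled) + new string('░', width - filled);
    }

    // Render the full list, strongest interests first.
    public static string Render(IEnumerable<(string Name, double Strength)> interests) =>
        string.Join(Environment.NewLine,
            interests.OrderByDescending(i => i.Strength)
                     .Select(i => $"{i.Name,-22} {RenderBar(i.Strength)} {i.Strength:P0}"));
}
```

Because the "profile" is this small and this legible, showing it to the user costs almost nothing; there's no hidden model to translate.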
This level of transparency gives users:
Compare this to traditional systems where you have no idea what profile they've built about you, no way to inspect it, and no control to adjust it.
Even with minimal implementation, you can offer capabilities that almost no ecommerce systems provide:
Basic Controls:
Advanced Controls (for later):
The key insight: These aren't just features—they're trust signals. They communicate: "This system responds to you. You're not being subjected to it."
Traditional systems keep algorithms opaque to avoid exposing the scope of data collection, behavioural inference, and how that information is monetised (targeting, nudges, attribution).
We can be radically transparent because there's nothing invasive to hide:
When users inspect their interest signature, they see clean semantic concepts: "sustainable products • yoga • minimalism" rather than demographic inferences or behavioural predictions.
This transparency isn't just ethical. It's a competitive advantage because you can say what competitors can't:
"Here's exactly how our recommendations work. Inspect it. Control it. Trust it."
When you can explain your algorithm openly, you can build features that targeting-driven systems cannot:
1. Real-Time Interest Dashboard
2. Explicit Controls
3. Recommendation Explanations
4. Algorithmic Auditing
5. Data Portability
Notice what Big Tech can't build without admitting their practices:
They're locked out of building trust features because transparency would expose the surveillance.
Once segmentation is clearly explained, you can layer features that compound trust:
You're not building features in isolation. You're creating a coherent system where each piece reinforces the mental model users already have—and can verify.
While this article focuses on the conceptual model, let me preview the technical stack we'll use in the implementation parts:
We'll use Qdrant for semantic segmentation. Instead of manually defining categories, we'll let the system discover natural groupings based on how products relate semantically.
For example, "yoga mat" and "meditation cushion" cluster together not because we tagged them—but because their embeddings are naturally similar. We covered the basics in Building CPU-Friendly Semantic Search, and we'll extend those patterns here.
Key insight: Vector databases let you compare interest patterns, not user profiles. No identifiers stored.
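To show what "compare interest patterns, not user profiles" means mechanically: a session's interest signature becomes an embedding vector, and similarity is just cosine similarity between vectors, with no identifier in sight. In practice Qdrant performs this search server-side; this inline version is only a sketch of the underlying math:

```csharp
using System;
using System.Linq;

// Conceptual sketch: similarity between two interest vectors.
// No user IDs appear anywhere; the vectors are the whole comparison.
public static class InterestSimilarity
{
    public static double Cosine(double[] a, double[] b)
    {
        double dot  = a.Zip(b, (x, y) => x * y).Sum();
        double magA = Math.Sqrt(a.Sum(x => x * x));
        double magB = Math.Sqrt(b.Sum(x => x * x));
        return dot / (magA * magB);
    }
}
```

Two sessions with similar signatures score close to 1.0 and can receive similar recommendations, yet neither session carries anything that links it to a person.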
We'll use session cookies/tokens for short-term personalisation. When the session ends, the session state ends too.
Separately, we can maintain a persistent anonymous profile keyed by a stable anonymous key.
This profile is detached from identity by default, and only becomes identifiable if the user explicitly chooses to “unmask” it.
The problem is not that signals exist. If a store learns that you’re interested in running shoes and uses that to show you better running shoes, great.
The problem is when the same signals are repurposed for targeting, attribution, and behavioural manipulation — and when identity makes those signals portable across the web.
When you use third-party sign-in (for example “Sign in with Google”, “Sign in with Facebook”, etc.) you introduce a globally stable identifier.
That has a compounding effect:
In other words: your profile becomes more valuable precisely because the identifier works everywhere.
A Google account can span many high-signal products: Search, YouTube, Maps/Android, Chrome, and (for many people) Gmail.
Even if a given property has strict rules about what content is used for ad personalisation, the commercial value comes from the stitching:
That’s why “Sign in with Google” can be a step-change: it replaces a local, first-party identity with a portable identifier that is easy to correlate.
In this series, persistence is either:
And it only becomes identifiable if the user explicitly chooses to unmask it.
This keeps the useful part (better recommendations) while avoiding the commercial part (portable profiles optimised for targeting).
sequenceDiagram
participant Browser
participant Server
participant Session as Session Store
participant Profile as Anonymous Profile Store
participant Qdrant as Vector DB
Browser->>Server: View product (yoga mat)
Server->>Session: Get/Create session token
Session-->>Server: Anonymous session ID
Server->>Qdrant: Find similar products
Qdrant-->>Server: Related items
Server->>Session: Update session signature
Note over Session: Session signature updated (yoga, wellness)
Server->>Profile: Merge into anonymous profile (optional)
Note over Profile: Anonymous profile merged (fingerprint/first-party token)
Server-->>Browser: Recommendations
Note over Browser,Session: 30 minutes later...
Browser->>Server: View another product
Server->>Session: Get session
Session-->>Server: Apply decay to signals
Note over Session: Strength reduced by time elapsed
Note over Browser,Session: Session expires (24 hours)
Session->>Session: Delete session state
Note over Session: Session state is ephemeral.
Note over Profile: Anonymous profile may persist (still detached unless unmasked)
The session stores:
Separately, an anonymous profile (if enabled) stores:
No PII by default. Persistence is anonymous and revocable, unless explicitly unmasked.
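One way to make the two-store split concrete is a pair of record shapes, one per store. These field names are assumptions for illustration, not the series' final schema; the important property is what's absent (no name, no email, no account ID):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes for the two stores described above.
// Neither record contains PII; both keys are opaque and revocable.
public record SessionState(
    string SessionToken,                        // random, dies with the session
    Dictionary<string, double> InterestWeights, // short-term decaying signals
    DateTime LastActivity);

public record AnonymousProfile(
    string AnonymousKey,                        // fingerprint or first-party token
    Dictionary<string, double> InterestWeights, // longer-lived, still decaying
    bool Unmasked = false);                     // identity attached only on explicit opt-in
</```>
```

Keeping `Unmasked` as an explicit flag (default `false`) makes the privacy posture auditable: identity linkage is an opt-in state transition, not an ambient property of the data.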
We've covered using DuckDB with local LLMs before. Here we'll extend that pattern for privacy-preserving analytics.
Instead of tracking individuals:
-- What we DON'T do
SELECT user_id, product_views FROM analytics WHERE user_id = 'john@example.com'
-- What we DO do
SELECT segment_id, COUNT(*) AS users, AVG(engagement_score) AS avg_engagement
FROM interactions
GROUP BY segment_id
You see "people interested in sustainable products also engage with wellness content"—not "John Smith looked at these specific items."
This builds on concepts from DataSummarizer, where we generate insights from aggregate statistics, never raw individual data.
We'll implement exponential decay so interests naturally fade without reinforcement. This is illustrative, not prescriptive:
public class DecayingSignal
{
    // Strength at the moment of the last reinforcement (0..1).
    public double Strength { get; set; }

    // When the signal was last reinforced.
    public DateTime LastUpdate { get; set; }

    // Time for a signal to fall to half strength, in seconds; default one week.
    public double HalfLife { get; set; } = TimeSpan.FromDays(7).TotalSeconds;

    // Exponential decay: strength * 2^(-elapsed / halfLife).
    public double GetCurrentStrength()
    {
        var elapsed = (DateTime.UtcNow - LastUpdate).TotalSeconds;
        return Strength * Math.Exp(-elapsed * Math.Log(2) / HalfLife);
    }
}
A single view at 2 AM doesn't define you forever. After a week with no reinforcement, that signal has halved. After two weeks, it's down to 25%.
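You can verify those numbers directly from the decay formula. A small self-contained worked example (inlining the same math as the class above, with a 7-day half-life):

```csharp
using System;

// Worked example of exponential decay with a 7-day half-life:
// a signal reinforced 7 days ago is at ~50% strength, 14 days ago at ~25%.
double halfLife = TimeSpan.FromDays(7).TotalSeconds;

double StrengthAfter(TimeSpan elapsed) =>
    1.0 * Math.Exp(-elapsed.TotalSeconds * Math.Log(2) / halfLife);

Console.WriteLine(StrengthAfter(TimeSpan.FromDays(7)));  // ≈ 0.5
Console.WriteLine(StrengthAfter(TimeSpan.FromDays(14))); // ≈ 0.25
```

The half-life parameter is the single knob that controls how "forgetful" the system is, which is exactly the kind of setting you can expose to users later.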
This is where the series gets concrete: in Part 2 we’ll build the session-scoped “interest signature” and the logic that maps it onto a small set of dynamic segments.
The important conceptual point for Part 1 is simply this:
The beauty of this approach is that you get sophisticated personalisation without ever knowing who anyone is. The specific tools matter less than the model.
Your documentation should be clear and concise:
"Our recommendations aren't a black box. They're built from lightweight segments that respond to what you do—and just as importantly, forget what you don't reinforce."
This sets the right expectations:
What you're really building are process-first systems.
These must be explained in terms of:
Not static snapshots. Not fixed categories. Not permanent profiles.
Documentation isn't an afterthought here—it's part of the product. The mental model you give users is as important as the algorithm itself.
This series will implement a complete proof-of-concept ecommerce system demonstrating these principles:
Each part will include working code in C#/.NET, deployable Docker configurations, and real performance metrics.
Building transparent, zero-PII intelligence isn't just ethically better. It's technically better.
Traditional profiling systems accumulate noise over time:
Decay-based, session-scoped systems are more accurate because they focus on current intent, not historical accumulation.
graph LR
subgraph Traditional["Traditional: Profile Accumulation"]
T1[View product] --> T2[Add to profile]
T2 --> T3[Stored forever]
T3 --> T4[Never decays]
T4 --> T5[Stale recommendations]
end
subgraph ZeroPII["Zero-PII: Signal Decay"]
Z1[View product] --> Z2[Signal: strength 1.0]
Z2 --> Z3[7 days: strength 0.5]
Z3 --> Z4[14 days: strength 0.25]
Z4 --> Z5[Fades without reinforcement]
Z2 --> Z6[Reinforced signal]
Z6 --> Z7[Current recommendations]
end
style T3 stroke:#c92a2a,stroke-width:4px
style T5 stroke:#c92a2a,stroke-width:3px
style Z2 stroke:#1971c2,stroke-width:4px
style Z6 stroke:#1971c2,stroke-width:4px
style Z7 stroke:#1971c2,stroke-width:4px
Users who understand and control the system engage more:
Opacity breeds distrust. Transparency builds loyalty.
You're not selling user data or running invasive ad networks:
Plus: you can open-source the algorithm without giving away competitive advantage, because the value is in the experience, not the surveillance.
We're going to prove you can build semantic and statistical customer intelligence that is:
If we succeed, we'll have demolished the false dichotomy that says "good personalisation requires invasive tracking."
The belief that sophisticated personalisation requires data harvesting and algorithmic opacity persists because it aligns with existing business models—not because it's technically necessary.
Over this series, we'll build a working ecommerce system that demonstrates:
When users understand the system, they work with it. When they can work with it, they trust it. And when they trust it, they engage with it.
When personalisation is built from process instead of identity, privacy stops being a constraint—it becomes a property of the system.
Next: [Part 2 - Core Implementation] where we build the session-based tracking, vector embeddings, and decay functions with working C# code.
This series combines concepts from Semantic Search with ONNX and Qdrant and DataSummarizer into a complete zero-PII ecommerce intelligence system.
© 2025 Scott Galloway — Unlicense — All content and source code on this site is free to use, copy, modify, and sell.