
Personalizationomenon

Personalization today is expensive and bad. It's not a skill issue: no one has the context. To get personalization from the movies, we need a radical shift.

8/2/2024

In 1994, sites couldn’t remember us across sessions. As The New York Times recounted in 2001:

"At that moment in Web history, every visit to a site was like the first, with no automatic way to record that a visitor had dropped by before. Any commercial transaction would have to be handled from start to finish in one visit, and visitors would have to work their way through the same clicks again and again; it was like visiting a store where the shopkeeper had amnesia."

Lou Montulli at Netscape introduced the cookie, and we ended up tracked everywhere we went by organizations we’ve never met.

The Golden Rule of Cookies didn’t last long.

But now AI offers an incredible capability to personalize anything – just add some context and a prompt.

But AI-powered applications don't remember us from one application to the next (even those using the same underlying AI services!). Despite all its capability, AI turns out to have the same type of amnesia the internet had in 1994.

Every time we encounter a new application, we must re-introduce ourselves. Just as we did pre-cookie, we're expected to act as a “human clipboard”: we spend our own labor going through the same onboarding flows and sharing all the requested information (under Terms of Service that no one reads, or should have to). All of this so that a new digital service can wield estimates of who we probably are, based on whatever limited fragments it could get its hands on – consented or not.

The largest and best situated players tell us this is For Our Own Good – for The Sake of Our Privacy – while they in turn launch and grow massive ads businesses for their own benefit.

In this blog we’ll contrast how personalization systems have historically been built, and how, perhaps paradoxically, AI can unlock 10x better personalization that’s cheaper to implement and actually more private.

I wish I knew how to quit you

We all know identity on the internet doesn't really work.

And not to get pedantic (read: please pardon this pedantic pageantry) but it seems like defining what personalization actually is could help us speed run to a solution.

So many – particularly in web3 and among EU policymakers – get lost here.

For us, personalization is about identity. It’s about aligning the world to us, and perhaps finding alignment in ourselves with the world.

It feels like personalization can actually go pretty deep, e.g., following Oxford philosophy professor Timothy Williamson:

“Knowledge and action are the central relations between mind and world. In action, world is adapted to mind. In knowledge, mind is adapted to world. When world is maladapted to mind, there is a residue of desire. When mind is maladapted to world, there is a residue of belief. Desire aspires to action; belief aspires to knowledge. The point of desire is action; the point of belief is knowledge.”

Personalization appears to be a flexing of

  • world to mind (desire to act to project ourselves onto the world)
  • and mind to world (belief to knowledge to align what we believe to what is true about the world)

[Given recent FTC action on “surveillance pricing”, we say for posterity that this flexing need not imply personalized prices!]

Personalization as a dual relation of mind to world is obviously about identity. We suppose it comprises three components:

  1. Identity
  2. Context
  3. Control

The internet really still has none of these.

I am who I think you think I am

Personalization starts with identity.

Applications today cobble together identity from third party cookies (t̶h̶a̶t̶ a̶r̶e̶ g̶o̶i̶n̶g̶ a̶w̶a̶y̶ are probably still going away) and increasingly logins.

Stopgap “id bridging” techniques soon won’t work on 4 in 5 browser sessions with Safari “Private Relay” and forthcoming Chrome IP Protection (that’s been in the works for a while!).

Apple’s iCloud Private Relay hides the IP address of browsing activity on Safari, limiting the effectiveness of ID-bridging strategies.

Customer Data Platforms ("CDPs") are worthless without consistent identity.

B2C companies responded this spring by adding login and loyalty CTAs, pushing toward a new Logged In Web.

B2C companies accrue little advantage from usage by users who don’t convert and whose identity they cannot track. AI-motivated scraping has exacerbated this issue: anonymous users now pull valuable data from brands, imposing heavy network loads and extreme bandwidth charges while providing no commensurate value in return.

My reality is just different from yours

Next for personalization, you need context.

First party context isn't that much, so many have augmented it with context provided by “Data Management Platforms” (“DMPs”).

It’s not immediately clear what Data Management Platforms are. Wouldn’t they just be Customer Data Platforms?

Actually – no.

It turns out that, while they represent similar data, DMPs contain third party data (like that provided by an Epsilon or an Acxiom) while CDPs contain first party data. Enterprise CDPs often sell subscriptions to DMPs, which enrich the CDP's first party records with third party data.

Where do DMPs even get their data?

Some brag they have data on over 200 million Americans – including transactions, clickstream, and even health data. They defend the legitimacy of their data collection by claiming their records are anonymized or privacy-preserving, ostensibly because they’ve removed PII (even as brands use related identifiers to match these records to their own first party data).

Either way, the FTC isn’t buying it.

Rightfully so – removing PII leaves plenty of avenues for reidentification that we’ve collectively known about for at least a decade.
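To make the reidentification point concrete, here is a minimal sketch of a linkage attack on "de-identified" records. Every record, field name, and value below is invented for illustration, and the choice of ZIP code, birth year, and gender as quasi-identifiers is our assumption, not a description of any particular DMP's data.

```python
# Hypothetical data: all records and field names here are made up for illustration.
deidentified_purchases = [
    {"zip": "10010", "birth_year": 1985, "gender": "F", "purchase": "prenatal vitamins"},
    {"zip": "10010", "birth_year": 1990, "gender": "M", "purchase": "running shoes"},
    {"zip": "94110", "birth_year": 1985, "gender": "F", "purchase": "espresso machine"},
]

first_party_customers = [
    {"name": "Customer A", "zip": "10010", "birth_year": 1985, "gender": "F"},
    {"name": "Customer B", "zip": "94110", "birth_year": 1972, "gender": "M"},
]

# Fields that are not PII on their own but are often unique in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def quasi_key(record):
    """Build a join key from the quasi-identifier fields of a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(deidentified, known_people):
    """Link named people to 'anonymous' rows that share a unique quasi-identifier key."""
    index = {}
    for row in deidentified:
        index.setdefault(quasi_key(row), []).append(row)

    matches = {}
    for person in known_people:
        candidates = index.get(quasi_key(person), [])
        if len(candidates) == 1:  # a unique combination gives a confident link
            matches[person["name"]] = candidates[0]["purchase"]
    return matches


matches = reidentify(deidentified_purchases, first_party_customers)
print(matches)
```

No name or email appears in the "anonymized" table, yet one customer is linked to a sensitive purchase simply because her combination of attributes is unique.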

Some in the industry believe DMPs will be regulated away, even though activations of this data constitute core components of adtech e.g., Trade Desk margin, per Ari Paparo.

Yeah, ooh, control

Finally, personalization needs some control layer, one that mirrors how we engage in real life. For instance, we're different with our friends than we are at work, with kids, or out for a run.

And no one needs to say – we both

  • don’t have any control on the internet
  • and yet we also have too much (Thanks EU!)

which practically leaves us with none at all.

There’s all sorts of deeper writing on why we should care about control on principle. The most straightforward is J̶a̶n̶e̶t̶ J̶a̶c̶k̶s̶o̶n̶ John Stuart Mill:

“Society can and does execute its own mandates: and if it issues wrong mandates instead of right, or any mandates at all in things with which it ought not to meddle, it practices a social tyranny more formidable than many kinds of political oppression, since, though not usually upheld by such extreme penalties, it leaves fewer means of escape, penetrating much more deeply into the details of life, and enslaving the soul itself. Protection, therefore, against the tyranny of the magistrate is not enough: there needs protection also against the tyranny of the prevailing opinion and feeling; against the tendency of society to impose, by other means than civil penalties, its own ideas and practices as rules of conduct on those who dissent from them; to fetter the development, and, if possible, prevent the formation, of any individuality not in harmony with its ways, and compel all characters to fashion themselves upon the model of its own.”

We're most compelled by control as exercise of social identity: the way we change ourselves depending on the context. What's for sure is that on the internet we definitely can't do that today.

In industry we've all (at least implicitly) recognized these challenges, and built bloated bandaid infrastructure workarounds.

Building personalization systems is expensive, and they’re not even that good.

Not second best but very, very expensive

Building personalization and consumer data activation pipelines is an expensive proposition.

It requires

  • Instrumenting and assembling consistent events data into one place
  • Integrating this data into complex personalization systems powered by expensive data warehouses and headcount
  • Training, deploying, and maintaining models

Workflow for AWS Personalize

For the business, these workstreams are entirely speculative. They are a bet, made with magical thinking, that IF

  • a business just collects enough data
  • and it's able to build and deploy algorithms
  • placed in just the right places

THEN it will eventually see improvements to important KPIs like LTV, AOV, NPS, retention, or engagement. These exercises thrived in ZIRP, but businesses are taking a hard look at them today.

And to top it off, these exercises require coordination from

  • Product
  • Marketing
  • Eng
  • Data Science

who, in our experience, are stakeholders who don’t always enjoy working together.

Product and Engineering prefer longer time horizon projects: those that create new products. Marketing and Data Science are often interested in optimizations over existing products, or even step on toes by shipping products themselves (e.g., Zillow Zestimate or Retail Media businesses). Ownership of these efforts can feel confusing: many stakeholders could appear as rightful owners, while the whole involves a markedly diverse set of skills that few have (or have the opportunity to develop).

Aside from being expensive and a slog to build, our core beef with this style of personalization is that it’s not even that good. CMOs are increasingly saying first party data is “meh.”

The story of real-time personalization based solely on first-party data – “if you just know your customer” – has wholly failed to materialize.

We’ve heard it since at least 2018, and we’ve yet to see it.

We even attempted it ourselves at Walmart. Walmart likely has one of the best first party data assets in the world. There's a store within 10 miles of 90% of all Americans and it's a consumer staples business selling groceries – people have a reason to shop there often. And contrary to what many assume, Walmart data is pristine – we’ve seen it ourselves. Perfect records of every order flow into a warehouse with a well-organized catalog ontology.

We attempted this in the simplest way possible – starting with grocery, where we knew people already shop pretty regularly. With this incredible data asset and a former quant from the dominant Renaissance Technologies leading the initiative, what could go wrong? 

What we found is that people are just idiosyncratic, even within a staples business like grocery. Their schedules and needs are changing. We couldn't predict even half of the basket with acceptable precision or recall.
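For readers less familiar with the metrics, here is a minimal sketch of what precision and recall mean for basket prediction. The baskets below are invented for illustration and are not Walmart data; the function name is ours.

```python
def basket_precision_recall(predicted, actual):
    """Precision: share of predicted items that were bought.
    Recall: share of bought items that were predicted."""
    predicted, actual = set(predicted), set(actual)
    hits = predicted & actual
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: what a model guessed vs. what the shopper actually bought.
predicted_basket = ["milk", "eggs", "bananas", "bread", "coffee"]
actual_basket = ["milk", "eggs", "diapers", "salsa", "chicken", "rice"]

precision, recall = basket_precision_recall(predicted_basket, actual_basket)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up week the model gets the staples right (milk, eggs) and misses everything driven by a changing schedule, so three of its five guesses are wasted and four of the six purchased items go unpredicted.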

Walmart's standard is saving busy families time. Personalization that delighted or saved time wasn't possible even with Walmart's incredible setup. Once you expand scope to consumer discretionary categories like travel, retail, or real estate, personalization seems impossibly hard without a fundamental change in dynamics.

We exist in the context  

Today’s failure of personalization is not an infra or skill issue. There’s great infra for personalization today.

It’s one of access to context.

We spin our wheels (and compute!) to just try to figure out who someone is based on limited information when we could just … ask. People are willing to share data for value in return.

And with AI’s incredible capabilities becoming more powerful, accessible, and affordable – the limiting constraint of context becomes even more pronounced.

We need a personalizationomenon. Headless personalization. Instead of expensive and speculative personalization efforts, what if we could mash together different components

  • First party context or application state
  • Context users bring with them
  • Instructions from the user and the developer for what to do with it

to get an AI produced interface, recommendation, or re-ranking?
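As a rough sketch of that mashup, the three components can be assembled into a single prompt. The function name, section titles, and field names below are hypothetical and chosen only for illustration; the result would be handed to whatever language model you use.

```python
import json


def build_personalization_prompt(app_state, user_context,
                                 user_instructions, developer_instructions):
    """Combine application state, user-brought context, and both parties'
    instructions into one prompt string (hypothetical layout)."""
    sections = [
        ("Developer instructions", developer_instructions),
        ("User instructions", user_instructions),
        ("Application state", json.dumps(app_state, indent=2, sort_keys=True)),
        ("User-provided context", json.dumps(user_context, indent=2, sort_keys=True)),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


# Illustrative inputs only; none of these values come from a real system.
prompt = build_personalization_prompt(
    app_state={"candidate_hotels": ["Edition Hotel", "Hotel B", "Hotel C"]},
    user_context={
        "likes": ["photo in Madison Square Park"],
        "purchases": ["Sant Ambroeus"],
    },
    user_instructions="Only use my context for hotel recommendations.",
    developer_instructions="Re-rank the candidate hotels for this user.",
)
print(prompt)
```

Nothing here requires a warehouse, an events pipeline, or a trained model: the "personalization system" is a prompt assembled at request time.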

What’s interesting about this framework (one that intermixes privacy and collaboration) is that this mashup of contexts

  • User-owned
  • Business-observed

can happen in an environment of a trusted third party facilitator. It’s like a consumer- and AI-powered clean room.

We think this personalizationomenon is far superior to clean rooms. It

  • Has more data
  • Enables cheaper data activation and integration
  • Unlocks more expressive applications than just analytics or cohort-based targeting
  • Leaves consumers in control

Traditional clean rooms, on the other hand, require complex partnership-led integrations that are really only available to the largest players. If you don’t have a lot of traffic, no one is going to want to create a clean room with you.

And for all the discussions of privacy, clean rooms don’t really have consumer stakeholders. They do in theory – if something bad were to happen and people were to find out. But consumers aren’t usually consulted as a part of clean rooms or data consortia. Nor would they really want to be.

The kind of notice consumers get in the EU for being part of a clean room data consortium, even among premium publishers. See it for yourself at BBC’s Good Food.

Clean rooms are like the new third party cookie, but worse. They're just weird. Users can't delete them. Privacy is principal in their messaging, but clean room companies sell to businesses. Consumers are not stakeholders. Businesses only want privacy to the extent they can say they Value Our Privacy and use a Clean Room and Clean Rooms are Privacy Preserving.

Data collaboration with consumers, on the other hand, just makes sense.

Let consumers bring their data with them and activate it on any surface, for any purpose, with the help of a trusted AI. 'Rationalist privacy' (share a bit of context for commensurate value in return) is the governing paradigm of privacy for most Americans.

This is what we’re building at Crosshatch. We make it straightforward for businesses to add a Link to their app and invite consumers to log in with their context for usage within a particular use case. We then stream this live consumer data to a consumer-managed wallet.

Once consumers 'ok' Crosshatch in a tap, businesses can then request short-lived authentication tokens that allow them to call Crosshatch-secured language models. Using these tokens, in a single API call they can get personalized output from favorite language models by adding context to language model prompts programmatically and in environments whose rules are known at the outset. This allows businesses to generate personalized outputs like interfaces, rankings, or recommendations, all without building any infrastructure or directly accessing or storing the raw user data themselves.
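To illustrate the shape of this flow, here is a self-contained, in-memory simulation. It is not the Crosshatch API: the class, method names, token lifetime, and the stub model are all hypothetical, invented to show the idea that the business holds only a short-lived token and sees only model output, never the raw context.

```python
import secrets
import time


class ContextBroker:
    """Hypothetical stand-in for a trusted third party that holds user context."""

    def __init__(self, token_ttl_seconds=60):
        self._ttl = token_ttl_seconds
        self._consents = set()   # (user_id, app_id) pairs the user has approved
        self._contexts = {}      # user_id -> list of context strings
        self._tokens = {}        # token -> (user_id, expires_at)

    def store_context(self, user_id, items):
        self._contexts[user_id] = list(items)

    def grant_consent(self, user_id, app_id):
        self._consents.add((user_id, app_id))

    def issue_token(self, user_id, app_id, now=None):
        """Issue a short-lived token, but only if the user approved this app."""
        if (user_id, app_id) not in self._consents:
            raise PermissionError("user has not approved this app")
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._tokens[token] = (user_id, now + self._ttl)
        return token

    def personalize(self, token, prompt, model, now=None):
        """Inject the user's context into the prompt and return only model output."""
        now = time.time() if now is None else now
        entry = self._tokens.get(token)
        if entry is None or now >= entry[1]:
            raise PermissionError("token is missing or expired")
        user_id, _expires_at = entry
        context = "\n".join(self._contexts.get(user_id, []))
        return model(f"{prompt}\n\nContext:\n{context}")


def echo_model(full_prompt):
    """Stub used in place of a real language model call."""
    return f"[model saw {len(full_prompt.splitlines())} lines]"


broker = ContextBroker(token_ttl_seconds=60)
broker.store_context("user-1", ["liked a photo in Madison Square Park",
                                "purchased at Sant Ambroeus"])
broker.grant_consent("user-1", "hotel-app")

token = broker.issue_token("user-1", "hotel-app", now=1000.0)
result = broker.personalize(token, "Recommend a hotel.", echo_model, now=1030.0)
print(result)
```

The `now` parameter exists only so the example is deterministic; a call made with the same token after the 60 second lifetime raises `PermissionError`, which is the property that keeps access scoped in time.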

And it uses AI for exactly what it's good at, following Slow's Sam Lessin:

Large Language Models are good at Expansion, Compression and Translation.

These are all of the ingredients of personalization: expanding, compressing and translating context from one instance

you liked a photo in Madison Square Park and you purchased at Sant Ambroeus

to another

you might like the Edition Hotel.

Compared to how personalization is currently done, this feels like fresh air.

Personalizationomenon is a shift in how we approach personalization.  <claude> Instead of businesses struggling to cobble together limited data and build expensive, siloed systems, this new paradigm enables a seamless flow of permissioned data, activated by AI at the moment it's needed.

For consumers, this means personalized experiences that remember preferences across platforms, without compromising on privacy or control. For businesses, it means more effective personalization at a fraction of the cost and complexity. And for the internet as a whole, it means a more open, user-centric ecosystem, where innovation can flourish without being stifled by walled gardens. </claude>

We were promised flying cars and instead we got 140 characters. We want a hyper-personalized internet: one that rolls out the red carpet for us. We were promised fanciful new technology.

What we really need is a personalizationomenon.

See what Crosshatch can do for your business.
