In April 2019, Crosshatch cofounder Soren Larson decided to join Walmart’s Store No. 8, a startup division created by Jet.com’s Marc Lore. The team’s mission was to help busy families save time by putting groceries and household goods on autopilot. Our team was well positioned to do this:
- Team: I joined a team that included a Harvard astrophysicist formerly at Renaissance Technologies and an experimental physicist from Cambridge
- Data: Walmart has more first-party data on the needs of American families than anyone else
- Infra: Walmart’s data is well organized, with purchase and clickstream data readily accessible in data warehouses
Walmart instrumented every event on web and mobile, so we could see everything a user did on Walmart’s digital properties. This data landed in data warehouses, including Google BigQuery.
It was hard to imagine that we wouldn’t succeed.
Yet our journey at Walmart revealed a fundamental truth that would eventually lead us to found Crosshatch. After experiencing first-hand the limitations of even the most robust first-party data infrastructure at one of the world's largest retailers, we recognized that the entire industry was building on a flawed foundation. What began as a frustrating experiment in predictive grocery auto-replenishment became the motivation for reimagining how personalization could actually work in practice.
The gap between the promise of data-driven personalization and its reality wasn't just Walmart's challenge. It was an industry-wide blind spot.
What is first-party data?
First-party data is any information that a business collects as an end user engages with its product. This information includes:
- Products a user clicks on
- Forms a user completes
- Products a user adds to cart
- Products a user purchases
- Searches a user enters
Walmart consolidates all of this information within its data warehouses.
Whether this information is
- collected in a CDP
- or a data warehouse
- or a memory layer for AI
all of this information constitutes first-party data.
But as we learned in our product development, first-party data is insufficient for personalization that saves users time.
Why first-party data isn’t enough for personalization
At Walmart, we set out to use the data we collected about customer behavior to ship the ultimate personalized service: one that puts groceries on autopilot. This is what personalization has long promised: a service that knows what we like, what we need, and how we behave, and can automate chores in our lives.
Using first-party data collected on Walmart.com, we planned to estimate
- How often customers needed groceries
- What groceries they would need
In our product development, we onboarded 1,000 customers to have their groceries and household goods delivered on autopilot. Using our vast first-party data, we:
- estimated how often each customer would need a delivery
- predicted which groceries they would need
Whenever a new order was due, we placed it on the customer’s behalf and gave them 2 days to modify the cart.
We found that we could predict only about half of each basket with 70% precision: for that half, roughly 7 in 10 predicted items were correct. For the other half of the basket, accuracy was even worse.
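To make the precision figure concrete, here is a minimal Python sketch, assuming precision is measured as the share of auto-ordered items a customer kept after the edit window. The function name `basket_precision` and the basket contents are hypothetical, for illustration only; this is not the evaluation pipeline we ran at Walmart.

```python
def basket_precision(predicted: set[str], kept: set[str]) -> float:
    """Share of auto-ordered (predicted) items that survive in the final cart.

    Hypothetical metric for illustration: an item counts as correct if the
    customer did not remove or swap it during the edit window.
    """
    if not predicted:
        return 0.0
    return len(predicted & kept) / len(predicted)


# Hypothetical order of 10 items placed on the customer's behalf.
predicted = {"milk", "eggs", "bread", "bananas", "coffee",
             "yogurt", "apples", "cereal", "paper towels", "dish soap"}

# Hypothetical final cart after the 2-day edit window:
# 3 predicted items removed, 2 items the model missed added.
final_cart = {"milk", "eggs", "bread", "bananas", "coffee",
              "yogurt", "cereal", "spinach", "chicken"}

p = basket_precision(predicted, final_cart)
print(f"precision = {p:.1f}")  # 7 of 10 predicted items were kept
```

At 0.7, the customer still has to fix 3 of every 10 items by hand, which is the gap the next paragraph describes.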
Walmart aims to save busy families time, and this level of performance wasn’t good enough: shoppers don’t want AI buying groceries on their behalf when 3 or 4 of every 10 products ordered are wrong. That doesn’t save anyone time.
The problem we faced wasn’t
- Data availability: we could see all online customer behaviors
- Talent: our team was led by an ex-quant from a fund that delivered top hedge fund returns
- Infra: we were able to train models on all customer data on leading cloud infrastructure
The problem was the data itself. First-party data isn’t enough to anticipate customer needs.
First-party data fails in the AI era
When you zoom out to look at what data is actually in your data warehouse, you’ll quickly see why first-party data fails in the AI era.
CDPs collect data on what users:
- click on
- search for
- scroll past
- buy
Most of these behaviors carry little signal. Does what someone clicks on actually mean they like it? Or what they search for? Or scroll past? How does this first-party data support real models of user preferences? These behaviors might be valuable on platforms at TikTok’s scale, but most sites have neither that kind of traffic nor products whose engagement implies rich preferences.
Purchases are rich in signal, but they are backward-looking and limited to what a user does in your app alone. Customers use many apps! So the patterns you observe over time, which seem to reveal their preferences, can mislead you about what users actually need.
Do they actually buy milk every other week? Or do they switch between your app and another? Or were they just traveling last month?
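To see how a single app’s view can mislead, here is a minimal Python sketch, assuming a naive cadence estimate: the median gap between observed purchases of one item. The function name `purchase_intervals` and the purchase dates are hypothetical. In this made-up example the customer buys milk every 14 days, but one late-January purchase happened in another app and a March trip left a long gap, so your log shows longer intervals than the customer’s real need.

```python
from datetime import date
from statistics import median


def purchase_intervals(dates: list[date]) -> list[int]:
    """Days between consecutive purchases of one item, as seen in a single app."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical milk purchases recorded in your app. The true cadence is
# 14 days, but the Jan 29 purchase was made in another app, and the
# Mar 11 and Mar 25 purchases never happened because of a trip.
observed = [date(2024, 1, 1), date(2024, 1, 15), date(2024, 2, 12),
            date(2024, 2, 26), date(2024, 4, 8)]

gaps = purchase_intervals(observed)
print(gaps)          # [14, 28, 14, 42]
print(median(gaps))  # 21.0, not the true 14
```

The estimate lands at 21 days instead of 14, and nothing in the log alone tells you which gaps reflect real need and which reflect purchases you never saw.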
If a user says they like pizza, do they always like pizza, or only in certain contexts?
When you start looking at the data actually in your data warehouse, the things people:
- click on
- search for
- buy, infrequently (conversion events are rare!)
- say they like
you might start to wonder how that data could power activations that create value for you or your customers.
This is especially clear in the new age of AI, where you can readily ask models like Gemini or GPT-4 what the data in your warehouse implies for the next best interaction with your customer.
AI puts the limitations of first-party data in clear view. You can just ask ChatGPT to interpret your first-party data, but what’s ChatGPT going to do knowing what a user recently clicked on, searched for, or bought once last summer? Not much.
Why AI demands a new kind of context
While first-party data is useful for knowing where a customer is in their journey during a session, it’s fundamentally insufficient for shipping personalized experiences that move the needle for customers.
No data activation platform will solve this. Every data activation platform markets 1:1 customer experiences, but fundamentally this is a lie. They can’t deliver personalization that saves users time, because first-party data just isn’t enough context for AI to provide value to customers.
To make matters worse, first-party data infrastructure is expensive to stand up, administer, and maintain. VCs will say that first-party data is what drives competitive advantage, but when you actually look at the data you’ve collected, you might start to wonder how this narrative went on for so long. How exactly is knowing what a user clicked on a few months ago driving competitive advantage?
Rather than standing up another data activation platform, teams need a personalization platform that runs on a customer’s complete context. This gives product managers and marketers a truly complete view of the customer’s journey across applications, and enables time-saving personalization that adapts to customer needs and drives loyalty and spend.
This is a controversial view, but we believe it’s time to move on from incremental improvements in personalization. We’ve been told first-party data is the pathway to better personalization, but it’s been well over a decade since these narratives began, and we’re no closer to the futuristic personalization we were long promised.
Instead of relying solely on first-party data, companies need to leverage enriched customer context from multiple sources to deliver truly effective personalization. At Crosshatch, we've built a new kind of personalization platform that combines first-party data with the context users bring with them to create a complete view of your customers. By contextualizing behaviors across applications and channels, we enable businesses to anticipate customer needs with much higher accuracy than traditional CDPs or first-party data warehouses alone. Our early customers are delivering adaptive experiences that act on changes in a user's context, something that was impossible with first-party data alone.
It's time to move beyond the limitations of first-party data and embrace a new paradigm for personalization. At Crosshatch, we're helping forward-thinking brands deliver experiences that truly adapt to customer needs and save them time. Want to see how our approach compares to your current personalization strategy? Schedule a demo to discover what's possible when you stop relying solely on first-party data.