
Call my agent

Before AI agents, there were user agents. To get personalized services, have their agent call your agent.

9/15/2024

To deliver personalized services, services will call your user agent.

We've all heard this narrative. Why would they need to do this?

So far in internet history, apps have collected context about us and used that context to serve us.

They identified us with cookies or other identifiers and associated all the context they observed with that ID in what eventually became a customer data platform (CDP).

And indeed, CDPs like Twilio’s Segment make activating that data easy, and startups like mem0 and zep enable companies to ship their own memory layers for their AI applications.

So where does “your agent” come in? And why?

If you're using an AI service with a memory layer, do you need an agent? Stepping back, how do you even get an agent? And once you have one, will it always have your best interests at heart?

May I speak to Susie Myerson, please?

AI agents are evocative, but it’s hard to know what they really refer to.

We’ve seen many candidate definitions across the industry. LangChain’s Harrison Chase says

An agent is a system that uses an LLM to decide the control flow of an application.

while Andrew Ng offers

There’s a gray zone between what clearly is not an agent (prompting a model once) and what clearly is (say, an autonomous agent that, given high-level instructions, plans, uses tools, and carries out multiple, iterative steps of processing).

But per OpenAI’s Noam Brown, the company’s new o1 is a model, not a system.

You can prompt the model just once (in contrast to Ng) and the model will "think" before providing an answer. OpenAI explains (and Letitia nicely explains)

Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working. This process dramatically improves the model’s ability to reason.

This reminds us of last year’s BabyAGI, but strictly speaking, Brown’s framing stands in apparent contrast to Chase’s and Ng’s definitions, which seem to require that an “agent” be a language model in a loop.

So which is it?

Maybe … it doesn’t matter.

AI models are on a path to incredible capabilities. Instead of debating the technical details of what makes an agent, perhaps we should spend more time on how AI works for us.

A real user agent

In 1998, Tim Berners-Lee defined the user agent in human terms (in reference to the more formal 1996 definition)

Browsers and Email programs are user agents. They are programs which act on behalf of, and represent, the user.

Fifteen years later, he submitted a refinement

When does software really act as an agent of the user? One way to think about this is that the program does, in each instance, exactly what the user would want it to do if asked specifically.

This contrasts with the common form posited by AI practitioners, e.g., Bill Gates

This type of software—something that responds to natural language and can accomplish many different tasks based on its knowledge of the user—is called an agent.

Berners-Lee’s definition appears to align more closely with the psychologists’ notion of agency, which they use to

describe humans’ capacity to exercise control over their lives

Casting agents as those that do exactly what humans want seems like a more fertile path than describing AI technical capability: a system could readily do all Gates describes yet violate the definition Berners-Lee prefers.

What an agent is has less to do with how a technology works, and everything to do with how it aligns itself to the wishes of those on whose behalf it works.

Agent defined

Let’s follow Berners-Lee’s definition:

a user agent is a program that acts on behalf of, and represents, the user.

This program could be anything:

  • a python script
  • a browser
  • an AI model like claude-3.5-sonnet
  • chain-of-thought o1
  • or o1 in a loop.

An AI agent, then, depending on its alignment with the definition above, could be just an agent. Whenever we refer to an AI agent, we do so under Berners-Lee’s definition: the program happens to be AI-powered, and it does exactly what the user wants.

It may well be that your user agent is managed by a third party. For instance, browsers are readily user agents, and browsers are developed and distributed by third parties.

Private AI labs appear to endeavor to be user agents. So far, however, their focus on alignment has been on a general form rather than alignment to individual users, as University of New Mexico Prof. Geoffrey Miller put it. [A competitive market with low switching costs is a meta way to deliver user alignment.]

Perhaps obviously, there exist legitimate user agents that are not aligned to you. Clearly, another user’s user agent acts on behalf of that user, not you. Your sovereignty and user-agent alignment do not trump another’s. The same could readily be said of a business.

This raises a problem for agents.

My context, my rules

Businesses have built services accessible only via their application interfaces, not APIs.

  • You can book an Uber on Uber.
  • You can book a trip on Airbnb or Expedia.
  • You can buy groceries at Walmart.

We refer to these types of apps as Thick apps: those with large inventories whose attributes are variable and expensive to observe. Retail, travel, and real estate all satisfy this condition. These inventories are expensive to originate and maintain, and these apps are opinionated about customer experience: how their inventories are maintained and made available to customers. Intermediated service provision is hard to make work.

An AI-equipped agent then has two candidate interfaces into a thick app: authorized and unauthorized routes.

So far in internet history, we’ve engaged with apps through authorized routes. We send HTTP requests that public-facing web servers respond to, and load the web application in our browser. How much of that web experience is constructed server-side vs. client-side (on our turf) is an evolving question (and an active area we’re exploring for its privacy implications).

An unauthorized route might be sending a service to scrape all available data from a business and pull it into an environment you own. We’ve been suspicious of those routes – they’re expensive, slow, not always reliable, and can violate the agency and sovereignty of others. Clearly, your agent doesn’t have a right to another person’s context; you don’t have carte blanche to mine. Either way, Cloudflare is working on helping apps block this.

Users and apps can share context directly or through scraping. Neither is optimal.

While apps could expose an Agent API – services that make internal context legible to third parties, e.g., as Jeremiah Owyang proposes – our bet is that isn’t how things play out. Entities shouldn’t have to reveal all internal context – which may readily have value in personalization or alignment efforts – in order to collaborate. The trust implications alone could raise concerns.

Instead, we’re bullish on controllable interfaces that

  • allow users to project outside context into app experiences (as in last week’s YMMVC)
  • allow apps to project app-owned services into other contexts

For example:

  • Apple App Intents make app functionality legible to iOS and Apple Intelligence.
  • Uber Deep Links allow third parties to start the Uber Ride Request Flow from anywhere.

These are real (and, in our view, elegant) commercial applications that enable app and context interoperability without requiring uncontrolled exposure of private information or services.
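
To make the Deep Links example concrete, here’s a minimal sketch of constructing an Uber ride-request universal link. The parameter names follow Uber’s documented m.uber.com deep link format; the client ID and coordinates are placeholders.

```python
from urllib.parse import urlencode

def uber_ride_request_link(client_id, pickup, dropoff):
    """Build an Uber universal link that opens the Ride Request flow
    with pickup and dropoff pre-filled."""
    params = {
        "action": "setPickup",
        "client_id": client_id,          # your registered Uber client ID
        "pickup[latitude]": pickup[0],
        "pickup[longitude]": pickup[1],
        "dropoff[latitude]": dropoff[0],
        "dropoff[longitude]": dropoff[1],
    }
    return "https://m.uber.com/ul/?" + urlencode(params)

# A third party hands the user off to Uber's own flow -- it never touches
# Uber's inventory, pricing, or dispatch.
link = uber_ride_request_link("YOUR_CLIENT_ID", (37.7749, -122.4194), (40.7128, -74.0060))
```

Note the division of labor: the third party supplies context (where you are, where you’re going), while Uber keeps control of its inventory, pricing, and ride flow.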

Apps aren’t about to (and shouldn’t have to!) give up core functionality or brand equity to third-party agents.

Let’s put this in context for AI agents.

AI Agents in action

We follow Bill Gates

Imagine that you want to plan a trip. A travel bot will identify hotels that fit your budget. An agent will know what time of year you’ll be traveling and, based on its knowledge about whether you always try a new destination or like to return to the same place repeatedly, it will be able to suggest locations. When asked, it will recommend things to do based on your interests and propensity for adventure, and it will book reservations at the types of restaurants you would enjoy.

So how will this work exactly?

Bill Gates is obviously not talking about a world governed by first-party data – however well managed by a CDP or an AI memory layer. He’s talking about services that can safely integrate all our context.

Clearly we can't copy over our context to every app that asks. And we can’t ask apps to copy over all their context.

So what’s the path?

We’re betting on the YMMVC pattern from last week's blog: users allow an app to project AI transformations of their private, linked context into the app experience for an agreed purpose.

Such a pattern is consistent with App Intents and Deep Links and also how we engage IRL.

When a florist asks “What’s the occasion?”, your response is not “You don’t need to know that; talk to my agent.” We say, “I’m buying flowers for my Mom’s birthday, and she loves Asiatic lilies.” Context flow is itself contextual.
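
Here’s a hypothetical sketch of what such a purpose-scoped projection could look like. Every name below is illustrative – this is not Crosshatch’s actual API.

```python
# Illustrative only -- none of these names are a real Crosshatch API.
linked_context = {
    "calendar": ["Oct 7-11: out of office", "Oct 9: anniversary dinner"],
    "past_bookings": ["Kyoto ryokan, 2023", "Lisbon boutique hotel, 2022"],
}

def project_for(purpose: str, context: dict) -> str:
    """Project an AI transformation of linked context -- a purpose-fitted
    summary -- instead of handing the app the raw records above."""
    if purpose != "travel_recommendations":   # the agreed purpose
        raise PermissionError("request is outside the agreed purpose")
    # A model would summarize `context` here; hardcoded for the sketch.
    return "prefers boutique stays; traveling Oct 7-11; anniversary on Oct 9"

print(project_for("travel_recommendations", linked_context))
```

The florist exchange has the same shape: a purpose-fitted disclosure, not your whole history.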

Is your AI agent allegiant?

Returning to Berners-Lee’s definition of an agent

a user agent is a program that acts on behalf of, and represents, the user.

how do we know whether an AI service with access to our context (e.g., Crosshatch) will act on our behalf (e.g., only satisfy requests corresponding to an agreed purpose) when acting at the behest of another application?

Luckily, this is a canonical principal-agent problem!

In classical principal-agent language,

  • a user (the principal)
  • wants an app (the agent)

to personalize their experience (e.g., travel) toward a superior travel UX and not for another purpose.

The YMMVC Principal-Agent problem.

Of course, the agent could have other motives. So the question is, how do you construct a mechanism such that both the principal and the agent are better off?

Following Google economist Hal Varian's Monitoring Agents with Other Agents

The principal-agent literature typically assumes that the principal is unable to observe the characteristics or the actions of the agents whom they monitor. However, in reality, it is often not the case that agents' characteristics or effort levels are really unobservable; rather, they simply may be very costly to observe.

In particular, simply because information is costly to the principal doesn't mean that it is costly to everyone.

Varian studies the multiple agency problem wherein a principal can deploy an agent to monitor the performance of other agents. The monitor really has two available strategies for inducing desired behavior by the performing agent

  • The carrot: make good behavior less expensive
  • The stick: make bad behavior more expensive

In the paper, he shows that

a principal prefers a monitor who can reduce the costs of desirable actions rather than increase the cost of undesirable actions.
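
Here’s a stylized, back-of-envelope reading of why (ours, not the paper’s model). Suppose the performing agent incurs cost $c_g$ for the desirable action and $c_b$ for the undesirable one, and must be paid a wage $w$ that covers the cost of the action it takes. A carrot subsidizes the good action by $s$; a stick penalizes the bad action by $p$:

$$ \underbrace{w \ge c_g - s}_{\text{carrot}} \qquad \text{vs.} \qquad \underbrace{w \ge c_g,\;\; c_g \le c_b + p}_{\text{stick}} $$

Both induce the desirable action, but only the carrot lowers what the principal has to pay.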

This is what the YMMVC model does. With Crosshatch, users link their context to apps for a specified purpose or “usage policy.” Since the principal is free to use any monitor she wishes, the threat of competition can induce monitor compliance. That said, Varian does not treat the issue of monitor alignment in this paper.

In the YMMVC framework, Your Model can observe requests made by the application agent and check that those requests comply with the agreed purpose. Your Model agent could even flag or block third-party agent requests that appear to violate your usage policy – something Azure OpenAI already does out of the box.

This pattern matches the Varian framework: your Crosshatch agent acts as a monitoring agent for the third parties that perform personalizing actions on your (the principal’s) behalf.
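
A minimal sketch of the monitoring move, with hypothetical names throughout (this is not Crosshatch’s or Azure’s actual interface): Your Model sits between the app agent and your context, honoring only requests that match the agreed policy.

```python
from dataclasses import dataclass

@dataclass
class UsagePolicy:
    purpose: str               # e.g., "travel_personalization"
    allowed_sources: set       # context sources the user agreed to link

def monitor(policy: UsagePolicy, request: dict) -> bool:
    """Your Model as the monitoring agent: approve the app agent's request
    only if it stays within the agreed purpose and permitted sources."""
    within_purpose = request["purpose"] == policy.purpose
    within_sources = set(request["sources"]) <= policy.allowed_sources
    return within_purpose and within_sources

policy = UsagePolicy("travel_personalization", {"calendar", "past_trips"})
monitor(policy, {"purpose": "travel_personalization", "sources": ["calendar"]})    # True
monitor(policy, {"purpose": "ad_targeting", "sources": ["calendar", "messages"]})  # False: flag or block
```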

[To see this in action – check out our docs! Public beta launch coming soon!]

Where does your agent live?

It'd be nice if your agent lived locally: All your data and your agent in a device you control.

The trouble is that the AI services you might wish to equip your agent with likely do not fit on your device. For instance, Apple extends the capability of Apple Intelligence with Private Cloud Compute: AI services that live in Apple's cloud, not locally. OpenAI services are available in Azure, while Claude and Gemini services are available in GCP.

Further, there's value in having your data available on any device. That's why Apple shipped iCloud and CloudKit, per Apple's Nick Gillett at WWDC19

A belief that I have is that all of my data should be available to me on whatever device I have wherever I am in the world. …

And the data that we create on all of these devices is naturally trapped.

There’s no easy way to move it from one device to another without some kind of user interaction.

Now to solve this, we typically want to turn to cloud storage because it offers us the promise of moving data that’s on one device seamlessly and transparently to all the other devices we own.

And 2 in 3 iPhone users use iCloud – the promise and ease of data availability appear to resonate with most iPhone users. Apple keeps Standard Data Protection – where it stores most encryption keys in the cloud for easier data recovery – as the default option. iCloud users may choose to "Optimize iPhone Storage" for Photos, keeping full-resolution photos in the cloud with smaller, device-sized versions on device.


The story of "local only" AI doesn't match its commercial implementations, with data, encryption keys, and AI all living in the cloud.

More likely is Gavin Baker's "local when you can, cloud when you must." Standard calls to Siri may not need cloud resources, but queries involving richer context may best be routed to the cloud, closer to where AI and data resources live.
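
An illustrative sketch of that routing rule (the names and predicates are ours, not any shipping implementation):

```python
def route(query: str, needs_linked_context: bool, fits_on_device: bool) -> str:
    """'Local when you can, cloud when you must': prefer the on-device model,
    escalating to cloud AI when context or capability demands it."""
    if fits_on_device and not needs_linked_context:
        return "on-device model"             # e.g., a standard Siri-style call
    return "cloud model, close to the data"  # e.g., a Private Cloud Compute-style service

route("set a timer for 10 minutes", needs_linked_context=False, fits_on_device=True)
route("plan a trip around my calendar", needs_linked_context=True, fits_on_device=False)
```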

We see user agents as Berners-Lee appears to: a program

  • any AI (today: Claude, Gemini, or GPT models)
  • with access to permissioned context

that can be directed by the application agents you choose, exactly on your behalf.

The north star is maximal interoperability – AI and context – securely projected onto any surface you want.

See what Crosshatch can do for your business.
