I ran a webinar on this recently and had more to say than the time allowed, so this is the written version: the argument I was making, some context on the demo, and the questions that came up from people watching. The recording is below if you'd rather watch than read.
The thesis: AI products are being let down by the user experience, not the model.
Over the past nine months, I've spoken to engineering and product teams at more than 40 companies across a range of industries, all of whom are shipping AI agents, assistants, and copilots to real users at scale. Despite having little else in common, they're all running into the same problems, and the root cause can't be model capability, because that's the thing that varies most between them. What they share is the delivery layer between the agent and the client.
Key takeaways
- Most AI UX failures are delivery failures, not model failures. The transport layer is where sessions break, context is lost, and user trust erodes.
- The fix is to stop treating the connection as the session. A durable session is a persistent, shared resource that sits between agent and client, so it survives the connection drops, device switches, and agent crashes that occur in the real world.
- Moving the session off the connection is a transport-layer swap, not a rewrite, which means you can try it against an agent you've already built. The agent code, model integration, and prompt harness stay the same.
- Once the session lives in the right place, multiple features fall out of the same architecture: multi-tab and multi-device sync, bidirectional agent control, human-AI handoffs with full context, and concurrent subagent coordination.
What users are experiencing
As a user of AI applications, I’m sure you’ve seen some or all of the failure modes listed here. Niggles that were tolerable when AI experiences were novel are becoming infuriating because they crop up in the products and services we use every day.
Dropped sessions. Something breaks the connection between the user and the agent, leaving the user with no response and sometimes not even the prompt they typed. Connection drops can come from anywhere: network changes, a laptop going to sleep, or corporate proxies closing idle streaming connections. None of these are unusual, which is the point: if you want a product that feels reliable, you can't treat the routine case as an edge case.
Silent agent failure. Something goes wrong on the agent side, and the user has no feedback signal. Do they wait? Hit cancel? Start again? A user who can’t tell whether the product is working will assume that it’s broken.
Context loss at handoff. The chatbot reaches the limit of what it can handle and passes the conversation to a human support agent. The human has no context, so the user has to repeat the whole thing. In a support context, where the user has usually run out of patience by the time they reach a human at all, this is one of the fastest ways to lose their trust.
Device lock-in. People expect to start something on their phone and finish it on their laptop, or have a tab they left open reflect what they just did somewhere else. This is how every other cloud product works, so it’s a jarring gap when an AI product doesn't behave the same way.
No control during generation. Some teams have disabled user input during model generation entirely because handling interruptions was too hard. The user is left watching the agent go off track, burning tokens to produce a wrong answer, with no way to stop it.
The architecture behind the failures
In most AI products today, the agent and client are connected by a direct pipe such as HTTP, SSE streaming, or WebSockets. That pipe is tied to a single request or connection, so the session only exists while those two specific endpoints stay up. If either end drops, the session state is gone. Even if the connection stays up, the direct pipe architecture means that no other client can observe what's happening, and no other agent can join the session.
The failures in the previous section occur because the session is living inside the connection, and connections are not built to be durable.
Moving to durable sessions
A durable session is a persistent, addressable, shared resource that sits between the agent and client. The connection becomes one way of accessing the session, rather than the session itself, so multiple clients and agents can come and go while the session persists.
Once the session is a resource in its own right, it can hold a lot more than the conversation transcript. Tool call history can live there, as can presence and shared structured state, such as which screen the user is currently on or what they last clicked. All of it is kept in sync for anyone connected.
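To make "the session as a resource" concrete, here's a minimal in-memory sketch. It assumes a single process and an append-only log in which every event gets a monotonically increasing serial number; the `DurableSession` class, the event shapes, and the method names are hypothetical and are not any product's actual API. A real implementation would persist the log and fan it out over the network, but the shape is the same: a connection is just a subscriber that can start reading from any point in the log.

```typescript
// Hypothetical event shapes for illustration; not a real SDK's types.
type SessionEvent =
  | { kind: "message"; role: "user" | "assistant"; messageId: string; text: string }
  | { kind: "tool_call"; messageId: string; tool: string; args: unknown }
  | { kind: "state"; key: string; value: unknown };

interface Entry {
  serial: number; // monotonic position in the session log
  event: SessionEvent;
}

type Listener = (entry: Entry) => void;

// In-memory stand-in for a durable session: an append-only log plus live subscribers.
class DurableSession {
  readonly id: string;
  readonly presence = new Set<string>(); // ids of currently connected clients and agents
  private log: Entry[] = [];
  private listeners: Listener[] = [];
  private state = new Map<string, unknown>(); // latest value per structured-state key

  constructor(id: string) {
    this.id = id;
  }

  // Any client or agent can publish; each event gets the next serial number.
  publish(event: SessionEvent): number {
    const entry: Entry = { serial: this.log.length, event };
    this.log.push(entry);
    if (event.kind === "state") this.state.set(event.key, event.value);
    for (const listener of this.listeners) listener(entry);
    return entry.serial;
  }

  // Replay the log from `fromSerial`, then deliver live events.
  // A late joiner passes 0; a reconnecting client passes its last seen serial + 1.
  subscribe(clientId: string, fromSerial: number, listener: Listener): () => void {
    for (let i = fromSerial; i < this.log.length; i++) listener(this.log[i]);
    this.listeners.push(listener);
    this.presence.add(clientId);
    // Dropping a connection ends the subscription, not the session.
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
      this.presence.delete(clientId);
    };
  }

  getState(key: string): unknown {
    return this.state.get(key);
  }
}

const session = new DurableSession("sess-123");

// Tab A connects and sees the first two events, then its connection drops.
let lastSeen = -1;
const unsubscribeA = session.subscribe("tab-a", 0, (e) => { lastSeen = e.serial; });
session.publish({ kind: "message", role: "user", messageId: "m1", text: "Summarize my invoices" });
session.publish({ kind: "state", key: "currentScreen", value: "billing" });
unsubscribeA();

// The agent keeps publishing while tab A is offline.
session.publish({ kind: "message", role: "assistant", messageId: "m2", text: "Found 3 invoices" });

// Tab A reconnects and resumes from the next serial: nothing lost, nothing duplicated.
const resumed: number[] = [];
session.subscribe("tab-a", lastSeen + 1, (e) => resumed.push(e.serial));

// A human support agent joining late replays the full history from serial 0.
const history: string[] = [];
session.subscribe("human-agent", 0, (e) => history.push(e.event.kind));
```

Because every subscriber names the serial it wants to start from, reconnecting after a drop and joining late are the same operation with different starting points. That's why multi-tab sync and handoff with full context come out of one mechanism rather than two.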
Adopting the durable session model unlocks features that would previously have been built as separate development projects:
- Every connected client sees an up-to-date view, including a response that's still streaming in.
- A late joiner gets the same complete, correctly ordered history as everyone else, and it doesn't matter whether that late joiner is a new tab, a different device, or a human support agent stepping in.
- Subagents can publish straight into the session without everything having to funnel through a central orchestrator.
- Interruptions work cleanly because the agent receives an explicit cancel, redirect or steer signal instead of having to infer intent from the state of a connection.
- Presence enables smarter agent behavior: the agent can deprioritize work when the user is away, push a notification when a task completes, and let the user know if it has to go offline mid-task.
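The point about interruptions is easiest to see in code. This sketch assumes the session carries a control channel alongside the conversation; `ControlChannel`, `runAgent`, and the `cancel`/`steer` signal shapes are hypothetical names for illustration. The agent checks for an explicit cancel between chunks instead of inferring intent from whether a connection is still open, so any connected client, not only the one that sent the prompt, can stop it.

```typescript
// Hypothetical control-signal shapes; not a real SDK's types.
type ControlSignal =
  | { action: "cancel" }
  | { action: "steer"; instruction: string };

type SignalListener = (signal: ControlSignal) => void;

// Minimal stand-in for the control side of a shared session: any connected
// client (another tab, another device, a human agent) can send a signal.
class ControlChannel {
  private listeners: SignalListener[] = [];

  onSignal(listener: SignalListener): void {
    this.listeners.push(listener);
  }

  send(signal: ControlSignal): void {
    for (const listener of this.listeners) listener(signal);
  }
}

interface RunResult {
  status: "completed" | "cancelled";
  steering: string[]; // instructions a real agent would fold into its next model call
}

// Publishes chunks until they run out or an explicit cancel arrives.
// The agent never inspects connection state to decide whether to stop.
function runAgent(
  channel: ControlChannel,
  chunks: Iterable<string>,
  publish: (chunk: string) => void,
): RunResult {
  let cancelled = false;
  const steering: string[] = [];
  channel.onSignal((signal) => {
    if (signal.action === "cancel") cancelled = true;
    else steering.push(signal.instruction);
  });
  for (const chunk of chunks) {
    if (cancelled) return { status: "cancelled", steering };
    publish(chunk);
  }
  return { status: cancelled ? "cancelled" : "completed", steering };
}

const channel = new ControlChannel();
const seen: string[] = [];
const result = runAgent(channel, ["The", "answer", "is", "wrong"], (chunk) => {
  seen.push(chunk);
  // A second tab watching the stream sees the agent going off track and cancels.
  if (seen.length === 2) channel.send({ action: "cancel" });
});
// result.status is "cancelled"; seen is ["The", "answer"]
```

In a production agent you'd likely also abort the in-flight model request when the cancel arrives, rather than only stopping between chunks. The `steering` list here simply records instructions; what the agent does with them is up to your prompt harness.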
What I showed in the demo
The demo in the webinar was built on the Vercel AI SDK, which lets developers supply their own custom transport layer. We replaced the default HTTP/SSE implementation with durable sessions. That was the only change: no additional infrastructure, no Redis, and no server-side buffering code.
With that in place, I showed two tabs that stayed in sync for streaming responses, even across network disconnects. A subagent started in one tab showed up in the other and could be cancelled from there. Two concurrent requests were handled by separate subagents publishing directly into the session, with each agent's output grouped cleanly in the conversation rather than interleaved into a single muddled stream. And a human support agent joining partway through got the full history straight away with no recap.
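Clean grouping of concurrent subagent output depends on each subagent tagging what it publishes. As a minimal sketch, assuming every chunk in the session carries a serial number, a publishing `agentId`, and a `messageId` (all hypothetical field names, not the demo's actual schema), a client can rebuild one coherent message per subagent from an interleaved log:

```typescript
// Hypothetical chunk shape: each subagent tags what it publishes.
interface Chunk {
  serial: number;    // position in the session log
  agentId: string;   // which subagent published this chunk
  messageId: string; // which message the chunk belongs to
  text: string;
}

interface GroupedMessage {
  agentId: string;
  messageId: string;
  text: string;
}

// Rebuild one message per (agentId, messageId) from an interleaved log,
// ordered by where each message first appeared in the session.
function groupByMessage(chunks: Chunk[]): GroupedMessage[] {
  const groups = new Map<string, GroupedMessage>();
  const ordered = [...chunks].sort((a, b) => a.serial - b.serial);
  for (const chunk of ordered) {
    const key = `${chunk.agentId}/${chunk.messageId}`;
    const group = groups.get(key);
    if (group) {
      group.text += chunk.text;
    } else {
      groups.set(key, { agentId: chunk.agentId, messageId: chunk.messageId, text: chunk.text });
    }
  }
  return Array.from(groups.values()); // Map iterates in insertion order
}

// Two subagents answering concurrent requests, interleaved in the session log.
const chunks: Chunk[] = [
  { serial: 0, agentId: "flights", messageId: "a1", text: "Two flights " },
  { serial: 1, agentId: "hotels", messageId: "b1", text: "Three hotels " },
  { serial: 2, agentId: "flights", messageId: "a1", text: "on Friday." },
  { serial: 3, agentId: "hotels", messageId: "b1", text: "near the venue." },
];
const messages = groupByMessage(chunks);
// messages[0].text is "Two flights on Friday."
// messages[1].text is "Three hotels near the venue."
```

Sorting by serial first means a late joiner replaying history and a live client receiving chunks as they arrive end up with the same grouped view.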
The Vercel AI SDK integration and session implementation are in our docs and repos if you want to look at the code directly.
Questions from the session
How do durable sessions relate to durable execution? They sit at different layers and solve different halves of the same reliability story. Durable execution, which is the category Temporal and similar tools occupy, makes your backend workflows crash-proof. Durable sessions make the client-side conversation crash-proof. Your agent can be fully resumable on the backend and still leave users with a broken experience if the connection handling isn't right, so the two are complementary.
Do Vercel’s or Cloudflare's frameworks give me this already? Partly. A stable session address means a returning user or restarting agent can navigate back to it, which is useful. What they don't yet offer is shared structured state, presence, multi-device sync, and multi-agent coordination in a single session, because those frameworks are built around one workflow at a time.
Are there open source options? It depends on your architecture. The Vercel Workflow DevKit is a good starting point if you're on Vercel. ElectricSQL has a durable sessions concept for local-first apps. Ably isn't open source, but there's a free tier with enough usage for meaningful experimentation.
Questions to ask about your AI product
If you want to work out whether any of this applies to your own product, there are three questions I'd start with. Is there already a problem you're not seeing clearly? Look at your CSAT themes once you've filtered out the complaints that are really about answer quality. Examine how your session lengths are distributed, because a lot of short sessions with small gaps between them can mean people losing context and restarting rather than getting anywhere. It's also worth speaking to engineering, because they may already be building pieces of a session layer ad hoc.
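The session-length check can be approximated with a short script over an analytics export. This sketch assumes each record has a user id and start/end times in minutes; the `SessionRecord` shape and both thresholds are hypothetical and should be tuned to your product. It counts sessions that were short and followed almost immediately by another session from the same user, which is the restart pattern described above.

```typescript
// Hypothetical analytics record; adapt the field names to your own data.
interface SessionRecord {
  userId: string;
  startMin: number; // session start, in minutes since some fixed epoch
  endMin: number;   // session end, same unit
}

// Illustrative thresholds only, not recommended values.
const SHORT_SESSION_MIN = 3;
const RESTART_GAP_MIN = 2;

// Fraction of all sessions that were short AND followed quickly by another
// session from the same user: a rough proxy for "lost context and started again".
function restartRate(records: SessionRecord[]): number {
  if (records.length === 0) return 0;
  const byUser = new Map<string, SessionRecord[]>();
  for (const r of records) {
    const list = byUser.get(r.userId) ?? [];
    list.push(r);
    byUser.set(r.userId, list);
  }
  let restarts = 0;
  for (const sessions of byUser.values()) {
    sessions.sort((a, b) => a.startMin - b.startMin);
    for (let i = 0; i < sessions.length - 1; i++) {
      const current = sessions[i];
      const next = sessions[i + 1];
      const isShort = current.endMin - current.startMin < SHORT_SESSION_MIN;
      const quickReturn = next.startMin - current.endMin <= RESTART_GAP_MIN;
      if (isShort && quickReturn) restarts++;
    }
  }
  return restarts / records.length;
}

const records: SessionRecord[] = [
  { userId: "u1", startMin: 0, endMin: 2 },  // short, next session 1 minute later
  { userId: "u1", startMin: 3, endMin: 4 },  // short, next session 1 minute later
  { userId: "u1", startMin: 5, endMin: 20 }, // long, and the user's last session
  { userId: "u2", startMin: 0, endMin: 30 }, // long
];
const rate = restartRate(records); // 0.5: two of four sessions look like restarts
```

A high rate doesn't prove a session-layer problem on its own, but it tells you where to look before you start reading individual transcripts.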
What do you want to build next? Notice which experiences on your roadmap assume multi-device continuity, bidirectional control, or human-AI handoffs, because each of those carries a session architecture requirement underneath.
Build or buy? If you need consistent behavior across several products, or if the maintenance cost of your bespoke session layer is growing, a dedicated platform is worth evaluating.
If you want to explore further, the docs are a good place to start. There's also a free tier if you want to experiment. And if you'd rather talk through your specific architecture, you can book a call through the website.
Further reading:
- Why we're betting on Durable Sessions
- The model is fine. The session is broken.
- The Durable Sessions stack is forming