Version v0.2.0 of `@ably/ai-transport` reorganises the SDK to better support a wide range of interaction patterns. Everything in an AI session (input, output, agent lifecycle, control signals) is captured durably, allowing you to easily build the sophisticated interaction patterns that support modern AI user experiences.
When we first built `@ably/ai-transport`, we modelled an AI conversation the way most people first picture it: as a request and a response. A client sent a prompt, the server streamed an answer back, and the SDK's job was to move those tokens across the wire as quickly and reliably as possible. We called the moving part a transport, and we called one exchange a turn.
We came to believe this is the wrong shape for the AI applications people are now building. Those applications expect more of an agent than a single answer: you want to steer it while it works, interrupt it mid-task, or pause and resume it. These rich interactions are difficult to build with a simple request/response model.
From the start we have described [AI Transport](https://ably.com/docs/ai-transport) in terms of [durable sessions](https://durablesessions.ai/): conversations that survive reconnection, span devices, and outlive any single request, connection, or process. AI Transport [v0.2.0](https://www.npmjs.com/package/@ably/ai-transport/v/0.2.0) more fully realises that idea: the transport gives way to the session, and a conversation's state (every interaction between participants) is durably represented in an Ably channel.
What an AI session actually is #
An AI conversation has three properties that the latest AI Transport SDK embraces:
- It is multi-step and long-running. A modern agent does not answer in one step. It calls a tool, waits for a result, reasons again, perhaps pauses for a human to approve an action, and resumes. A single unit of agent activity can span many model steps and an arbitrary amount of wall-clock time. So the unit we model is a run (something that can start, suspend, resume, and end) rather than a single exchange that is either in flight or finished.
- It is multi-party and multi-device. The same conversation might be open on a phone and a laptop. A second user might be looking at the same thread. The agent's output is therefore not a private reply addressed to whoever asked; it is shared state to which every participant can subscribe (provided they have access to the session).
- It is a single shared stream of events. A session is the totality of what happens in it: every input from a participant and every output from an agent appears as a single unified stream. Any participant can contribute to that stream independently, and both clients and agents can read and subscribe to it in realtime.
The session is the channel #
Ably already provides an ordered, durable log of messages: the channel. We think this is a very useful substrate for durable AI conversations.
Any number of participants can publish and subscribe to channels in realtime, with Ably's exactly-once delivery and [ordering guarantees](https://ably.com/docs/platform/architecture/message-ordering). The channel is [durable](https://ably.com/docs/storage-history/history) and fault-tolerant, so any participant can drop out and rejoin without losing their place: for example, a client that reconnects or reloads the page, or an agent process that stops and starts again.
In v0.2.0 we stopped treating the channel as a simple pipe for streaming tokens and started treating it as the session itself. Concretely, everything is now an event on the channel:
- The client publishes the user's input to the channel (`ai-input`) rather than tucking it into an HTTP body.
- The agent publishes its output to the channel (`ai-output`).
- Lifecycle events for the run (`ai-run-start`, `ai-run-suspend`, `ai-run-resume`, `ai-run-end`) are published as events on the channel.
- Control signals, such as `ai-cancel`, are explicit channel events too.
Once every event is on the channel, the conversation's state is derived consistently by each participant by reading the log. The channel is authoritative; the client and the agent are both observers of the same stream. This gives every participant a fully synchronised, realtime view of all activity in the session.
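To make "state is derived by reading the log" concrete, here is a minimal sketch of such a derivation as a pure fold over an ordered list of events. The event names are the ones listed above; the payload fields (`runId`, `text`), the `RunSnapshot` shape and the `foldSession` function are assumptions made for illustration, not the SDK's API.

```typescript
// Event names come from the post; the payload fields (runId, text) and all
// type and function names below are assumptions made for illustration.
type SessionEvent =
  | { name: "ai-input"; runId: string; text: string }
  | { name: "ai-output"; runId: string; text: string }
  | { name: "ai-run-start" | "ai-run-suspend" | "ai-run-resume" | "ai-run-end" | "ai-cancel"; runId: string };

type RunStatus = "pending" | "running" | "suspended" | "ended" | "cancelled";

interface RunSnapshot {
  input: string;
  output: string;
  status: RunStatus;
}

// A pure fold: the same ordered log always produces the same state.
function foldSession(log: SessionEvent[]): Map<string, RunSnapshot> {
  const runs = new Map<string, RunSnapshot>();
  for (const ev of log) {
    const run: RunSnapshot = runs.get(ev.runId) ?? { input: "", output: "", status: "pending" };
    switch (ev.name) {
      case "ai-input":
        run.input = ev.text;
        break;
      case "ai-output":
        run.output += ev.text; // streamed chunks are appended in log order
        break;
      case "ai-run-start":
      case "ai-run-resume":
        run.status = "running";
        break;
      case "ai-run-suspend":
        run.status = "suspended";
        break;
      case "ai-cancel":
        run.status = "cancelled";
        break;
      case "ai-run-end":
        // Assumed rule: a cancelled run stays cancelled when it ends.
        if (run.status !== "cancelled") run.status = "ended";
        break;
    }
    runs.set(ev.runId, run);
  }
  return runs;
}

const eventLog: SessionEvent[] = [
  { name: "ai-input", runId: "run-1", text: "Summarise this" },
  { name: "ai-run-start", runId: "run-1" },
  { name: "ai-output", runId: "run-1", text: "Here is " },
  { name: "ai-output", runId: "run-1", text: "a summary." },
  { name: "ai-run-end", runId: "run-1" },
];

// A client and an agent that read the same log arrive at the same state.
const clientState = foldSession(eventLog);
const agentState = foldSession(eventLog);
```

Because the fold is deterministic and has no inputs other than the log, neither side needs to trust the other's in-memory copy; both can simply re-derive it.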
An overview of the key changes #
Before going through the pieces individually, it helps to see them working together. The striking thing is how little code there is.
In these examples we pair Vercel's AI SDK with Ably AI Transport. On the client, you read the conversation and publish input through the session's `View`.
On the agent, an HTTP handler turns an invocation into a run, streams a model response into it, and either ends or suspends it.
That is the whole developer surface. The SDK handles everything else underneath: publishing inputs and outputs, emitting lifecycle events, replaying history on reconnect, and keeping every participant in sync.
If you build with Vercel's AI SDK UI on the frontend, you can use our drop-in custom transport that backs `useChat` with an Ably session, so you keep the full `useChat` API and get resumability and multi-client sync out of the box.
The rest of this section unpacks the core concepts in this SDK release.
The session owns its channel
A session is a long-lived context that survives reconnection and can be joined by more than one participant. The session has a name and is backed by an Ably channel. There are two types of session handlers which provide access to the same underlying session:
- `createClientSession` provides the API surface for end clients: it includes APIs for sending user input, editing messages, regenerating earlier responses, and cancelling runs in-flight. It exposes a `View` to render the conversation as it changes, with controls for selecting a particular branch through the conversation tree.
- `createAgentSession` provides the agent API surface: it processes invocations and includes APIs for managing agent run lifecycle and streaming LLM output.
A run is a unit of agent execution
A run can be long-running, and it can suspend when it needs a tool result or a human decision, and later resume.
Because a run lives on the channel rather than in a process's memory, it can outlive the process that started it. The agent can exit entirely while a run is suspended, and a fresh process (a new serverless invocation, say) can resume the same run later. Serverless deployments and human-in-the-loop pauses need no extra plumbing and no external state store.
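One way to picture a run that outlives its process is as a small state machine whose current phase is recomputed from the lifecycle events alone. The sketch below uses the four lifecycle event names from this post; the transition table itself (for example, that a suspended run may end without resuming) and the names `RunPhase` and `phaseFromLog` are assumptions for illustration, not the SDK's implementation.

```typescript
type RunPhase = "idle" | "running" | "suspended" | "ended";
type LifecycleEvent = "ai-run-start" | "ai-run-suspend" | "ai-run-resume" | "ai-run-end";

// Assumed transition table: which lifecycle event is legal in which phase.
const transitions: Record<RunPhase, Partial<Record<LifecycleEvent, RunPhase>>> = {
  idle: { "ai-run-start": "running" },
  running: { "ai-run-suspend": "suspended", "ai-run-end": "ended" },
  suspended: { "ai-run-resume": "running", "ai-run-end": "ended" },
  ended: {},
};

// Recompute the phase of a run purely from its lifecycle events.
function phaseFromLog(events: LifecycleEvent[]): RunPhase {
  let phase: RunPhase = "idle";
  for (const ev of events) {
    const next = transitions[phase][ev];
    if (next === undefined) {
      throw new Error(`invalid transition: ${ev} while ${phase}`);
    }
    phase = next;
  }
  return phase;
}

// Process A starts the run, suspends it (say, waiting for approval) and exits.
const logSoFar: LifecycleEvent[] = ["ai-run-start", "ai-run-suspend"];

// Process B starts later with no memory of A. The log is all it needs.
const recovered = phaseFromLog(logSoFar);
if (recovered === "suspended") {
  logSoFar.push("ai-run-resume", "ai-run-end");
}
const finalPhase = phaseFromLog(logSoFar);
```

Nothing in process B depends on state held by process A, which is the property that makes serverless and human-in-the-loop flows work without a separate state store.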
Invocations as a first-class concept
An invocation and a run are different things: an invocation says "please act on this input"; a run bounds the agent's resulting execution. Keeping the two separate is what lets a run be long-running, suspend, and resume independently of the call that started it.
In practice, an invocation is just the payload of an HTTP POST used to poke an agent process into life. It carries no content, because the content is already on the channel; it carries only a pointer to the input in the session on which the agent should act.
Whatever you use to trigger the agent (an HTTP request, a queue, a scheduled job) just delivers that pointer; how the agent is woken up is entirely up to you.
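As an illustration of "a pointer, not content", the sketch below models an invocation as two identifiers and resolves it against a stand-in for the channel log. The field names (`sessionName`, `inputId`), the `channelLogs` store and `resolveInvocation` are all hypothetical; the real invocation payload is defined by the SDK.

```typescript
// Hypothetical invocation shape: identifiers only, never the user's text.
interface Invocation {
  sessionName: string; // which session (channel) to attach to
  inputId: string; // which input in that session the agent should act on
}

interface StoredInput {
  id: string;
  text: string;
}

// Stand-in for durable channel history, keyed by session name.
const channelLogs: Record<string, StoredInput[]> = {
  "support-chat-42": [
    { id: "in-1", text: "Where is my order?" },
    { id: "in-2", text: "Actually, cancel it." },
  ],
};

// The agent dereferences the pointer against the log to find its input.
function resolveInvocation(inv: Invocation): string {
  const entries = channelLogs[inv.sessionName] ?? [];
  const input = entries.find((entry) => entry.id === inv.inputId);
  if (!input) {
    throw new Error(`input ${inv.inputId} not found in ${inv.sessionName}`);
  }
  return input.text;
}

// The same small payload could arrive as an HTTP body, a queue message or a
// scheduled job's arguments; here it is parsed from a JSON string.
const fromHttpBody = JSON.parse('{"sessionName":"support-chat-42","inputId":"in-2"}') as Invocation;
const resolvedText = resolveInvocation(fromHttpBody);
```

Keeping the payload this small means the trigger mechanism can be swapped without touching how the agent reads its input.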
This is also how you manage the agent's lifecycle. An invocation pokes a process into life to handle a request, and that process can exit again the moment its run suspends or ends. While it runs, the agent is subscribed to the session and sees every event for its run as it happens, so it can react in realtime: picking up a freshly published input, or stopping the instant a cancel arrives.
A codec for any model provider or framework
The session machinery (runs, branching, hydration, control signals) knows nothing about the shape of an AI message. The specifics of a message, a tool call, or a reasoning step live in one place: the codec, which defines how a given model provider or agent framework's event format maps onto channel messages, and how those events fold back into conversation state.
This allows you to use AI Transport with any AI stack. The session provides the hard realtime parts (durable streaming, reconnection and recovery, multi-client sync, client-agent interactions) once, regardless of how the events themselves are shaped. You keep whichever model provider and agent framework you already use, and AI Transport slots in beneath them as the session layer.
Today the SDK ships a codec for the Vercel AI SDK, so its `UIMessage` format works out of the box; codecs for other frameworks and providers are on the way, or you can write your own. The core session takes the codec as a parameter.
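The two responsibilities described above (mapping framework events onto channel messages, and folding them back into state) can be sketched as a small generic contract. The `Codec` interface, the toy event format and the `replay` helper below are hypothetical and only illustrate the separation of concerns; they are not the SDK's codec interface.

```typescript
// Hypothetical wire format for a channel message.
interface ChannelMessage {
  name: string;
  data: string;
}

// Hypothetical codec contract: the only place that knows the event shape.
interface Codec<FrameworkEvent, ConversationState> {
  encode(event: FrameworkEvent): ChannelMessage;
  decode(message: ChannelMessage): FrameworkEvent;
  initial(): ConversationState;
  fold(state: ConversationState, event: FrameworkEvent): ConversationState;
}

// A toy framework format: text deltas and tool calls.
type ToyEvent = { kind: "text"; delta: string } | { kind: "tool"; toolName: string };

interface ToyState {
  text: string;
  toolCalls: string[];
}

const toyCodec: Codec<ToyEvent, ToyState> = {
  encode: (event) => ({ name: "ai-output", data: JSON.stringify(event) }),
  decode: (message) => JSON.parse(message.data) as ToyEvent,
  initial: () => ({ text: "", toolCalls: [] }),
  fold: (state, event) =>
    event.kind === "text"
      ? { text: state.text + event.delta, toolCalls: state.toolCalls }
      : { text: state.text, toolCalls: state.toolCalls.concat(event.toolName) },
};

// Codec-agnostic machinery: it never inspects the event shape itself.
function replay<E, S>(codec: Codec<E, S>, messages: ChannelMessage[]): S {
  return messages
    .map((message) => codec.decode(message))
    .reduce((state, event) => codec.fold(state, event), codec.initial());
}

const toyEvents: ToyEvent[] = [
  { kind: "text", delta: "Checking " },
  { kind: "tool", toolName: "lookupOrder" },
  { kind: "text", delta: "done." },
];
const wireMessages = toyEvents.map((event) => toyCodec.encode(event));
const rebuilt = replay(toyCodec, wireMessages);
```

Swapping in a different framework would mean writing a different `Codec` value; `replay` and anything built on it would stay unchanged.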
A tree of runs, where branching is structural
A conversation with a capable agent is rarely linear. When you edit an earlier prompt, or regenerate a response, the new attempt creates a branch in the conversation thread from that point. This lets you build sophisticated agent harnesses: you can explore several approaches to a problem in parallel and keep whichever works, backtrack to an earlier point when one goes wrong and try a different tack, or compare alternative answers side by side, all without losing the work that came before.
Therefore AI sessions are best modelled as a tree, rather than a linear sequence of messages. The AI Transport SDK models the conversation as a tree of input nodes and run nodes:
- Editing a prompt forks the input node, creating a new sibling branch from the same point.
- Regenerating continues from the same input with a new run.
Both the client and the agent hydrate the tree from the log of events on the channel. Since the tree is derived from the channel log rather than owned by any one process, every participant converges on the same session state.
A participant rarely works with the whole tree at once. At any moment it follows a single path through it: one choice at each branch point, flattened into the ordered list of messages that forms its current context. In the AI Transport SDK, a linear path through the tree is called a `View`: a projection of the tree down to one branch, whether that is the thread a client renders on screen or the history an agent feeds to the model for the run it is handling.
A `View` lets you navigate the tree and render the linear list of messages along the currently selected branch; editing or regenerating forks a new branch; and at any branch point you can inspect the alternatives and switch between them.
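The relationship between the tree and a linear view can be sketched with a flat list of nodes and a per-branch-point selection. Everything here (`TreeNode`, `childrenOf`, `linearView`, the `"root"` key) is a hypothetical model for illustration; the SDK's own node and `View` types may look quite different.

```typescript
// Hypothetical tree model: input nodes and run nodes linked by parentId.
interface TreeNode {
  id: string;
  kind: "input" | "run";
  parentId: string | null;
  text: string;
}

// Children of a node (or the roots when parentId is null), in log order.
function childrenOf(nodes: TreeNode[], parentId: string | null): TreeNode[] {
  return nodes.filter((node) => node.parentId === parentId);
}

// Flatten one path through the tree. At each branch point, follow the child
// named in `selected` (keyed by parent id, or "root"), else the newest child.
function linearView(nodes: TreeNode[], selected: Record<string, string>): TreeNode[] {
  const path: TreeNode[] = [];
  let parentId: string | null = null;
  for (;;) {
    const options = childrenOf(nodes, parentId);
    if (options.length === 0) return path;
    const key = parentId ?? "root";
    const chosen = options.find((node) => node.id === selected[key]) ?? options[options.length - 1];
    path.push(chosen);
    parentId = chosen.id;
  }
}

const nodes: TreeNode[] = [
  { id: "i1", kind: "input", parentId: null, text: "Plan a trip to Rome" },
  { id: "r1", kind: "run", parentId: "i1", text: "Itinerary A" },
  // Regenerate: a second run under the same input.
  { id: "r2", kind: "run", parentId: "i1", text: "Itinerary B" },
  // Edit: a sibling input forked from the same point (the root here).
  { id: "i2", kind: "input", parentId: null, text: "Plan a trip to Lisbon" },
  { id: "r3", kind: "run", parentId: "i2", text: "Lisbon itinerary" },
];

const latest = linearView(nodes, {}).map((node) => node.id);
const original = linearView(nodes, { root: "i1", i1: "r1" }).map((node) => node.id);
const alternatives = childrenOf(nodes, "i1").map((node) => node.id);
```

With no selection the view follows the newest branch (`i2`, `r3`); selecting `i1` and then `r1` recovers the original thread, and `childrenOf` lists the alternatives available at a branch point. No node is ever deleted, which is why earlier work is never lost.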
Resumption and multi-client sync
Because every participant derives its state from events durably held in the channel, stream resumption and hydration come for free. A client can reload the page mid-stream, recover from a dropped connection and resume, or join the session late on another tab or device; in each case it reads the channel and arrives at the same session state, with no snapshot to save or restore.
Since every participant reads the same event log, concurrent clients stay in sync in realtime: two people watching one session, or the same user across devices, see the same stream of events. Each event the agent publishes carries the `clientId` of the client that triggered the run, so output can always be attributed back to whoever asked for it.
This is handled for you: on `connect()` the client hydrates from the channel and then follows live events, so a reload or reconnect recovers on its own with no extra code.
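The "hydrate, then follow live" pattern can be sketched against an in-memory stand-in for a channel. `InMemoryChannel` and `ClientReplica` are hypothetical classes written for this example and do not use the Ably SDK. Because everything here is synchronous there is no gap between reading history and subscribing, which a real client has to handle.

```typescript
interface LoggedEvent {
  serial: number; // position in the log
  text: string;
}

// In-memory stand-in for a durable, ordered channel.
class InMemoryChannel {
  private log: LoggedEvent[] = [];
  private listeners: Array<(ev: LoggedEvent) => void> = [];

  publish(text: string): void {
    const ev: LoggedEvent = { serial: this.log.length + 1, text };
    this.log.push(ev);
    for (const listener of this.listeners) listener(ev);
  }

  history(): LoggedEvent[] {
    return this.log.slice();
  }

  subscribe(listener: (ev: LoggedEvent) => void): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }
}

// A participant that rebuilds its state from the log on every connect.
class ClientReplica {
  readonly seen: string[] = [];
  private lastSerial = 0;
  private unsubscribe: () => void = () => {};

  connect(channel: InMemoryChannel): void {
    for (const ev of channel.history()) this.apply(ev); // hydrate
    this.unsubscribe = channel.subscribe((ev) => this.apply(ev)); // follow live
  }

  disconnect(): void {
    this.unsubscribe();
  }

  private apply(ev: LoggedEvent): void {
    if (ev.serial <= this.lastSerial) return; // ignore anything already applied
    this.lastSerial = ev.serial;
    this.seen.push(ev.text);
  }
}

const channel = new InMemoryChannel();
channel.publish("Hello");

const firstTab = new ClientReplica();
firstTab.connect(channel);
channel.publish(" wor");
firstTab.disconnect(); // page reload mid-stream
channel.publish("ld");

const reloadedTab = new ClientReplica(); // fresh state after reload
reloadedTab.connect(channel);
const lateDevice = new ClientReplica(); // joins late on another device
lateDevice.connect(channel);
```

Both `reloadedTab` and `lateDevice` end up with the full stream even though neither was connected while all of it was published, and neither had a snapshot to restore.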
Cancellation as a signal, not a special case
Because cancellation is just another message on the channel, stopping a run is a durable control signal routed to the active run. A response stream stopped mid-flight resolves to a terminal `cancelled` state. The run exposes an abort signal that can be used to cancel any in-flight LLM calls.
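A minimal sketch of routing a cancel event to a run's abort signal follows, using the standard `AbortController`. The `RunHandle` class, `handleChannelEvent` and the synchronous `streamTokens` loop are hypothetical stand-ins for illustration; only the `ai-cancel` event name and the idea of a run-level abort signal come from this post.

```typescript
type RunOutcome = "completed" | "cancelled";

// Hypothetical run handle that owns an AbortController.
class RunHandle {
  private readonly controller = new AbortController();

  constructor(readonly runId: string) {}

  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Route a control event from the channel to this run, if it is addressed to it.
  handleChannelEvent(ev: { name: string; runId: string }): void {
    if (ev.name === "ai-cancel" && ev.runId === this.runId) {
      this.controller.abort();
    }
  }
}

// Stand-in for streaming a model response: checks the signal before each token.
// `afterToken` lets the example deliver a channel event mid-stream.
function streamTokens(
  tokens: string[],
  signal: AbortSignal,
  afterToken: (index: number) => void,
): { text: string; outcome: RunOutcome } {
  let text = "";
  for (let i = 0; i < tokens.length; i++) {
    if (signal.aborted) return { text, outcome: "cancelled" };
    text += tokens[i];
    afterToken(i);
  }
  return { text, outcome: "completed" };
}

const activeRun = new RunHandle("run-1");
const result = streamTokens(["Thinking", " about", " your", " question"], activeRun.signal, (index) => {
  // A cancel for another run is ignored; the one for run-1 arrives after token 2.
  if (index === 0) activeRun.handleChannelEvent({ name: "ai-cancel", runId: "run-2" });
  if (index === 1) activeRun.handleChannelEvent({ name: "ai-cancel", runId: "run-1" });
});
```

In a real agent the same `AbortSignal` would typically be handed to whatever performs the model call (for example, the `signal` option of `fetch`), so the upstream request stops as soon as the cancel event is seen.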
Get started #
`@ably/ai-transport` v0.2.0 is available now:
- Read the docs: learn more about how to use the AI Transport SDK.
- See what AI Transport is: the product overview and the problems it solves.
- Read the source: including the demos showing the AI Transport SDK in action.
- Sign up free: you need an Ably account and an API key to run a session, and the free tier covers everything you need to start.
We would love to see what you build.