Think, Durable Objects, and the Real Shape of AI Applications


(slop, but accurate) Most AI chat examples look simple.

Take a list of messages. Send them to a model. Stream tokens back to the browser. Persist the result somewhere. Repeat.

That is a useful starting point, but it hides the actual problem. A production AI application is not just a model call. It is a small distributed system: clients disconnect, users open multiple tabs, tools run on both sides of the network, streams fail halfway through, requests overlap, assistant messages are updated after tool results arrive, and background work needs to survive restarts.

That is the part Think is trying to solve.

Think is an opinionated reasoning engine for Cloudflare Workers. It is centered around a chat-shaped turn model, but it is not only for chat products. Chat is the coordination interface: a durable way to represent intent, context, tool calls, partial progress, cancellation, recovery, and final output.

The interesting thing is not that Think wraps streamText. The interesting thing is that it treats each interaction as a durable reasoning process with memory, tools, streams, recovery, and clients attached.

It is tempting to think of chat as a UI pattern: a textbox, a transcript, and a streaming assistant bubble.

Think uses chat differently. In Think, the chat turn is the operating model for AI inside an application.

A turn can start from a browser message, but it can also start from a scheduled task, a messenger webhook, a parent agent calling a sub-agent, or a programmatic submitMessages() call. The result might be rendered in a chat UI, posted to an external messenger, inspected as a background submission, or used as the output of an agent tool.

That means the "chat" abstraction is not the product boundary. It is the common runtime shape:

accept intent from a user, system, webhook, schedule, or parent agent
assemble context and memory
choose and execute tools
stream progress
update durable state
handle cancellation and overlapping work
recover from interruption
produce a final result
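
The steps above can be read as one turn contract that every entry point normalizes into. The sketch below is one hypothetical way to type that idea. `TurnSource`, `Delivery`, `TurnPhase`, `deliveryFor`, and `advance` are illustrative names, not Think's API, and the mapping from source to delivery target is an assumption.

```typescript
// Hypothetical types for illustration; none of these names come from Think.
type TurnSource =
  | { kind: "browser"; connectionId: string }
  | { kind: "schedule"; taskId: string }
  | { kind: "webhook"; provider: string }
  | { kind: "parent-agent"; parentId: string }
  | { kind: "programmatic" };

type Delivery = "chat-ui" | "messenger" | "submission-record" | "tool-output";

type TurnPhase = "queued" | "running" | "completed" | "cancelled" | "failed";

// Assumed mapping: where the final result of a turn is delivered.
function deliveryFor(source: TurnSource): Delivery {
  switch (source.kind) {
    case "browser":
      return "chat-ui";
    case "webhook":
      return "messenger";
    case "schedule":
    case "programmatic":
      return "submission-record";
    case "parent-agent":
      return "tool-output";
  }
}

// Legal phase transitions; terminal phases have no successors.
const NEXT: Record<TurnPhase, TurnPhase[]> = {
  queued: ["running", "cancelled"],
  running: ["completed", "cancelled", "failed"],
  completed: [],
  cancelled: [],
  failed: [],
};

function advance(current: TurnPhase, next: TurnPhase): TurnPhase {
  if (!NEXT[current].includes(next)) {
    throw new Error(`Illegal turn transition: ${current} -> ${next}`);
  }
  return next;
}
```

The point of the sketch is that the source of a turn changes where the result goes, not how the turn runs.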

This is why it is useful to center the interaction model around chat even when the final product does not look like a chatbot. A support bot, a code assistant, a scheduled analyst, a background workflow, and a delegated sub-agent all need a similar reasoning loop.

Think treats "chat" less as a UI and more as the control plane for agentic work.

The basic AI SDK flow is elegant: build messages, call the model, return a stream.

Once you add real product behavior, the shape changes quickly.

What happens if the browser reloads during a stream? What if the model calls a tool that has to run in the browser? What if the user sends another message before the previous turn finishes? What if a Durable Object is evicted after five chunks have streamed but before the final assistant message is persisted? What if the client sends back a stale optimistic message with a different assistant ID than the server has stored?

Think has explicit machinery for these cases.

There is a shared agents/chat layer with primitives for turn queues, resumable streams, message reconciliation, tool state updates, cancellation, stream accumulation, continuation state, and recovery snapshots. Think then builds a full agent runtime on top of those primitives: session-backed messages, workspace tools, skills, extensions, MCP tools, client tools, server tools, scheduled tasks, messenger ingress, sub-agents, and durable programmatic submissions.

That is a very different thing from an endpoint that calls a model.

The core design bet is that an AI interaction wants to be an actor.

A reasoning session needs private state. It needs serialized execution so two turns do not corrupt each other. It needs local access to its own message history, stream chunks, config, pending submissions, tool state, and recovery metadata. It needs to own WebSocket connections and broadcast updates. It needs timers and background work.

Durable Objects fit that shape unusually well.

In Think, an agent instance is not just a request handler. It is the home for a conversation or long-lived reasoning process. Its SQLite storage contains the durable state. Its single-threaded execution model gives a natural place to serialize turns. Its WebSocket support lets it own live clients. Its scheduling and fiber support give it a way to recover or continue work after interruptions.
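
To make "a natural place to serialize turns" concrete, here is a minimal sketch of a promise-chain turn queue. It is not Think's implementation; `TurnQueue` and `demo` are hypothetical names, and the only assumption is that all turns for one conversation are enqueued on the same instance, which is what a single actor gives you.

```typescript
// Hypothetical sketch: serialize async turns by chaining them on one promise.
class TurnQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(turn: () => Promise<T>): Promise<T> {
    // The new turn starts only after every earlier turn has settled.
    const result = this.tail.then(turn);
    // Swallow the outcome on the chain so one failed turn does not block later ones.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Two overlapping turns: without the queue, "b" would start while "a" is suspended.
async function demo(): Promise<string[]> {
  const log: string[] = [];
  const queue = new TurnQueue();
  const a = queue.enqueue(async () => {
    log.push("a:start");
    await Promise.resolve(); // stand-in for a model call
    log.push("a:end");
  });
  const b = queue.enqueue(async () => {
    log.push("b:start");
    await Promise.resolve();
    log.push("b:end");
  });
  await Promise.all([a, b]);
  return log;
}
```

In a multi-process deployment the same guarantee would need a lock or a queue service. Inside one single-threaded actor, a promise chain is enough.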

Without Durable Objects, you could still build something like Think, but you would have to assemble the same guarantees out of Postgres, Redis locks, queues, pub/sub, WebSocket gateways, job workers, and object storage. That is possible, but it changes the product. The runtime stops being an implementation detail and becomes a major part of the application architecture.

Think's strength is that many of those distributed systems concerns collapse into one durable actor.

Streaming is often described as "send chunks to the UI." In a real AI application, that is not enough.

Think persists stream chunks as they are produced. If a client reconnects, the server can replay stored chunks. The protocol distinguishes an active live stream from a completed stream and from an orphaned stream that was restored from SQLite after the original readable stream was lost. In the orphaned case, Think can finalize the partial stream, persist the assistant message that was reconstructed from chunks, and then decide whether to continue the turn.
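
The three stream states can be sketched with a small chunk store. This is an in-memory stand-in for the SQLite-backed storage described above, not Think's code. `ChunkStore` and its method names are hypothetical, and it assumes a reconnecting client reports how many chunks it has already received.

```typescript
// In-memory stand-in for a durable chunk table. All names are hypothetical.
type StreamStatus = "active" | "completed" | "orphaned";

interface StoredStream {
  status: StreamStatus;
  chunks: string[];
}

class ChunkStore {
  private streams = new Map<string, StoredStream>();

  start(requestId: string): void {
    this.streams.set(requestId, { status: "active", chunks: [] });
  }

  // Persist each chunk as it is produced.
  append(requestId: string, chunk: string): void {
    this.get(requestId).chunks.push(chunk);
  }

  complete(requestId: string): void {
    this.get(requestId).status = "completed";
  }

  // After a restart the live reader is gone, so streams still marked active are orphans.
  markOrphansAfterRestart(): string[] {
    const orphaned: string[] = [];
    this.streams.forEach((stream, id) => {
      if (stream.status === "active") {
        stream.status = "orphaned";
        orphaned.push(id);
      }
    });
    return orphaned;
  }

  // Replay everything the client has not seen yet, plus the current status.
  replay(requestId: string, fromIndex = 0): { status: StreamStatus; chunks: string[] } {
    const stream = this.get(requestId);
    return { status: stream.status, chunks: stream.chunks.slice(fromIndex) };
  }

  // Rebuild the partial assistant text from stored chunks and close the stream.
  finalizeOrphan(requestId: string): string {
    const stream = this.get(requestId);
    if (stream.status !== "orphaned") {
      throw new Error(`Stream ${requestId} is not orphaned`);
    }
    stream.status = "completed";
    return stream.chunks.join("");
  }

  private get(requestId: string): StoredStream {
    const stream = this.streams.get(requestId);
    if (!stream) throw new Error(`Unknown stream ${requestId}`);
    return stream;
  }
}
```

The status returned by `replay` is what lets a client tell "more chunks are coming" apart from "this is all there will ever be."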

That matters because reconnection is not rare edge behavior. It is normal browser behavior. Tabs reload. Networks flap. Workers hibernate. Mobile clients disappear.

A robust runtime needs to know what was already sent, what was persisted, what can be replayed, and what must be regenerated or continued. Think makes that explicit.

Tools make chat much more complicated.

Server tools are straightforward in comparison: the model calls a tool, the server executes it, the result goes back to the model. Think still adds useful control there. Tools can be wrapped with beforeToolCall and afterToolCall, allowing calls to be logged, blocked, modified, substituted, or measured.
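
A rough sketch of what such wrapping could look like follows. The hook names beforeToolCall and afterToolCall come from the text. The signatures, the `BeforeDecision` shape, and the `wrapTool` helper are assumptions made for illustration and should not be read as Think's actual API.

```typescript
// Assumed shapes: the hook names come from the article, the signatures do not.
type BeforeDecision<I, O> =
  | { action: "allow"; input?: I } // optionally replace the input
  | { action: "block"; reason: string }
  | { action: "substitute"; output: O }; // skip execution and use this output

interface ToolHooks<I, O> {
  beforeToolCall?: (toolName: string, input: I) => BeforeDecision<I, O>;
  afterToolCall?: (toolName: string, output: O, durationMs: number) => void;
}

// Synchronous for brevity; a real tool would usually return a Promise.
function wrapTool<I, O>(
  toolName: string,
  execute: (input: I) => O,
  hooks: ToolHooks<I, O>,
): (input: I) => O {
  return (input: I): O => {
    const decision: BeforeDecision<I, O> = hooks.beforeToolCall
      ? hooks.beforeToolCall(toolName, input)
      : { action: "allow" };
    if (decision.action === "block") {
      throw new Error(`Tool "${toolName}" blocked: ${decision.reason}`);
    }
    if (decision.action === "substitute") {
      return decision.output;
    }
    const started = Date.now();
    const output = execute(decision.input !== undefined ? decision.input : input);
    if (hooks.afterToolCall) {
      hooks.afterToolCall(toolName, output, Date.now() - started);
    }
    return output;
  };
}
```

In this sketch a substituted call never reaches `afterToolCall`. Whether a real runtime reports substituted or blocked calls to the after-hook is a design choice the text does not specify.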

Client tools are more interesting. The model is running server-side, but the tool may need to execute in the browser. Think supports that by accepting client-provided JSON Schema tool definitions, converting them into AI SDK tools without server-side execute functions, streaming the tool call to the client, receiving the result back over the chat protocol, applying it to persisted message state, broadcasting the update, and optionally continuing the model turn.
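
The "apply the result to persisted message state" step can be sketched as a pure state transition. The message model below is deliberately simplified and every name in it (`StoredMessage`, `ToolPart`, `applyClientToolResult`, `readyToContinue`) is an assumption, not Think's schema. It only illustrates that a client tool result moves a stored tool call from a pending state to a resolved one, and that the turn can continue once nothing is pending.

```typescript
// Simplified message model for illustration. All names are assumptions.
type ToolPart =
  | { type: "tool"; toolCallId: string; toolName: string; state: "input-available"; input: unknown }
  | {
      type: "tool";
      toolCallId: string;
      toolName: string;
      state: "output-available";
      input: unknown;
      output: unknown;
    };
type Part = { type: "text"; text: string } | ToolPart;

interface StoredMessage {
  id: string;
  role: "user" | "assistant";
  parts: Part[];
}

interface ApplyResult {
  messages: StoredMessage[];
  applied: boolean; // false for unknown, duplicate, or late results
  readyToContinue: boolean; // true once no tool call is still waiting on a client
}

function isPendingTool(part: Part): boolean {
  return part.type === "tool" && part.state === "input-available";
}

function applyClientToolResult(
  messages: StoredMessage[],
  toolCallId: string,
  output: unknown,
): ApplyResult {
  const applied = messages.some((m) =>
    m.parts.some(
      (p) => p.type === "tool" && p.toolCallId === toolCallId && p.state === "input-available",
    ),
  );
  const next = messages.map(
    (message): StoredMessage => ({
      ...message,
      parts: message.parts.map((part): Part => {
        if (part.type !== "tool" || part.toolCallId !== toolCallId) return part;
        if (part.state !== "input-available") return part; // already resolved: keep the first result
        return {
          type: "tool",
          toolCallId: part.toolCallId,
          toolName: part.toolName,
          state: "output-available",
          input: part.input,
          output,
        };
      }),
    }),
  );
  const stillPending = next.some((m) => m.parts.some(isPendingTool));
  return { messages: next, applied, readyToContinue: applied && !stillPending };
}
```

Ignoring a second result for the same call matters when two tabs both execute the tool and both report back.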

That is a distributed tool call. It has protocol, persistence, state transitions, and recovery concerns.

This is where many chat demos quietly stop. Think does the boring work.

A common pattern is for the browser to own the message array and send it with each request. That works until the client and server disagree.

With tools, streaming, optimistic UI, multiple tabs, regeneration, and reconnects, disagreement is inevitable. The client may have an assistant message with a temporary ID. The server may already have a tool result that the client has not seen. A submitted message list may contain stale tool states. Two tabs may race.

Think treats server-side storage as the source of truth. Incoming client messages are reconciled against server state before persistence. Tool outputs known by the server are merged into stale client messages. Assistant IDs can be reconciled by exact ID, content, or tool call ID. This prevents duplicate assistant rows and orphaned tool calls.
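
As a minimal sketch of the ID reconciliation step: the matching criteria (exact ID, content, tool call ID) come from the text, but the flattened `Msg` shape, the function names, and the precedence between the three criteria are assumptions. Think's real reconciliation works on richer message structures.

```typescript
// Flattened message shape for illustration only.
interface Msg {
  id: string;
  role: "user" | "assistant";
  text: string;
  toolCallIds: string[];
}

// Returns the ID the server should persist this incoming message under.
function reconcileId(incoming: Msg, stored: Msg[]): string {
  if (incoming.role !== "assistant") return incoming.id;
  const candidates = stored.filter((m) => m.role === "assistant");

  // 1. Exact ID match: the client already knows the server's ID.
  const byId = candidates.find((m) => m.id === incoming.id);
  if (byId) return byId.id;

  // 2. Same content under a temporary client ID (assumed precedence).
  const byContent =
    incoming.text !== "" ? candidates.find((m) => m.text === incoming.text) : undefined;
  if (byContent) return byContent.id;

  // 3. Shared tool call ID: the same assistant step, even if the text differs.
  const byToolCall = candidates.find((m) =>
    m.toolCallIds.some((id) => incoming.toolCallIds.includes(id)),
  );
  if (byToolCall) return byToolCall.id;

  // No match: treat it as a genuinely new message.
  return incoming.id;
}

function reconcile(incoming: Msg[], stored: Msg[]): Msg[] {
  return incoming.map((m) => ({ ...m, id: reconcileId(m, stored) }));
}
```

Mapping a stale client message onto the stored ID is what keeps an upsert from creating a second assistant row.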

It is unglamorous code, but it is exactly the sort of code that separates a demo from a system.

The most interesting part of Think may be recovery.

Think can wrap chat turns in durable fibers. A recovery snapshot records the request ID, whether the turn was a continuation, recent message IDs, custom request body, client tool schemas, and any user data stashed during the run.

If execution is interrupted, Think can inspect what happened. Did the interruption occur before any stream chunks were produced? Then it may retry the user turn. Did it happen after partial assistant output? Then it can persist the partial message and schedule a continuation. Was this part of a durable programmatic submission? Then the submission row can be completed, failed, or recovered. This is much more precise than "try again." It is recovery based on where the reasoning process actually stopped.
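
The decision described above can be sketched as a small pure function. The snapshot fields mirror the list in the text, but the field names, the `RecoveryAction` variants, and the rule used to detect a changed conversation (comparing the snapshot's recent message IDs with the tail of the current history) are assumptions for illustration.

```typescript
// Field names follow the snapshot contents listed in the article; types are assumptions.
interface RecoverySnapshot {
  requestId: string;
  isContinuation: boolean;
  recentMessageIds: string[];
  body?: Record<string, unknown>;
  clientToolSchemas?: unknown[];
  userData?: unknown;
}

type RecoveryAction =
  | { kind: "skip"; reason: string }
  | { kind: "retry-turn"; requestId: string }
  | { kind: "persist-partial-and-continue"; requestId: string; partialText: string };

// Assumed rule: the conversation is unchanged if its tail still matches the snapshot.
function conversationUnchanged(snapshotIds: string[], currentIds: string[]): boolean {
  if (snapshotIds.length === 0) return true;
  const tail = currentIds.slice(-snapshotIds.length);
  return tail.length === snapshotIds.length && tail.every((id, i) => id === snapshotIds[i]);
}

function decideRecovery(
  snapshot: RecoverySnapshot,
  currentMessageIds: string[],
  storedChunks: string[],
): RecoveryAction {
  if (!conversationUnchanged(snapshot.recentMessageIds, currentMessageIds)) {
    return { kind: "skip", reason: "conversation changed since snapshot" };
  }
  if (storedChunks.length === 0) {
    // Nothing reached the client, so the user turn can be retried as a whole.
    return { kind: "retry-turn", requestId: snapshot.requestId };
  }
  // Partial output exists: keep it and continue rather than regenerate.
  return {
    kind: "persist-partial-and-continue",
    requestId: snapshot.requestId,
    partialText: storedChunks.join(""),
  };
}
```

Returning an action instead of performing it keeps the decision testable without a model or a database.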

There are limits. An orphaned stream cannot magically resume the lost LLM reader. Oversized chunks may be omitted from replay storage. Recovery may be skipped if the conversation changed in the meantime. Think is experimental. These caveats are important.

But the shape is right: recovery is modeled as part of the interaction lifecycle.

Think still uses the AI SDK. That is the point.

The AI SDK provides the model interface, streaming chunks, tool definitions, prepareStep, streamText, UI message formats, and provider abstraction. Think uses those as low-level primitives inside a larger runtime.

That distinction is important. The AI SDK is excellent at the model call. Think is exploring what has to exist around the model call for a durable AI application: state, concurrency, recovery, tools, memory, workspace, scheduling, and live clients.

In that sense, Think may be one of the more advanced examples of AI SDK usage: not because it calls the model in a novel way, but because it treats the model call as one phase inside a durable reasoning system.

Could Think run outside Durable Objects? Some of it, yes.

The turn pipeline, tool wrapping, stream accumulation, message reconciliation, sanitization, and lifecycle hooks could be extracted into a runtime-agnostic core. That core would be useful. It could run on Node, Bun, containers, or other serverless platforms with the right adapters.

But the full Think experience depends on stronger assumptions: one durable actor per reasoning process, local transactional storage, serialized execution, WebSocket ownership, alarms and schedules, and recovery hooks. Outside Durable Objects, those become adapter responsibilities.

A portable Think core would gain reach. It might also clarify the architecture.

But it would lose the simplicity of strong defaults. Instead of "extend Think and get a durable agent," users would need to supply a database, lock strategy, stream store, queue, scheduler, pub/sub layer, and WebSocket coordination story.
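
To give a feel for that burden, here is a hypothetical adapter contract a portable core might ask for. The list of responsibilities comes from the sentence above. The interface name, method shapes, and the `missingAdapters` helper are invented for illustration and do not exist in Think.

```typescript
// Hypothetical adapter contract for a runtime-agnostic core.
// The responsibilities come from the article; every shape here is an assumption.
interface RuntimeAdapters {
  database: { get(key: string): string | undefined; put(key: string, value: string): void };
  lock: { withLock<T>(key: string, fn: () => T): T };
  streamStore: { append(requestId: string, chunk: string): void; read(requestId: string): string[] };
  queue: { enqueue(job: string): void };
  scheduler: { scheduleAt(timestampMs: number, job: string): void };
  pubsub: { publish(topic: string, payload: string): void };
  sockets: { broadcast(roomId: string, payload: string): void };
}

const REQUIRED: (keyof RuntimeAdapters)[] = [
  "database",
  "lock",
  "streamStore",
  "queue",
  "scheduler",
  "pubsub",
  "sockets",
];

// Reports which pieces a host application still has to provide.
function missingAdapters(provided: Partial<RuntimeAdapters>): (keyof RuntimeAdapters)[] {
  return REQUIRED.filter((name) => provided[name] === undefined);
}
```

On Durable Objects all seven slots are filled by the platform, which is the "strong defaults" the paragraph refers to.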

So the honest answer is probably: Think could have a runtime-agnostic core, but Think as it exists is a demonstration of what becomes possible when the runtime itself gives every AI process a durable actor.

The industry still talks about AI applications as if they are mostly prompt management and model selection. Those matter, but they are not the whole system.

Real interactions are stateful. Real tools are distributed. Real streams fail. Real users reconnect, cancel, retry, and open multiple tabs. Real agents need memory, workspaces, schedules, sub-agents, and recovery.

Think is interesting because it takes those problems seriously.

It is not just a nicer wrapper around streamText. It is a technical argument: if AI applications are long-running reasoning processes with tools and state, then the runtime should expose durable, addressable, stateful actors. Durable Objects provide that shape, and Think is what an AI SDK starts to look like when it leans into it.

source & further reading

Original article: gist.github.com
