# Temporal Workflow Streams: Stream AI Agent Output in Real Time

> Source: <https://byteiota.com/temporal-workflow-streams-stream-ai-agent-output-in-real-time/>
> Published: 2026-06-16 23:17:23+00:00

Your AI agent is halfway through a 90-second run: three LLM calls deep, tool results coming in, two sub-agents fanning out. The user sees a spinner. Then the server restarts. The workflow recovers correctly by replaying its event history, but the user has no idea what’s been happening. No output reached them before the crash.

This is the problem [Temporal Workflow Streams](https://temporal.io/blog/replay-2026-product-announcements) was built to fix. Announced at Replay 2026 and now in Public Preview, Workflow Streams gives durable Temporal workflows a real-time output channel without requiring Redis, a separate SSE server, or any custom state management. The stream is built on Temporal’s own Signal and Update primitives — which means it inherits the same durability guarantees as the workflow itself.

## The Old Workaround

Before Workflow Streams, teams streaming progress from inside a Temporal Activity to a frontend did something like this:

- Activity publishes tokens to Redis pub/sub during execution
- A separate SSE server subscribes to Redis and streams to the client
- Hope the Redis connection and SSE server survive the duration of the run

It works — until something breaks. Server restart: Redis connection drops. Client refreshes: the SSE stream dies and is gone. You’re back to manually stitching state together. The bitter irony is that Temporal already knows exactly what your workflow is doing, step by step, in durable history — but that knowledge was inaccessible to the outside world mid-run. Workflow Streams changes that.

## How Workflow Streams Works

Workflow Streams ships as a contrib library in the Temporal Python SDK. Go and .NET are also supported, and Java and TypeScript support is in pre-release. The architecture has three roles:

- **The Workflow (host):** owns an append-only, offset-addressed event log.
- **Publishers:** append events. A publisher can be the Workflow itself, its Activities, or an external process using `WorkflowStreamClient`.
- **Subscribers:** connect by Workflow ID, optionally filter by topic, and consume events by long-polling from a stored offset.

Under the hood, Temporal’s existing message primitives do the work: Signals carry publishes, Updates serve the long-poll subscriptions, and a Query exposes the current global offset. The stream IS the workflow history — no external pub/sub layer needed.
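To make the three roles concrete, here is a minimal in-memory model of an append-only, offset-addressed log with topic filtering and long-polling. It is a conceptual sketch built only on `asyncio`, not the Workflow Streams implementation: `OffsetLog`, `publish`, and `poll` are hypothetical names, and in the real library publishes travel as Signals and polls as Updates.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    offset: int  # global position, assigned when the event is appended
    topic: str
    data: str


class OffsetLog:
    """Toy append-only, offset-addressed log with topic filtering and long-polling.

    Conceptual model only; this is not the temporalio.contrib.workflow_streams API.
    """

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._changed = asyncio.Condition()

    @property
    def global_offset(self) -> int:
        # Offset the next event will get (the kind of value a Query could expose)
        return len(self._events)

    async def publish(self, topic: str, data: str) -> int:
        async with self._changed:
            event = Event(len(self._events), topic, data)
            self._events.append(event)
            self._changed.notify_all()  # wake every waiting long-poll
            return event.offset

    def _read(self, offset: int, topic: str | None) -> list[Event]:
        # Events at or after `offset`, optionally restricted to one topic
        return [e for e in self._events[offset:] if topic is None or e.topic == topic]

    async def poll(
        self, offset: int, topic: str | None = None, timeout: float = 30.0
    ) -> list[Event]:
        """Long-poll: return matching events from `offset`, waiting up to `timeout`."""
        async with self._changed:
            try:
                await asyncio.wait_for(
                    self._changed.wait_for(lambda: bool(self._read(offset, topic))),
                    timeout,
                )
            except asyncio.TimeoutError:
                return []  # nothing new yet; the subscriber simply polls again
            return self._read(offset, topic)


async def main() -> None:
    log = OffsetLog()
    await log.publish("status", "started")
    # The subscriber starts waiting for tokens before any exist
    waiter = asyncio.create_task(log.poll(0, topic="tokens", timeout=5))
    await asyncio.sleep(0)
    await log.publish("tokens", "Hello")
    print([e.data for e in await waiter])  # ['Hello']


if __name__ == "__main__":
    asyncio.run(main())
```

Because every event keeps its global offset even when a subscriber filters by topic, a subscriber can always resume from "last offset seen plus one" without the host tracking per-subscriber state.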

```python
# Workflow: create a stream and let activities publish to it
from datetime import timedelta

from temporalio import workflow
from temporalio.contrib.workflow_streams import WorkflowStream

@workflow.defn
class AgentWorkflow:
    def __init__(self) -> None:
        # The workflow hosts the append-only, offset-addressed event log
        self._stream = WorkflowStream(self)

    @workflow.run
    async def run(self, prompt: str) -> str:
        # call_llm_and_stream is an activity (defined elsewhere) that publishes tokens
        return await workflow.execute_activity(
            call_llm_and_stream,
            args=[prompt, self._stream],
            start_to_close_timeout=timedelta(minutes=5),
        )

# Client (separate process): subscribe and receive events as they arrive
from temporalio.client import Client
from temporalio.contrib.workflow_streams import WorkflowStreamClient

async def main() -> None:
    client = await Client.connect("localhost:7233")
    async with WorkflowStreamClient(client, workflow_id="agent-123") as sub:
        async for event in sub.events(topic="tokens"):
            print(event.data, end="", flush=True)
```

The client resumes from its last-seen offset automatically. If it disconnects and reconnects, it picks up exactly where it left off — no tokens dropped.
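The resumption guarantee comes down to one idea: the subscriber remembers the next offset it needs and asks for events from there. The sketch below models that with a plain list standing in for the workflow's log. `Cursor` and `consume` are hypothetical names for illustration, not part of `WorkflowStreamClient`, which the article says handles this automatically.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cursor:
    """Hypothetical subscriber state: the next offset it still needs."""

    next_offset: int = 0
    received: list[str] = field(default_factory=list)

    def consume(self, log: list[str], max_events: int | None = None) -> None:
        # A real client long-polls the workflow; here we read a plain list
        end = len(log)
        if max_events is not None:
            end = min(end, self.next_offset + max_events)
        for offset in range(self.next_offset, end):
            self.received.append(log[offset])
            self.next_offset = offset + 1  # advance only after handling the event


if __name__ == "__main__":
    log = ["The", " answer", " is"]
    cursor = Cursor()
    cursor.consume(log, max_events=2)  # connection drops after two events
    log += [" 42", "."]  # the workflow keeps publishing meanwhile
    saved = cursor.next_offset  # e.g. persisted in browser storage
    resumed = Cursor(next_offset=saved, received=list(cursor.received))
    resumed.consume(log)
    print("".join(resumed.received))  # The answer is 42.
```

Advancing `next_offset` only after an event is handled means a crash mid-handling re-delivers that event instead of dropping it, so rendering on the client should tolerate an occasional repeat.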

## The Decisive Advantage: Offset-Based Resumption

This is where Workflow Streams beats plain SSE and WebSocket for long-running agent scenarios. Both give you real-time output, but neither survives failures without a separate state store:

| | SSE | WebSocket | Temporal Workflow Streams |
|---|---|---|---|
| Survives server crash | No | No | Yes |
| Offset-based resumption | Needs external store (via `Last-Event-ID`) | Needs external store (e.g. Redis) | Built-in |
| Bidirectional | No | Yes | Yes (via Signals) |
| Observability | DIY | DIY | Built into Temporal UI |
| Latency | Very low | Very low | ~100 ms (tunable) |

For agents that run for more than a few seconds — LLM chains, multi-step coding agents, data pipelines — crash recovery matters. SSE is the right tool for a 2-second response. It is not the right tool for a 10-minute agentic run.

## Latency, Tuning, and History Cost

The default configuration targets a slow-moving UI, not real-time token streaming. The key parameter to tune is `batch_interval`, which defaults to 2 seconds:

```python
from datetime import timedelta

# In the workflow's __init__: lower batch_interval from the default 2s for token streaming
self._stream = WorkflowStream(self, batch_interval=timedelta(milliseconds=100))
```

Expected round-trip after tuning: roughly 100ms. That is fine for the typical AI agent UI — not for voice or sub-50ms interactive scenarios.
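Why does the 2-second default feel slow for tokens? A simple batching model shows the trade-off: a longer `batch_interval` means fewer Signals but more waiting per token. The flush policy here (a batch opens with its first event and flushes `batch_interval` later) is an assumption for illustration; the library's actual batching may differ, and both function names are hypothetical.

```python
from __future__ import annotations


def batch_publishes(token_times_ms: list[int], batch_interval_ms: int) -> list[list[int]]:
    """Group publish times into batches; each batch would be sent as one Signal.

    Simplified flush policy (an assumption): a batch opens with its first
    event and is flushed batch_interval_ms later.
    """
    batches: list[list[int]] = []
    flush_at: int | None = None
    for t in sorted(token_times_ms):
        if flush_at is None or t >= flush_at:
            batches.append([])  # start a new batch
            flush_at = t + batch_interval_ms
        batches[-1].append(t)
    return batches


def mean_added_delay_ms(batches: list[list[int]], batch_interval_ms: int) -> float:
    # Each token waits until its batch flushes: (batch start + interval) - token time
    waits = [b[0] + batch_interval_ms - t for b in batches for t in b]
    return sum(waits) / len(waits)


if __name__ == "__main__":
    tokens = list(range(0, 1000, 20))  # one token every 20 ms for one second
    for interval in (2000, 100):
        batches = batch_publishes(tokens, interval)
        print(interval, len(batches), mean_added_delay_ms(batches, interval))
    # 2000 1 1510.0
    # 100 10 60.0
```

Under this model the first token of every batch waits the full interval, so time-to-first-token is roughly `batch_interval`, while a shorter interval multiplies the number of Signals sent.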

One trade-off to understand: each published batch is one Signal, and each subscriber poll is one Update. Both add events to the workflow's history and count against Temporal’s per-run history limit. For agents that run for hours, plan for Continue-As-New from the start; Workflow Streams carries the log offset across the Continue-As-New boundary automatically.
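The history cost can be estimated with back-of-envelope arithmetic. This sketch assumes one Signal per batch, one Update per subscriber per batch, and a fixed number of history events per message (1 per Signal and 2 per Update, both assumptions), measured against Temporal's default limit of 51,200 events per execution. Real workflows also record Workflow Task and Activity events, so treat the result as an optimistic upper bound, not a prediction. The function name is hypothetical.

```python
def seconds_until_history_limit(
    batch_interval_s: float,
    subscribers: int = 1,
    events_per_signal: int = 1,  # assumption
    events_per_update: int = 2,  # assumption
    history_limit_events: int = 51_200,
) -> float:
    """Rough runtime before a continuously publishing stream hits the event limit.

    Assumes one Signal per batch and one Update per subscriber per batch.
    Ignores Workflow Task and Activity events, so this is an upper bound.
    """
    batches_per_s = 1.0 / batch_interval_s
    events_per_s = batches_per_s * (events_per_signal + subscribers * events_per_update)
    return history_limit_events / events_per_s


if __name__ == "__main__":
    for interval in (2.0, 0.1):
        minutes = seconds_until_history_limit(interval) / 60
        print(f"batch_interval={interval}s, 1 subscriber: ~{minutes:.0f} min")
    # batch_interval=2.0s, 1 subscriber: ~569 min
    # batch_interval=0.1s, 1 subscriber: ~28 min
```

Even under these generous assumptions, a run that streams continuously at 100 ms batches reaches the limit in well under an hour, and every extra subscriber shortens that further.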

## Where This Fits in the Temporal AI Stack

Workflow Streams is the piece that completes Temporal’s answer to production AI agents. It joins the other Replay 2026 announcements, [Serverless Workers on Lambda](https://temporal.io/blog/replay-2026-product-announcements) and Standalone Activities for durable background jobs, along with first-class integrations with [OpenAI Agents SDK](https://temporal.io/blog/announcing-openai-agents-sdk-integration) (GA since March 2026), [Pydantic AI](https://temporal.io/blog/build-durable-ai-agents-pydantic-ai-and-temporal), Vercel AI SDK, and Google ADK. Together, the stack now covers the full agent lifecycle: durable orchestration, serverless compute, background jobs, and real-time streaming output.

If you are building AI agents that run longer than a few seconds and need to show progress to users, Workflow Streams is worth evaluating now. Full documentation is at [docs.temporal.io/develop/python/workflows/workflow-streams](https://docs.temporal.io/develop/python/workflows/workflow-streams). Public Preview means it is production-ready, with the standard caveat that the API may still change before GA.
