Temporal Replay 2026: Serverless Workers and Durable AI

wpnews.pro

AI agents keep dying mid-task. An LLM call times out. A Lambda gets evicted. A Redis queue drops a job silently. Your “agentic system” fails, and there is no automatic resume because you built retry logic yourself and missed a case. Temporal’s position has always been: reliable execution is infrastructure, not application code. At Replay 2026 (Moscone South, San Francisco, May 5–7), the company shipped features that close three specific gaps developers have complained about for years — and all three are directly relevant to how you build AI agents today.

Serverless Workers: Temporal Without a Worker Fleet

The biggest objection to Temporal adoption has never been the programming model — it has been the infrastructure cost. Running a persistent Worker fleet to process occasional, bursty workloads means paying for idle compute around the clock. Serverless Workers, now in pre-release, changes this. Your Workers run on AWS Lambda. Temporal Cloud monitors the Task Queue backlog, invokes your function when tasks arrive, and shuts it down when the work is done.

Setup is three steps: upload your Worker code to Lambda, create a cross-account IAM role using Temporal’s CloudFormation template, and register the Lambda ARN with Temporal Cloud via CLI or UI. Temporal handles invocation, scaling, and graceful shutdown before Lambda’s 15-minute execution limit. Worker Versioning is required, and the same SDK code that runs in a traditional Worker runs here unchanged.

The 15-minute hard limit deserves a direct mention. It rules out any Activity whose single attempt can outlast one Lambda invocation, such as a call to a slow GPU inference endpoint or the processing of a long video. For the common case (agents orchestrating API calls, processing documents, managing tool chains) it is more than enough. If Temporal has been on your “evaluate later” list because of infra burden, this changes the calculation.
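One way to triage existing Activities against that limit is to compare each start-to-close timeout with the Lambda ceiling. The sketch below is a rough, optimistic check: it assumes an attempt starts at the beginning of an invocation, and the `fits_in_lambda` helper, the 30-second `shutdown_margin`, and the Activity names are all hypothetical values chosen for illustration, not numbers published by Temporal.

```python
from datetime import timedelta

# AWS Lambda's maximum execution time per invocation.
LAMBDA_MAX_RUNTIME = timedelta(minutes=15)


def fits_in_lambda(
    start_to_close_timeout: timedelta,
    shutdown_margin: timedelta = timedelta(seconds=30),
) -> bool:
    """Return True if one Activity attempt can finish inside one invocation.

    shutdown_margin is a hypothetical buffer reserved for graceful shutdown.
    """
    return start_to_close_timeout + shutdown_margin <= LAMBDA_MAX_RUNTIME


# Hypothetical Activity names and timeouts, for illustration only.
activity_timeouts = {
    "process_document": timedelta(minutes=5),
    "transcode_long_video": timedelta(hours=2),
}

# Map each Activity to whether it looks safe to run on a Serverless Worker.
serverless_ok = {name: fits_in_lambda(t) for name, t in activity_timeouts.items()}
```

Activities that fail the check are candidates for a traditional Worker on its own Task Queue, while the rest can move to Lambda.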

Standalone Activities: Replace Your Message Queue

Most teams use SQS, Redis queues, or Celery for one specific job: run this task durably, retry on failure, report back when done. The implementation is a maze of dead-letter queues, visibility timeout tuning, and custom retry handlers. Standalone Activities are now in Public Preview for Go, Python, and .NET — and they are a direct replacement for that entire stack.

An Activity runs without a parent Workflow. You call it from application code, it gets durably enqueued in Temporal, it retries automatically on failure, and it is searchable, cancellable, and inspectable via the Temporal UI. The Python API is a single call with the Standalone Activities SDK:

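from datetime import timedelta

# `client` is a connected Temporal client; `process_document` is an
# @activity.defn function and `DocumentInput` is its input type.
handle = await client.start_activity(
    process_document,
    args=[DocumentInput(doc_id="abc123")],
    id="process-doc-abc123",  # caller-chosen Activity ID
    task_queue="doc-processing",
    start_to_close_timeout=timedelta(minutes=5),  # limit for a single attempt
)
result = await handle.result()  # wait for the Activity to finish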

The @activity.defn decorator is identical to the one used for workflow-embedded activities. If your use case grows into multi-step orchestration, you promote the Activity into a Workflow without rewriting any Activity code. That migration path is deliberate, and it is the reason this beats SQS for anything beyond basic fire-and-forget.
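To make "retries automatically on failure" concrete, here is a minimal sketch of the delay schedule that a capped exponential backoff produces. The defaults (1-second initial interval, backoff coefficient 2.0, 100-second cap) mirror Temporal's documented default retry policy as I understand it, so check them against the current docs. The `retry_delays` helper is hypothetical and not part of any SDK.

```python
from datetime import timedelta


def retry_delays(
    retries: int,
    initial_interval: timedelta = timedelta(seconds=1),
    backoff_coefficient: float = 2.0,
    maximum_interval: timedelta = timedelta(seconds=100),
) -> list:
    """Return the wait before each retry under capped exponential backoff."""
    delays = []
    for n in range(retries):
        # The delay grows geometrically until it reaches the cap.
        delay = initial_interval * (backoff_coefficient ** n)
        delays.append(min(delay, maximum_interval))
    return delays


# Waits, in seconds, before the first nine retries.
schedule = [d.total_seconds() for d in retry_delays(9)]
```

This is the logic a hand-rolled SQS or Celery consumer typically reimplements alongside visibility timeouts and dead-letter handling.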

Workflow Streams: Stream LLM Tokens From a Durable Workflow

LLM token streaming — the typewriter effect — requires a live, low-latency connection to the model call. Durable execution requires checkpointing every step in append-only history. These two requirements are in direct tension. Workflow Streams resolves it.

The abstraction layers pub/sub over Temporal’s Signal and Update primitives. A publisher (your Workflow or an Activity) writes typed events to a named topic. A subscriber, such as your API layer or a server-sent-events endpoint feeding the frontend, polls the topic for new events. Publishing is batched: the interval defaults to 2 seconds and drops to 100 ms for AI integrations. Stream state carries across Continue-As-New rollovers, so subscribers do not lose events when a long-running Workflow rolls over to a fresh run. Full details are in the Workflow Streams announcement post.

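from temporalio import activity

# WorkflowStreamClient comes from the Workflow Streams preview SDK; TokenEvent
# is your own event type and `llm` stands in for your model client.

# Publisher side: an Activity forwards each chunk to the "tokens" topic.
@activity.defn
async def stream_llm_response(prompt: str) -> str:
    chunks = []
    client = WorkflowStreamClient.from_within_activity()
    async with client:
        tokens = client.topic("tokens", type=TokenEvent)
        async for chunk in llm.stream(prompt):
            tokens.publish(TokenEvent(text=chunk))
            chunks.append(chunk)  # keep the text for the final result
    # Only the assembled response is returned to the Workflow.
    return "".join(chunks)

# Subscriber side: for example, the body of a server-sent-events handler.
async def token_events(temporal_client, workflow_id):
    stream = WorkflowStreamClient.create(temporal_client, workflow_id)
    async for item in stream.topic("tokens", type=TokenEvent).subscribe():
        yield item.data.text  # server-sent event to browser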

Workflow Streams ships with built-in plugins for the OpenAI Agents SDK (now Generally Available) and Google ADK. The agent framework publishes token events automatically; the Workflow receives only the final result, keeping determinism intact. Currently in Public Preview for Python and TypeScript.
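The batching interval mentioned earlier trades latency for fewer writes. The sketch below illustrates that trade-off with a hypothetical `TokenBatcher` that takes timestamps as arguments so its behavior is deterministic. It is not how Workflow Streams is implemented; the 0.1-second default only echoes the 100 ms figure quoted for AI integrations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TokenBatcher:
    """Buffers events and releases them once per interval (hypothetical helper)."""

    interval_s: float = 0.1  # 100 ms window
    _pending: List[str] = field(default_factory=list)
    _window_start: Optional[float] = None

    def add(self, event: str, now: float) -> Optional[List[str]]:
        """Buffer an event; return a batch once the current window has elapsed."""
        if self._window_start is None:
            self._window_start = now
        self._pending.append(event)
        if now - self._window_start >= self.interval_s:
            return self.flush()
        return None

    def flush(self) -> List[str]:
        """Release whatever is buffered and start a fresh window."""
        batch, self._pending = self._pending, []
        self._window_start = None
        return batch


batcher = TokenBatcher()
batches = []
# (timestamp in seconds, token) pairs standing in for an LLM stream.
for now, token in [(0.00, "Dur"), (0.04, "able"), (0.11, " exe"), (0.15, "cution")]:
    batch = batcher.add(token, now)
    if batch is not None:
        batches.append(batch)
batches.append(batcher.flush())  # drain the tail when the stream ends
```

With a 2-second window the same four tokens would go out as a single batch, which is cheaper to record but loses the typewriter effect.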

The Rest of the Replay Cluster

Three more releases deserve attention. External Payload Storage (Public Preview, Python and Go) routes large workflow payloads, such as embeddings, RAG context, and LLM outputs, through S3 or custom drivers instead of Temporal’s event history. This directly fixes the history saturation problem that AI pipeline teams have been working around with custom codecs for the past year. The Rust SDK enters Public Preview, covering full Workflow and Activity authoring. Worker Versioning, Task Queue Priority, Multi-region Replication, and Nexus all moved to General Availability, so the production-readiness story for enterprise deployments is significantly cleaner than six months ago.
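The idea behind External Payload Storage resembles the claim-check pattern: keep a small reference in history and park the bytes elsewhere. The following is a minimal sketch of that pattern only, not Temporal's API. `InMemoryBlobStore` stands in for S3, and `SIZE_THRESHOLD_BYTES`, `encode`, and `decode` are hypothetical names chosen for illustration.

```python
import hashlib
import json

SIZE_THRESHOLD_BYTES = 1024  # hypothetical cutoff, chosen for illustration


class InMemoryBlobStore:
    """Stand-in for S3 or another object store."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Content-addressed key: identical payloads share one blob.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def encode(value, store: InMemoryBlobStore) -> dict:
    """Return the record to keep in history: the value itself or a reference."""
    data = json.dumps(value).encode("utf-8")
    if len(data) <= SIZE_THRESHOLD_BYTES:
        return {"inline": value}
    return {"ref": store.put(data), "size": len(data)}


def decode(record: dict, store: InMemoryBlobStore):
    """Resolve a history record back into the original value."""
    if "inline" in record:
        return record["inline"]
    return json.loads(store.get(record["ref"]))


store = InMemoryBlobStore()
small = encode({"doc_id": "abc123"}, store)    # stays inline
embedding = [0.001 * i for i in range(2000)]   # large payload
large = encode(embedding, store)               # replaced by a reference
```

The history record for the embedding shrinks to a hash and a size, and the full vector is fetched only when something actually reads it.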

What This Means for Your Agentic Stack

Serverless Workers change the cost model for bursty agents. Standalone Activities change the build-vs-integrate decision for durable job processing. Workflow Streams change what “durable and real-time” means in practice. None of these are incremental updates. Together, they address the three main reasons teams avoided Temporal for AI workloads: infrastructure overhead, queue service dependency, and LLM streaming incompatibility.

Temporal is not a replacement for LangGraph or similar agent orchestration frameworks. The pattern emerging in enterprise AI — LangGraph for agent graph logic, Temporal for durable execution — now has a much lower barrier to entry. The full Replay 2026 announcement post covers everything. Serverless Workers and Standalone Activities are in pre-release and public preview respectively — worth testing now if either addresses a real pain point in your current stack.

source & further reading

byteiota.com (original article)