Streaming Responses with Claude API in Python (2026)

A developer has published a guide demonstrating how to stream responses from Anthropic's Claude API token by token in Python, cutting the time until the first words appear in a chat interface from several seconds to a few hundred milliseconds. The tutorial covers raw event streams, async streaming, error handling, and a complete FastAPI endpoint that streams Claude's output to a browser using server-sent events. The implementation uses the Anthropic Python SDK's `messages.stream` method, which provides access to individual event types including `content_block_delta` for incremental text and `message_delta` for stop reasons and token usage.

Originally published at kalyna.pro

Streaming sends Claude's response token by token as it's generated, instead of waiting for the full completion before showing anything. For a chat UI, this is the difference between a user staring at a spinner for several seconds and seeing the first words appear within a few hundred milliseconds. The [Claude API Tutorial](https://kalyna.pro/claude-api-tutorial/) introduces the basic `stream.text_stream` helper; this guide covers the full picture: the raw event stream, async streaming, error handling, and a complete FastAPI endpoint that streams Claude's output to a browser.

```bash
pip install anthropic
```

For the API endpoint example later:

```bash
pip install fastapi uvicorn
```

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about debugging."}],
) as stream:
    # text_stream yields plain text chunks as they arrive
    for text in stream.text_stream:
        print(text, end="", flush=True)

    final_message = stream.get_final_message()

print(f"\n\nstop_reason: {final_message.stop_reason}")
print(f"output_tokens: {final_message.usage.output_tokens}")
```

`stream.get_final_message()` returns the same `Message` object you'd get from a non-streaming call (complete `content`, `stop_reason`, and `usage`) without manually reassembling it from chunks.

```python
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about debugging."}],
) as stream:
    # Iterating the stream itself yields event objects instead of text
    for event in stream:
        print(event.type)
```

Event types, in order:

- `message_start`: initial `Message` shell with `usage.input_tokens`
- `content_block_start`: a new content block begins (`text`, `tool_use`, etc.)
- `content_block_delta`: incremental content, either `text_delta` (`.text`), `input_json_delta` (`.partial_json`, for tool inputs), or `thinking_delta`
- `content_block_stop`: the block is complete
- `message_delta`: `stop_reason` and updated `usage.output_tokens`
- `message_stop`: stream finished

```python
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about debugging."}],
) as stream:
    for event in stream:
        # Print text as it arrives; report token usage when it is updated
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "message_delta":
            print(f"\n[tokens so far: {event.usage.output_tokens}]", end="")
```

```python
import asyncio

from anthropic import AsyncAnthropic

client = AsyncAnthropic()


async def main():
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a haiku about debugging."}],
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)


asyncio.run(main())
```

```python
from anthropic import AsyncAnthropic
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
client = AsyncAnthropic()


@app.get("/chat")
async def chat(message: str):
    async def event_stream():
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": message}],
        ) as stream:
            async for text in stream.text_stream:
                # One SSE event per text chunk
                yield f"data: {text}\n\n"
        # Tell the browser the response is complete
        yield "event: done\ndata: {}\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
    )
```

`X-Accel-Buffering: no` stops nginx from buffering the whole response; without it, "streaming" arrives in one burst at the end. On the frontend, read with `fetch` + a `ReadableStream` reader, or `EventSource` for GET endpoints.
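
One detail the endpoint above glosses over: in the server-sent events format a blank line ends an event, and every line of a payload needs its own `data:` prefix. A text chunk that itself contains a newline (common in code or multi-paragraph answers) would therefore be cut short or split if it were interpolated directly into `f"data: {text}\n\n"`. The following is a minimal sketch of one way to frame chunks safely; `sse_event` is a hypothetical helper name, not part of FastAPI or the Anthropic SDK.

```python
from typing import Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one server-sent event (hypothetical helper, not a library API).

    Every line of the payload gets its own 'data: ' prefix, so newlines
    inside a chunk cannot terminate the event early.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # SSE treats \r\n, \r and \n all as line endings, so normalise first.
    normalised = data.replace("\r\n", "\n").replace("\r", "\n")
    # split("\n") keeps empty segments, so blank lines in the chunk survive.
    for part in normalised.split("\n"):
        lines.append(f"data: {part}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"
```

In the endpoint, `yield sse_event(text)` would take the place of the f-string, and `sse_event("{}", event="done")` would produce the closing event. A browser `EventSource` joins consecutive `data:` lines with `\n`, so the client receives the original chunk (with carriage returns normalised to newlines).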

```python
import anthropic

client = anthropic.Anthropic()

try:
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Write a haiku about debugging."}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
except anthropic.APIConnectionError:
    # Network failure mid-stream; whatever was printed so far is partial
    print("\n[connection lost, showing partial response]")
except anthropic.RateLimitError:
    # Must come before APIStatusError, which is its parent class
    print("\n[rate limited, retry shortly]")
except anthropic.APIStatusError as e:
    print(f"\n[API error {e.status_code}]")
```

If the client disconnects mid-response, exit the generator early so the SDK closes the stream; this stops billing for output tokens generated into the void. For long generations, check `await request.is_disconnected()` periodically and break if true.

Streaming also works with tool use. Text still arrives via `text_delta`, tool arguments arrive incrementally via `input_json_delta`, and `stream.get_final_message()` gives fully parsed `tool_use` blocks once the stream ends. See [Claude API Function Calling](https://kalyna.pro/claude-api-function-calling/) for the complete tool-use loop, which works unchanged whether calls are streamed or not.

Best practices:

- Use `get_final_message()` for `stop_reason` / `usage` instead of accumulating `message_delta` manually
- Use `AsyncAnthropic` in web backends, because a sync stream blocks the event loop
- Set `Cache-Control: no-cache` and `X-Accel-Buffering: no` for SSE behind a proxy
- Catch `APIConnectionError`, `RateLimitError`, and `APIStatusError` explicitly

Summary:

- `stream.text_stream` yields plain text chunks for display
- Raw events arrive as `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop`
- `get_final_message()` returns the complete `Message` after streaming
- `AsyncAnthropic` + `async with` / `async for` for non-blocking backends
- `StreamingResponse` + async generator → SSE to the browser
- `input_json_delta` carries tool arguments

Further reading: