{"slug": "streaming-responses-with-claude-api-in-python-2026", "title": "Streaming Responses with Claude API in Python (2026)", "summary": "A developer has published a guide demonstrating how to stream responses from Anthropic's Claude API token by token in Python, cutting the wait before the first words appear in a chat interface from several seconds to a few hundred milliseconds. The tutorial covers raw event streams, async streaming, error handling, and a complete FastAPI endpoint that streams Claude's output to a browser using server-sent events. The implementation uses the Anthropic Python SDK's `messages.stream()` helper, which exposes individual event types including `content_block_delta` for incremental text and `message_delta` for stop reasons and token usage.", "body_md": "Originally published at kalyna.pro.\n\nStreaming sends Claude's response token by token as it's generated, instead of waiting for the full completion before showing anything. For a chat UI, this is the difference between a user staring at a spinner for several seconds and seeing the first words appear within a few hundred milliseconds. 
The [Claude API Tutorial](https://kalyna.pro/claude-api-tutorial/) introduces the basic `stream.text_stream` helper. This guide covers the full picture: the raw event stream, async streaming, error handling, and a complete FastAPI endpoint that streams Claude's output to a browser.\n\n```bash\npip install anthropic\n# for the API endpoint example later:\npip install fastapi uvicorn\n```\n\nThe simplest approach iterates `stream.text_stream`, which yields plain text chunks ready for display:\n\n```python\nfrom anthropic import Anthropic\n\nclient = Anthropic()  # reads ANTHROPIC_API_KEY from the environment\n\nwith client.messages.stream(\n    model=\"claude-sonnet-4-6\",\n    max_tokens=1024,\n    messages=[{\"role\": \"user\", \"content\": \"Write a haiku about debugging.\"}],\n) as stream:\n    for text in stream.text_stream:\n        print(text, end=\"\", flush=True)\n\n    final_message = stream.get_final_message()\n\nprint(f\"\\n\\nstop_reason: {final_message.stop_reason}\")\nprint(f\"output tokens: {final_message.usage.output_tokens}\")\n```\n\n`stream.get_final_message()` returns the same `Message` object you'd get from a non-streaming call (complete `content`, `stop_reason`, and `usage`) without reassembling it from chunks by hand.\n\nUnderneath `text_stream` is a sequence of typed events. Iterate over the stream itself to see them:\n\n```python\nwith client.messages.stream(\n    model=\"claude-sonnet-4-6\",\n    max_tokens=1024,\n    messages=[{\"role\": \"user\", \"content\": \"Write a haiku about debugging.\"}],\n) as stream:\n    for event in stream:\n        print(event.type)\n```\n\nThe raw event types, in order (the `content_block_*` events repeat for every content block in the response):\n\n- `message_start`: the initial `Message` shell, with `usage.input_tokens`\n- `content_block_start`: a new content block begins (`text`, `tool_use`, etc.)\n- `content_block_delta`: incremental content, either a `text_delta` (`.text`), an `input_json_delta` (`.partial_json`, for tool inputs), or a `thinking_delta` (`.thinking`)\n- `content_block_stop`: the block is complete\n- `message_delta`: the `stop_reason` and updated `usage.output_tokens`\n- `message_stop`: the stream is finished\n\nThe SDK's stream helper also interleaves convenience events such as `text` and `input_json`, so `print(event.type)` shows those as well. To act on specific events, filter on `event.type` and `event.delta.type`:\n\n```python\nwith client.messages.stream(\n    model=\"claude-sonnet-4-6\",\n    max_tokens=1024,\n    messages=[{\"role\": \"user\", \"content\": \"Write a haiku about debugging.\"}],\n) as stream:\n    for event in stream:\n        if event.type == \"content_block_delta\" and event.delta.type == \"text_delta\":\n            print(event.delta.text, end=\"\", flush=True)\n        elif event.type == \"message_delta\":\n            # Usually sent once, near the end; output_tokens is cumulative\n            print(f\"\\n[stop_reason: {event.delta.stop_reason}, output tokens: {event.usage.output_tokens}]\")\n```\n\nThe sync client blocks while it waits for each chunk. That's fine in a script, but in a web backend it stalls the event loop, so use `AsyncAnthropic` with `async with` and `async for` instead:\n\n```python\nimport asyncio\nfrom anthropic import AsyncAnthropic\n\nclient = AsyncAnthropic()\n\nasync def main():\n    async with client.messages.stream(\n        model=\"claude-sonnet-4-6\",\n        max_tokens=1024,\n        messages=[{\"role\": \"user\", \"content\": \"Write a haiku about debugging.\"}],\n    ) as stream:\n        async for text in stream.text_stream:\n            print(text, end=\"\", flush=True)\n\nasyncio.run(main())\n```\n\nPair the async stream with FastAPI's `StreamingResponse` and an async generator to forward Claude's output to a browser as server-sent events (SSE):\n\n```python\nfrom fastapi import FastAPI\nfrom fastapi.responses import StreamingResponse\nfrom anthropic import AsyncAnthropic\n\napp = FastAPI()\nclient = AsyncAnthropic()\n\n@app.get(\"/chat\")\nasync def chat(message: str):\n    async def event_stream():\n        async with client.messages.stream(\n            model=\"claude-sonnet-4-6\",\n            max_tokens=1024,\n            messages=[{\"role\": \"user\", \"content\": message}],\n        ) as stream:\n            async for text in stream.text_stream:\n                # A raw line break would end the data: field early, so send one\n                # data: line per line of text; the browser rejoins them.\n                lines = text.split(\"\\n\")\n                yield \"\".join(f\"data: {line}\\n\" for line in lines) + \"\\n\"\n\n        # Custom event that tells the client the response is complete\n        yield \"event: done\\ndata: {}\\n\\n\"\n\n    return StreamingResponse(\n        event_stream(),\n        media_type=\"text/event-stream\",\n        headers={\"Cache-Control\": \"no-cache\", \"X-Accel-Buffering\": \"no\"},\n    )\n```\n\n`Cache-Control: no-cache` keeps intermediaries from caching the stream, and `X-Accel-Buffering: no` stops nginx from buffering the whole response; without it, the \"stream\" arrives in one burst at the end. 
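Claude's text deltas often contain line breaks, and a raw line break inside an SSE `data:` field ends that field early. The sketch below shows the SSE framing rule (one `data:` line per line of text, rejoined with line feeds by the receiver) together with a minimal incremental decoder of the kind a `fetch` + `ReadableStream` consumer needs, since network reads can split an event anywhere. `encode_sse` and `SSEDecoder` are hypothetical helpers written in Python so the framing can be tested without a browser; the decoder assumes LF line endings and only understands the `data:` and `event:` fields.

```python
LF = chr(10)  # line feed: the only line ending this sketch emits or parses


def encode_sse(text, event=None):
    # Frame text as one SSE event: an optional event: line, one data: line
    # per line of text, then a blank line that tells the receiver to dispatch.
    head = 'event: ' + event + LF if event else ''
    body = ''.join('data: ' + line + LF for line in text.split(LF))
    return head + body + LF


class SSEDecoder:
    # Minimal incremental decoder for data: and event: fields only.
    # id: and retry: are ignored; CR and CRLF line endings are not supported.

    def __init__(self):
        self._buffer = ''
        self._event_type = 'message'
        self._data_lines = []

    def feed(self, chunk):
        # Network reads can split an event anywhere, so keep the unfinished tail.
        self._buffer += chunk
        events = []
        while LF in self._buffer:
            line, self._buffer = self._buffer.split(LF, 1)
            if line == '':
                # A blank line dispatches the pending event if it has any data.
                if self._data_lines:
                    events.append((self._event_type, LF.join(self._data_lines)))
                self._event_type = 'message'
                self._data_lines = []
                continue
            field, _, value = line.partition(':')
            if value.startswith(' '):
                value = value[1:]  # the spec strips one space after the colon
            if field == 'data':
                self._data_lines.append(value)
            elif field == 'event':
                self._event_type = value
        return events


# Round trip: a two-line text delta plus a done event, delivered in
# deliberately awkward 5-character network reads.
delta = 'Null pointer again' + LF + 'the stack trace knows'
wire = encode_sse(delta) + encode_sse('{}', event='done')
decoder = SSEDecoder()
received = []
for start in range(0, len(wire), 5):
    received.extend(decoder.feed(wire[start:start + 5]))
print(received)
```

The decoder only acts on complete lines and keeps the rest in its buffer, which is why the 5-character reads still produce exactly two events: the original two-line delta and the `done` marker.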
On the frontend, read the response with `fetch` and a `ReadableStream` reader, or with `EventSource`, which works for GET endpoints like this one.\n\nStreams can fail partway through, so catch the SDK's exception types explicitly. `RateLimitError` is a subclass of `APIStatusError`, so it has to come first:\n\n```python\nimport anthropic\n\nclient = anthropic.Anthropic()\n\ntry:\n    with client.messages.stream(\n        model=\"claude-sonnet-4-6\",\n        max_tokens=1024,\n        messages=[{\"role\": \"user\", \"content\": \"Write a haiku about debugging.\"}],\n    ) as stream:\n        for text in stream.text_stream:\n            print(text, end=\"\", flush=True)\nexcept anthropic.APIConnectionError:\n    # Network failure or timeout; whatever was already printed stays on screen\n    print(\"\\n[connection lost: showing partial response]\")\nexcept anthropic.RateLimitError:\n    print(\"\\n[rate limited: retry shortly]\")\nexcept anthropic.APIStatusError as e:\n    print(f\"\\n[API error {e.status_code}]\")\n```\n\nIf the browser disconnects mid-response, exit the generator early so the SDK closes the stream; otherwise generation, and billing for output tokens, can continue for a response nobody will read. For long generations, add a `request: Request` parameter to the endpoint, check `await request.is_disconnected()` periodically, and break out of the loop when it returns `True`.\n\nStreaming works the same way with tools. Text still arrives via `text_delta`, tool arguments arrive incrementally via `input_json_delta`, and `stream.get_final_message()` returns fully parsed `tool_use` blocks once the stream ends. 
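If you consume raw events yourself (for example, to show which tool Claude is calling as soon as its block starts), the `partial_json` fragments have to be concatenated per content block and parsed only after that block's `content_block_stop`, because a single fragment is usually not valid JSON on its own. A minimal sketch that runs offline: `collect_tool_inputs` and the `get_weather` tool are hypothetical, and the fake events are `SimpleNamespace` objects that mimic only the attributes used here (`type`, `index`, `content_block`, `delta.partial_json`). With a real stream, `get_final_message()` already does this parsing for you.

```python
import json
from types import SimpleNamespace as NS


def collect_tool_inputs(events):
    # Accumulate input_json_delta fragments per content-block index and parse
    # them only when that block's content_block_stop arrives.
    pending = {}   # index -> (tool name, list of partial_json fragments)
    finished = []  # (tool name, parsed input dict), in completion order
    for event in events:
        if event.type == 'content_block_start' and event.content_block.type == 'tool_use':
            pending[event.index] = (event.content_block.name, [])
        elif event.type == 'content_block_delta' and event.delta.type == 'input_json_delta':
            pending[event.index][1].append(event.delta.partial_json)
        elif event.type == 'content_block_stop' and event.index in pending:
            name, fragments = pending.pop(event.index)
            # Assumption: a tool called with no arguments may send no fragments,
            # so an empty string is treated as an empty object.
            finished.append((name, json.loads(''.join(fragments) or '{}')))
    return finished


# Hypothetical events for one tool_use block at index 1, with its arguments
# cut into arbitrary 8-character fragments.
arguments = json.dumps({'city': 'Kyiv', 'unit': 'celsius'})
events = [NS(type='content_block_start', index=1,
             content_block=NS(type='tool_use', name='get_weather'))]
events += [NS(type='content_block_delta', index=1,
              delta=NS(type='input_json_delta', partial_json=arguments[i:i + 8]))
           for i in range(0, len(arguments), 8)]
events.append(NS(type='content_block_stop', index=1))

print(collect_tool_inputs(events))
# prints: [('get_weather', {'city': 'Kyiv', 'unit': 'celsius'})]
```

Events the function does not recognize (text deltas, or the SDK helper's convenience events) fall through every branch, so the same loop can run over a mixed stream.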
See [Claude API Function Calling](https://kalyna.pro/claude-api-function-calling/) for the complete tool-use loop; it works unchanged whether calls are streamed or not.\n\nBest practices:\n\n- Use `get_final_message()` for `stop_reason`/`usage` instead of accumulating `message_delta` manually\n- Use `AsyncAnthropic` in web backends; a sync stream blocks the event loop\n- Set `Cache-Control: no-cache` and `X-Accel-Buffering: no` for SSE behind a proxy\n- Catch `APIConnectionError`, `RateLimitError`, and `APIStatusError` explicitly\n\nSummary:\n\n- `stream.text_stream` yields plain text chunks for display\n- Raw event order: `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop`\n- `get_final_message()` returns the complete `Message` after streaming\n- `AsyncAnthropic` + `async with`/`async for` for non-blocking backends\n- `StreamingResponse` + async generator → SSE to the browser\n- `input_json_delta` carries tool arguments", "url": "https://wpnews.pro/news/streaming-responses-with-claude-api-in-python-2026", "canonical_source": "https://dev.to/kalyna_pro/streaming-responses-with-claude-api-in-python-2026-44la", "published_at": "2026-06-12 13:09:50+00:00", "updated_at": "2026-06-12 13:41:20.751457+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "ai-tools", "natural-language-processing", "ai-products"], "entities": ["Claude", "Anthropic", "Python", "FastAPI", "Claude API Tutorial", "kalyna.pro", "Claude Sonnet 4-6"], "alternates": {"html": "https://wpnews.pro/news/streaming-responses-with-claude-api-in-python-2026", "markdown": "https://wpnews.pro/news/streaming-responses-with-claude-api-in-python-2026.md", "text": "https://wpnews.pro/news/streaming-responses-with-claude-api-in-python-2026.txt", "jsonld": "https://wpnews.pro/news/streaming-responses-with-claude-api-in-python-2026.jsonld"}}