{"slug": "how-i-fixed-my-ai-chatbot-s-timeout-nightmare", "title": "How I Fixed My AI Chatbot's Timeout Nightmare", "summary": "A developer spent three weeks debugging an AI chatbot that kept timing out in production, with 15% of requests failing due to slow API responses. The solution involved implementing streaming responses and a retry mechanism that could resume from the last received token, dramatically improving user experience.", "body_md": "I spent three weeks debugging an AI chatbot that kept timing out. It wasn't the API itself—it was how I was calling it. Here's what I learned.\n\nLast quarter, I was building a customer support chatbot for a SaaS product. The idea was simple: users ask questions, an AI model returns natural language answers. We picked an AI API that seemed solid—decent latency, good accuracy. But in production, everything fell apart.\n\nUsers would type a question, wait... and wait... then get a 504 Gateway Timeout. Our logs showed that about 15% of requests were failing because the API response took longer than our 30-second timeout. Even when it worked, the answer arrived in one big chunk after 10-20 seconds. Users started leaving the chat mid-response.\n\nThis wasn't a theoretical problem. It was happening to real people, and my boss was not happy.\n\nMy first instinct was to crank up the timeout. I set it to 60 seconds. That just meant failures took longer. Users hated it more.\n\nNext, I tried synchronous retries with exponential backoff. That made things worse: if the first attempt timed out, the retry also often timed out, and the whole request could take minutes. Plus, our server couldn't handle the backlog of pending requests—it started queueing, and memory usage spiked.\n\nI considered switching to a different model, but our product was already tied to this API's unique fine-tuning. We were stuck.\n\nI even tried polling: send the request, get a task ID, poll every second for the result. 
But the API didn't support async tasks; it required a single open connection.\n\nAt this point, I was ready to roll back to a simple FAQ lookup. Then I remembered a colleague mentioning \"streaming\" at a meetup. I hadn't paid attention, but now it sounded like a lifeline.\n\nThe breakthrough came when I realized the API supported streaming responses: the model could send back partial tokens as it generated them. Instead of waiting for the full answer, I could start displaying text to the user immediately. This solved two problems:\n\nBut streaming alone wasn't enough. The connection would sometimes drop mid-stream. I needed a robust retry mechanism that could resume from the last received token.\n\nHere's the approach I settled on:\n\n`aiohttp` in Python.\n\nNot all APIs support resumption, but many do. If not, you can just restart the request; the user already saw some text, so the experience is still better than a timeout.\n\nHere's a simplified version of what I wrote. It's async Python using `aiohttp` and `asyncio`.\n\n``` python\nimport asyncio\nimport json\nimport aiohttp\nfrom typing import AsyncIterator\n\nclass AIStreamClient:\n    def __init__(self, api_url: str, api_key: str):\n        self.api_url = api_url\n        self.api_key = api_key\n        self.session = aiohttp.ClientSession()\n\n    async def stream_completion(self, prompt: str) -> AsyncIterator[str]:\n        \"\"\"Stream tokens from the AI API with retry logic.\"\"\"\n        max_retries = 3\n        base_delay = 0.1  # 100ms\n        last_position = 0\n\n        for attempt in range(max_retries):\n            try:\n                headers = {\n                    \"Authorization\": f\"Bearer {self.api_key}\",\n                    \"Accept\": \"text/event-stream\",\n                }\n                payload = {\n                    \"prompt\": prompt,\n                    \"stream\": True,\n                    \"resume_from\": last_position  # if supported\n                }\n\n                async 
with self.session.post(\n                    self.api_url,\n                    json=payload,\n                    headers=headers,\n                    timeout=aiohttp.ClientTimeout(total=30)\n                ) as response:\n                    async for chunk in response.content:\n                        # Skip blank keep-alive lines so json.loads doesn't choke on them\n                        if chunk.strip():\n                            text = chunk.decode(\"utf-8\")\n                            # Assume each chunk is a JSON with \"token\" and \"position\"\n                            # In reality, you'd parse SSE format\n                            data = json.loads(text)\n                            token = data.get(\"token\", \"\")\n                            position = data.get(\"position\", last_position)\n                            if position > last_position:\n                                yield token\n                                last_position = position\n                    # Stream finished cleanly: stop here instead of re-sending the request\n                    return\n\n            except (aiohttp.ClientError, asyncio.TimeoutError) as e:\n                print(f\"Stream error on attempt {attempt+1}: {e}\")\n                if attempt == max_retries - 1:\n                    raise\n                delay = base_delay * (2 ** attempt)\n                await asyncio.sleep(min(delay, 5))\n\n    async def close(self):\n        await self.session.close()\n```\n\n**How to use it:**\n\n``` python\nasync def main():\n    client = AIStreamClient(\n        api_url=\"https://ai.interwestinfo.com/v1/completions\",  # example API\n        api_key=\"sk-...\"\n    )\n    async for token in client.stream_completion(\"Explain quantum computing\"):\n        print(token, end=\"\", flush=True)\n    await client.close()\n\nasyncio.run(main())\n```\n\nThis is a proof-of-concept. In production, you'd handle partial tokens more carefully, parse SSE properly, and add backpressure if the user is typing new input while streaming.\n\nStreaming isn't a silver bullet. Here's what I discovered:\n\n**When NOT to use streaming:**\n\nHindsight is 20/20. 
If I could start over:\n\n`httpx` with built-in streaming and retries. I should have started there.\n\nAfter deploying the streaming version, timeout errors dropped from 15% to less than 0.5%. User satisfaction scores went up, and I stopped getting paged at 2 AM. The code is now used across three microservices.\n\nBut I'm still paranoid. Every AI API is different, and production has a way of surprising you.\n\n**What does your setup look like?** How do you handle unreliable AI responses? I'd love to hear what's worked (or failed) for you.", "url": "https://wpnews.pro/news/how-i-fixed-my-ai-chatbot-s-timeout-nightmare", "canonical_source": "https://dev.to/__c1b9e06dc90a7e0a676b/how-i-fixed-my-ai-chatbots-timeout-nightmare-19md", "published_at": "2026-06-15 10:00:42+00:00", "updated_at": "2026-06-15 10:15:10.185529+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-products", "developer-tools"], "entities": ["aiohttp", "asyncio", "Python"], "alternates": {"html": "https://wpnews.pro/news/how-i-fixed-my-ai-chatbot-s-timeout-nightmare", "markdown": "https://wpnews.pro/news/how-i-fixed-my-ai-chatbot-s-timeout-nightmare.md", "text": "https://wpnews.pro/news/how-i-fixed-my-ai-chatbot-s-timeout-nightmare.txt", "jsonld": "https://wpnews.pro/news/how-i-fixed-my-ai-chatbot-s-timeout-nightmare.jsonld"}}