[ARTICLE · art-27795] src=dev.to ↗ pub= topic=artificial-intelligence verified=true sentiment=↑ positive

How I Fixed My AI Chatbot's Timeout Nightmare

A developer spent three weeks debugging an AI chatbot that kept timing out in production, with 15% of requests failing due to slow API responses. The solution involved implementing streaming responses and a retry mechanism that could resume from the last received token, dramatically improving user experience.

4 min read · Published Jun 15, 2026

I spent three weeks debugging an AI chatbot that kept timing out. It wasn't the API itself—it was how I was calling it. Here's what I learned.

Last quarter, I was building a customer support chatbot for a SaaS product. The idea was simple: users ask questions, an AI model returns natural language answers. We picked an AI API that seemed solid—decent latency, good accuracy. But in production, everything fell apart.

Users would type a question, wait... and wait... then get a 504 Gateway Timeout. Our logs showed that about 15% of requests were failing because the API response took longer than our 30-second timeout. Even when it worked, the answer arrived in one big chunk after 10-20 seconds. Users started leaving the chat mid-response.

This wasn't a theoretical problem. It was happening to real people, and my boss was not happy.

My first instinct was to crank up the timeout. I set it to 60 seconds. That just meant failures took longer. Users hated it more.

Next, I tried synchronous retries with exponential backoff. That made things worse: if the first attempt timed out, the retry also often timed out, and the whole request could take minutes. Plus, our server couldn't handle the backlog of pending requests—it started queueing, and memory usage spiked.
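To see why synchronous retries compound the problem, here is a rough worst-case calculation. The three attempts and the 1-second base delay are illustrative assumptions, not the values from my original setup; only the 30-second timeout comes from it.

```python
def worst_case_wait(timeout_s: float, max_attempts: int, base_delay_s: float) -> float:
    """Total seconds a user waits if every attempt runs into the timeout."""
    total = 0.0
    for attempt in range(max_attempts):
        total += timeout_s  # each failed attempt burns the full timeout
        if attempt < max_attempts - 1:
            # exponential backoff before the next attempt: 1s, 2s, 4s, ...
            total += base_delay_s * (2 ** attempt)
    return total


# Assumed values: 30s timeout, 3 attempts, 1s base delay.
print(worst_case_wait(timeout_s=30, max_attempts=3, base_delay_s=1))  # prints 93.0
```

With the timeout raised to 60 seconds, the same schedule comes to 183 seconds, and every one of those waiting requests holds a connection open on the server.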

I considered switching to a different model, but our product was already tied to this API's unique fine-tuning. We were stuck.

I even tried polling: send the request, get a task ID, poll every second for the result. But the API didn't support async tasks—it required a single open connection.

At this point, I was ready to roll back to a simple FAQ lookup. Then I remembered a colleague mentioning "streaming" at a meetup. I hadn't paid attention, but now it sounded like a lifeline.

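The breakthrough came when I realized the API supported streaming responses: the model could send back tokens as it generated them. Instead of waiting for the full answer, I could start displaying text to the user immediately. This solved two problems: users no longer stared at an empty chat for 10-20 seconds, and requests no longer sat idle until the 30-second timeout while waiting for the complete response.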
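The difference between "time to first token" and "time to full answer" can be tried without any API at all. The sketch below uses a hypothetical `fake_model` generator as a stand-in for a streaming endpoint; the token list and the delay are made-up values.

```python
import asyncio
import time
from typing import AsyncIterator, List, Optional, Tuple


async def fake_model(tokens: List[str], delay_s: float) -> AsyncIterator[str]:
    """Stand-in for a streaming API: emits one token every delay_s seconds."""
    for token in tokens:
        await asyncio.sleep(delay_s)
        yield token


async def measure(tokens: List[str], delay_s: float) -> Tuple[Optional[float], float]:
    """Return (seconds until the first token, seconds until the full answer)."""
    start = time.perf_counter()
    first: Optional[float] = None
    async for _ in fake_model(tokens, delay_s):
        if first is None:
            first = time.perf_counter() - start
    return first, time.perf_counter() - start


first, total = asyncio.run(measure(["Hello", ",", " world", "!"], delay_s=0.05))
print(f"first token after {first:.2f}s, full answer after {total:.2f}s")
```

A non-streaming client only sees the second number. A streaming client can start rendering at the first.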

But streaming alone wasn't enough. The connection would sometimes drop mid-stream. I needed a robust retry mechanism that could resume from the last received token.

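Here's the approach I settled on, using aiohttp in Python: request a streamed response, track the position of the last token received, and if the connection drops, retry with exponential backoff (up to three attempts) and ask the API to resume from that position.

Not every API supports resumption. If yours doesn't, you can just restart the request; the user already saw some text, so the experience is still better than a timeout.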
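The resume-and-deduplicate idea can be sketched without a network. Here `flaky_stream` is a hypothetical server that drops the connection after a set number of tokens, and `ANSWER` is made-up data; the point is only the bookkeeping around the last position seen.

```python
from typing import Iterator, List, Optional, Tuple

ANSWER = ["Stream", "ing ", "keeps ", "users ", "engaged."]


def flaky_stream(resume_from: int, fail_after: Optional[int]) -> Iterator[Tuple[int, str]]:
    """Hypothetical server: yields (position, token) for positions after resume_from.

    Raises ConnectionError once fail_after tokens have been sent (if set).
    """
    sent = 0
    for position, token in enumerate(ANSWER, start=1):
        if position <= resume_from:
            continue  # the client already has this token
        if fail_after is not None and sent == fail_after:
            raise ConnectionError("connection dropped mid-stream")
        yield position, token
        sent += 1


def collect_with_resume(failures: List[Optional[int]]) -> str:
    """Consume the stream, reconnecting from the last position after each drop."""
    last_position = 0
    received: List[str] = []
    for fail_after in failures:  # one entry per connection attempt
        try:
            for position, token in flaky_stream(last_position, fail_after):
                if position > last_position:  # ignore anything already shown
                    received.append(token)
                    last_position = position
            break  # stream ended cleanly
        except ConnectionError:
            continue  # reconnect, resuming after last_position
    return "".join(received)


# Two dropped connections, then a clean one.
print(collect_with_resume([2, 1, None]))  # prints "Streaming keeps users engaged."
```

If every attempt fails, the function returns whatever text arrived, which mirrors the point above: a partial answer still beats a timeout.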

Here's a simplified version of what I wrote. It's async Python using aiohttp and asyncio.

import asyncio
import json  # needed for json.loads below
import aiohttp
from typing import AsyncIterator

class AIStreamClient:
    def __init__(self, api_url: str, api_key: str):
        self.api_url = api_url
        self.api_key = api_key
        self.session = aiohttp.ClientSession()

    async def stream_completion(self, prompt: str) -> AsyncIterator[str]:
        """Stream tokens from the AI API with retry logic."""
        max_retries = 3
        base_delay = 0.1  # 100ms
        last_position = 0

        for attempt in range(max_retries):
            try:
                headers = {
                    "Authorization": f"Bearer {self.api_key}",
                    "Accept": "text/event-stream",
                }
                payload = {
                    "prompt": prompt,
                    "stream": True,
                    "resume_from": last_position  # if supported
                }

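                # No total timeout: a long answer may legitimately stream for
                # more than 30 seconds. Fail only if no data arrives for 30s.
                timeout = aiohttp.ClientTimeout(total=None, sock_read=30)
                async with self.session.post(
                    self.api_url,
                    json=payload,
                    headers=headers,
                    timeout=timeout,
                ) as response:
                    response.raise_for_status()  # treat HTTP errors as retryable
                    # Iterating response.content yields one line at a time.
                    async for raw_line in response.content:
                        line = raw_line.decode("utf-8").strip()
                        if line.startswith("data:"):
                            line = line[len("data:"):].strip()
                        if not line:
                            continue  # blank separator or keep-alive
                        try:
                            data = json.loads(line)
                        except json.JSONDecodeError:
                            continue  # not a JSON payload; skip it
                        token = data.get("token", "")
                        position = data.get("position", last_position)
                        # Skip tokens already yielded before a reconnect.
                        if position > last_position:
                            yield token
                            last_position = position
                return  # stream finished cleanly; do not retry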

            except (aiohttp.ClientError, asyncio.TimeoutError) as e:
                print(f"Stream error on attempt {attempt+1}: {e}")
                if attempt == max_retries - 1:
                    raise
                delay = base_delay * (2 ** attempt)
                await asyncio.sleep(min(delay, 5))

    async def close(self):
        await self.session.close()

How to use it:

async def main():
    client = AIStreamClient(
        api_url="https://ai.interwestinfo.com/v1/completions",  # example API
        api_key="sk-..."
    )
    try:
        async for token in client.stream_completion("Explain quantum computing"):
            print(token, end="", flush=True)
    finally:
        await client.close()  # release the session even if the stream fails

asyncio.run(main())

This is a proof-of-concept. In production, you'd handle partial tokens more carefully, parse SSE properly, and add backpressure if the user is typing new input while streaming.
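As a rough idea of what "parse SSE properly" involves, here is a minimal sketch of a `text/event-stream` parser. It handles only `data:` fields, comments, and blank-line event boundaries; the `token` and `position` keys in the sample payloads are the same assumed field names as in the client above, not a documented API.

```python
import json
from typing import Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream body."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            data_lines.append(value)  # multi-line data is joined with "\n"
        # Other fields (event, id, retry) are ignored in this sketch.


stream = [
    ": keep-alive\n",
    'data: {"token": "Hel", "position": 1}\n',
    "\n",
    'data: {"token": "lo", "position": 2}\n',
    "\n",
]
tokens = [json.loads(payload)["token"] for payload in parse_sse(stream)]
print("".join(tokens))  # prints "Hello"
```

Splitting on event boundaries instead of raw lines means a JSON payload spread over several `data:` lines is reassembled before it reaches `json.loads`.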

Streaming isn't a silver bullet. Here's what I discovered:

When NOT to use streaming:

Hindsight is 20/20. If I could start over:

httpx with built-in streaming and retries. I should have started there.

After deploying the streaming version, timeout errors dropped from 15% to less than 0.5%. User satisfaction scores went up, and I stopped getting paged at 2 AM. The code is now used across three microservices.

But I'm still paranoid. Every AI API is different, and production has a way of surprising you.

What's your setup look like? How do you handle unreliable AI responses? I'd love to hear what's worked (or failed) for you.
