How I Fixed My AI Chatbot's Laggy Responses with Server-Sent Events

A developer fixed laggy responses in their AI chatbot widget by switching from plain fetch requests to Server-Sent Events (SSE). The initial implementation left users waiting 15–30 seconds for long answers and saw a 50% bounce rate; SSE enabled incremental token streaming for real-time display. The solution uses an Express endpoint that streams tokens in SSE format, which the frontend reads with the Fetch API's ReadableStream reader.

I've been building a personal AI assistant for my developer blog (you know, one of those floating chat widgets that answers questions about my projects). The idea was simple: feed in my content, hook it up to an AI API, and let visitors chat with it. But my first implementation was a disaster. Visitors would type a question, see the spinner spin for ten seconds, and then get the entire response dumped at once. It felt like using dial-up. The problem wasn't the AI itself; it was how I was consuming the stream of tokens. Here's the story of how I went from clunky polling to the elegant world of Server-Sent Events (SSE).

Like many devs, I started with the most obvious solution: plain fetch. I sent a POST request to the AI endpoint with the user's message, and waited for the full response as JSON.

```js
// The naive way
async function askAI(userMessage) {
  const response = await fetch('https://api.your-ai-service.com/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: userMessage })
  });
  const data = await response.json();
  displayResponse(data.text);
}
```

This worked technically, but the delay was brutal. For long answers, the HTTP connection would hang for 15–30 seconds. Users saw the spinner forever, and I saw a 50% bounce rate on the chat page. Even with a loading indicator, the experience felt broken. I tried adding a timeout, but that just made things worse: the request would cancel before the AI finished thinking.

Then I attempted to use a polling approach: after the initial request, the API returned a job ID, and I'd poll every second for the result. That at least showed progress, but it hammered my server with requests and the UX was still janky.

Next, I considered WebSockets. A persistent connection for bidirectional streaming sounded perfect.
But the overhead was real: I'd need to manage connection state, handle reconnection logic, configure my server for WebSocket upgrades, and deal with fallbacks for restrictive proxies. For a simple chatbot widget, it felt like pulling out a flamethrower to light a candle. Plus, most AI APIs I looked at didn't expose a native WebSocket interface; they just returned a blob of text.

Then a colleague mentioned Server-Sent Events (SSE). He said, “It’s like a lightweight, one-way WebSocket.” That sounded exactly right: the server pushes text tokens incrementally, and the client listens. No complex handshake, just a regular HTTP connection with a special content type.

I switched to a streaming endpoint that sends the AI response as a sequence of `data:` lines, and on the frontend I read that stream as it arrives. Here's the core code that now powers my chatbot. My backend is a simple Express server that proxies to the AI service. The key: set `Content-Type: text/event-stream` and flush each token as it arrives.
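Before the route itself, it helps to see what "flushing a token" means on the wire. Per the SSE specification (part of the WHATWG HTML Living Standard), each event is one or more `field: value` lines terminated by a blank line, and a payload containing newlines must be split across several `data:` lines, which the browser joins back together with `\n`. The helper below is a minimal sketch; the name `formatSSE` and its options are hypothetical, not part of Express or any library.

```javascript
// Formats one Server-Sent Event as a string ready for res.write().
// `data` is the payload; the optional `event` and `id` fields are
// emitted only when provided. Every line of the payload gets its own
// "data:" prefix, and the event ends with a blank line ("\n\n").
function formatSSE(data, { event, id } = {}) {
  let out = '';
  if (event) out += `event: ${event}\n`;
  if (id !== undefined) out += `id: ${id}\n`;
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    out += `data: ${line}\n`;
  }
  return out + '\n';
}

// formatSSE(JSON.stringify({ token: 'Hel' }))
//   -> 'data: {"token":"Hel"}\n\n'
// formatSSE('line one\nline two', { id: 7 })
//   -> 'id: 7\ndata: line one\ndata: line two\n\n'
```

Because the route JSON-encodes each token, its payloads never contain raw newlines, so it can safely inline the single-line `data: ...\n\n` form instead of calling a helper like this.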
```javascript
// server.js – Express route for SSE streaming (Node 18+, where fetch is built in)
const express = require('express');
const app = express();
app.use(express.json()); // parse JSON bodies so req.body.message is populated

app.post('/chat-stream', async (req, res) => {
  const userMessage = req.body.message;

  // Set SSE headers
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive'
  });

  // Connect to the AI streaming endpoint
  // Example using ai.interwestinfo.com's streaming endpoint
  const aiResponse = await fetch('https://ai.interwestinfo.com/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: userMessage })
  });

  // aiResponse.body is a ReadableStream, so it can be consumed chunk by chunk
  const reader = aiResponse.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } keeps multi-byte characters split across chunks intact
    const chunk = decoder.decode(value, { stream: true });
    // Each chunk is part of the AI token stream; JSON-encoding it means
    // newlines inside the text can't break the SSE framing
    res.write(`data: ${JSON.stringify({ token: chunk })}\n\n`);
  }

  res.write('data: [DONE]\n\n');
  res.end();
});

app.listen(3000); // port chosen for illustration
```

On the browser side, my first instinct was to replace the old fetch call with an `EventSource`. But `EventSource` only supports GET requests, and I'm sending the message in a POST body, so I needed a workaround. One option is to first make a POST to initiate the stream and get a session ID, then open an `EventSource` on a GET endpoint with that ID. The other is to skip `EventSource` and read the response with the Fetch API's `response.body.getReader()` directly, parsing the `data:` lines yourself, if your frontend can handle it. I opted for the latter to keep it simple.
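For reference, the session-ID route I passed on isn't much code either: the server just needs somewhere to park the message between the POST and the GET. Below is a minimal in-memory sketch, assuming a single Node process (several server instances would need a shared store such as Redis); `createSessionStore` and its `create`/`get`/`finish` methods are hypothetical names.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical in-memory store for the POST-then-GET workaround: the POST
// handler parks the user's message under a random ID, and the GET handler
// looks it up before opening the SSE stream. Entries count as expired after
// ttlMs and are removed the next time they are looked up.
function createSessionStore({ ttlMs = 60_000 } = {}) {
  const sessions = new Map(); // id -> { message, expiresAt }

  return {
    create(message, now = Date.now()) {
      const id = randomUUID();
      sessions.set(id, { message, expiresAt: now + ttlMs });
      return id;
    },
    // Not deleted on read: EventSource reconnects to the same URL after a
    // dropped connection, so the ID must stay valid until the answer is done.
    get(id, now = Date.now()) {
      const entry = sessions.get(id);
      if (!entry) return undefined;
      if (entry.expiresAt < now) {
        sessions.delete(id); // lazily drop an expired session
        return undefined;
      }
      return entry.message;
    },
    // Call once the stream has sent its final event.
    finish(id) {
      sessions.delete(id);
    },
  };
}
```

In Express, the POST handler would respond with `{ id: store.create(req.body.message) }`, the GET handler (say, `/chat-stream/:id`) would call `store.get(req.params.id)` before streaming and `store.finish(id)` after writing `[DONE]`, and the browser would open `new EventSource('/chat-stream/' + id)`. One caveat: a reconnect re-runs the GET handler, so without extra bookkeeping the answer would be generated again from the start. Since I went with fetch instead, here's the client that actually consumes the stream: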
```javascript
// frontend.js – using Fetch + ReadableStream
async function askAIStream(userMessage) {
  const response = await fetch('/chat-stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: userMessage })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const outputElement = document.getElementById('chat-output');
  outputElement.textContent = '';
  let buffer = ''; // carries an incomplete line over to the next chunk

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Parse SSE format: data: {...}\n\n
    // A network chunk can end mid-line, so keep the last piece for later
    const lines = buffer.split('\n');
    buffer = lines.pop();

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6);
      if (data === '[DONE]') return; // stop reading entirely, not just this loop
      const { token } = JSON.parse(data);
      outputElement.textContent += token;
    }
  }
}
```

Now the response appears character by character (well, token by token) as soon as the AI generates it. Users see a real-time typing effect, and there's no more spinner limbo.

Switching to SSE wasn't all sunshine. Here are the real trade-offs I discovered:

- Browser support: `EventSource` is well supported in modern browsers, but older ones (IE) need a polyfill. If you're using `ReadableStream` with the Fetch API, some mobile browsers may choke. I ended up using a small polyfill for legacy clients.
- No automatic reconnection: because I read the stream with fetch instead of `EventSource`, I lost the automatic reconnection that `EventSource` gives you. I had to implement my own retry logic with exponential backoff for when the connection drops.

If I were building this from scratch again, I'd probably use a library like `event-source-polyfill` to unify browser support, and I'd design the API to accept GET requests for SSE using a unique conversation ID rather than fighting with POST. That way I could use the native `EventSource` with its built-in reconnection. I'd also add a buffer for the last few tokens, storing the conversation history in something like IndexedDB, so that if the user refreshes the page, they can resume the conversation without losing context.
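Because the fetch-based client has no built-in reconnection, the retry logic with exponential backoff mentioned above has to be hand-rolled. Here is a minimal sketch, assuming the whole request is wrapped as the retried task (for example `() => askAIStream(message)`); `retryWithBackoff`, `backoffDelay`, and their options are hypothetical names, and the random "full jitter" (a delay drawn between 0 and the exponential cap) is an added refinement to stop many clients from reconnecting in lockstep.

```javascript
// Delay before retry number `attempt` (0-based): a random value in
// [0, min(maxMs, baseMs * 2^attempt)). `random` is injectable for tests.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30_000, random = Math.random } = {}) {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * cap);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` until it resolves, or rethrows the last error once
// `maxRetries` retries have failed. Remaining options go to backoffDelay.
async function retryWithBackoff(task, { maxRetries = 5, sleep = defaultSleep, ...delayOpts } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up and surface the error
      await sleep(backoffDelay(attempt, delayOpts));
    }
  }
}

// Usage (browser): retryWithBackoff(() => askAIStream('What is SSE?'));
```

Retrying `askAIStream` restarts the answer from scratch, since it clears the output first. With a native `EventSource`, none of this is needed: the browser reconnects on its own, and the server can tune the reconnection delay by sending a `retry:` field (in milliseconds).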
The switch from fetch + polling to streaming via SSE turned my chatbot from a frustrating experience into something people actually enjoy using. It's not the most cutting-edge tech (SSE has been around for years), but it solved my problem without overcomplicating the stack. The next time you're building something that needs real-time data from a server (AI responses, live logs, notifications), ask yourself: Do I really need WebSockets, or can SSE do the job?

What's your go-to approach for streaming data from an API? I'd love to hear about your setup in the comments, especially if you've tackled the same chatbot problem with a different solution.