{"slug": "streaming-llm-responses-to-the-browser-in-go-server-sent-events", "title": "Streaming LLM responses to the browser in Go (Server-Sent Events)", "summary": "A developer has demonstrated how to implement token-by-token streaming from an LLM API to the browser using Server-Sent Events (SSE) in Go Fiber, reducing user wait time from 4-8 seconds to under one second for the first word. The approach uses the `text/event-stream` content type and the `EventSource` API to push individual tokens as they are generated, rather than buffering the complete response. The implementation includes proper SSE headers, request context cancellation for client disconnects, and a streaming client from the OpenAI Go SDK.", "body_md": "The biggest UX mistake in LLM-powered web apps is waiting for the complete response before sending anything. On a 400-token answer at typical generation speeds, that's 4–8 seconds of staring at a spinner. With streaming, the user sees the first word in under a second and reads along as the model generates. This tutorial shows you exactly how to implement token-by-token streaming from an LLM API to the browser using Server-Sent Events (SSE) in Go Fiber.\n\nWebSockets are bidirectional. For LLM streaming you don't need that: you send one request, and the server pushes tokens back. SSE is:\n\n- ordinary HTTP with the `text/event-stream` content type\n- supported natively in browsers through the `EventSource` API\n\nThe wire format is dead simple:\n\n```\ndata: {\"token\": \"Hello\"}\\n\\n\ndata: {\"token\": \" world\"}\\n\\n\ndata: [DONE]\\n\\n\n```\n\nEach event is `data: <payload>\\n\\n`. The double newline is the event terminator.\n\nHere's what not to do:\n\n```\n// BAD: collects full LLM response then sends it\nfunc badHandler(c *fiber.Ctx) error {\n    fullResponse := callLLMAndWaitForCompletion(c.Query(\"q\"))\n    return c.JSON(fiber.Map{\"response\": fullResponse})\n    // User waits 6 seconds. Sees response instantly. 
Still worse UX.\n}\n```\n\nEven if you send it \"instantly\" after receiving it, the user has already waited the full generation time. Buffering throws away the perceived-speed advantage of fast models.\n\nInstall the dependencies:\n\n```\ngo get github.com/gofiber/fiber/v2\ngo get github.com/valyala/fasthttp   # Fiber's HTTP engine; the handler uses its StreamWriter\ngo get github.com/openai/openai-go   # or any OpenAI-compatible SDK\n```\n\nThe handler is written against the OpenAI Go SDK's pre-1.0 request style (`openai.F(...)` wrappers), which the 1.x releases replaced. Pin a pre-1.0 version or adapt the request parameters to the version you install.\n\nOne Fiber-specific detail decides whether streaming works at all. Fiber runs on fasthttp, which buffers everything a handler writes and sends it only after the handler returns, so events written to `c.Response().BodyWriter()` would all arrive at once at the end. To stream, the handler sets the headers, hands fasthttp a `StreamWriter` through `SetBodyStreamWriter`, and flushes after every event:\n\n```\n// handlers/stream.go\npackage handlers\n\nimport (\n    \"bufio\"\n    \"context\"\n    \"encoding/json\"\n    \"fmt\"\n    \"log\"\n    \"strings\"\n    \"time\"\n\n    \"github.com/gofiber/fiber/v2\"\n    openai \"github.com/openai/openai-go\"\n    \"github.com/openai/openai-go/option\"\n    \"github.com/valyala/fasthttp\"\n)\n\ntype StreamHandler struct {\n    llmClient *openai.Client\n    model     string\n}\n\nfunc NewStreamHandler(apiKey, baseURL, model string) *StreamHandler {\n    client := openai.NewClient(\n        option.WithAPIKey(apiKey),\n        option.WithBaseURL(baseURL),\n    )\n    return &StreamHandler{llmClient: client, model: model}\n}\n\n// sseEvent writes a single SSE event and flushes it to the client immediately.\n// A flush error means the client has gone away.\nfunc sseEvent(w *bufio.Writer, data string) error {\n    if _, err := fmt.Fprintf(w, \"data: %s\\n\\n\", data); err != nil {\n        return err\n    }\n    return w.Flush()\n}\n\nfunc (h *StreamHandler) StreamCompletion(c *fiber.Ctx) error {\n    // Copy the query: strings returned by Fiber are only valid inside the handler,\n    // and the stream writer below runs after the handler has returned.\n    query := strings.Clone(strings.TrimSpace(c.Query(\"q\", \"\")))\n    if query == \"\" {\n        return c.Status(fiber.StatusBadRequest).JSON(fiber.Map{\n            \"error\": \"query parameter 'q' is required\",\n        })\n    }\n    if len([]rune(query)) > 1000 {\n        return c.Status(fiber.StatusBadRequest).JSON(fiber.Map{\n            \"error\": \"query too long (max 1000 characters)\",\n        })\n    }\n\n    // Set SSE headers before any body is written\n    c.Set(\"Content-Type\", \"text/event-stream\")\n    c.Set(\"Cache-Control\", \"no-cache\")\n    c.Set(\"Connection\", \"keep-alive\")\n    c.Set(\"X-Accel-Buffering\", \"no\") // Critical for Nginx: disables proxy buffering\n\n    // fasthttp calls this function after the handler returns, so don't use c inside it.\n    c.Context().SetBodyStreamWriter(fasthttp.StreamWriter(func(w *bufio.Writer) {\n        // fasthttp does not cancel a context when the client disconnects: the\n        // timeout bounds the LLM call, and a failed flush ends the loop early.\n        ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)\n        defer cancel()\n\n        stream := h.llmClient.Chat.Completions.NewStreaming(ctx,\n            openai.ChatCompletionNewParams{\n                Model: openai.F(openai.ChatModel(h.model)),\n                Messages: openai.F([]openai.ChatCompletionMessageParamUnion{\n                    openai.SystemMessage(\"You are a helpful technical assistant. Be concise and accurate.\"),\n                    openai.UserMessage(query),\n                }),\n                MaxTokens:   openai.Int(800),\n                Temperature: openai.Float(0.3),\n            },\n        )\n        defer stream.Close()\n\n        tokenCount := 0\n        for stream.Next() {\n            chunk := stream.Current()\n            if len(chunk.Choices) == 0 {\n                continue\n            }\n\n            token := chunk.Choices[0].Delta.Content\n            if token == \"\" {\n                continue\n            }\n\n            tokenCount++\n            payload, err := json.Marshal(map[string]string{\"token\": token})\n            if err != nil {\n                continue\n            }\n\n            if err := sseEvent(w, string(payload)); err != nil {\n                // Client disconnected: returning runs cancel(), which aborts the LLM request\n                log.Printf(\"Client disconnected after %d tokens\", tokenCount)\n                return\n            }\n        }\n\n        if err := stream.Err(); err != nil {\n            // Send an error event so the client knows what happened\n            errPayload, _ := json.Marshal(map[string]string{\n                \"error\": \"stream interrupted: \" + err.Error(),\n            })\n            _ = sseEvent(w, string(errPayload))\n            log.Printf(\"Stream error after %d tokens: %v\", tokenCount, err)\n            return\n        }\n\n        // Signal clean completion\n        _ = sseEvent(w, \"[DONE]\")\n        log.Printf(\"Stream complete: %d tokens for query: %q\", tokenCount, query)\n    }))\n\n    return nil\n}\n```\n\nThe entry point wires up CORS, a per-IP rate limit, and the route:\n\n```\n// main.go\npackage main\n\nimport (\n    \"log\"\n    \"os\"\n    \"time\"\n\n    \"github.com/gofiber/fiber/v2\"\n    \"github.com/gofiber/fiber/v2/middleware/cors\"\n    \"github.com/gofiber/fiber/v2/middleware/limiter\"\n    \"stream-api/handlers\"\n)\n\nfunc main() {\n    apiKey  := os.Getenv(\"LLM_API_KEY\")\n    baseURL := os.Getenv(\"LLM_BASE_URL\") // e.g. \"https://api.openai.com/v1\"\n    model   := os.Getenv(\"LLM_MODEL\")    // e.g. \"gpt-4o-mini\"\n\n    streamHandler := handlers.NewStreamHandler(apiKey, baseURL, model)\n\n    // No SSE-specific setting is needed here: the handler streams via SetBodyStreamWriter.\n    app := fiber.New(fiber.Config{\n        // Behind Nginx every request comes from 127.0.0.1; take the client IP from\n        // the header the proxy sets. Only safe if clients cannot reach :4001 directly.\n        ProxyHeader: \"X-Real-IP\",\n    })\n\n    app.Use(cors.New())\n\n    // Rate limit: 10 requests per minute per IP\n    app.Use(\"/api/stream\", limiter.New(limiter.Config{\n        Max:        10,\n        Expiration: 60 * time.Second, // a time.Duration: a bare 60 would mean 60ns\n    }))\n\n    app.Get(\"/api/stream\", streamHandler.StreamCompletion)\n\n    log.Fatal(app.Listen(\":4001\"))\n}\n```\n\nThis is the complete frontend implementation. No libraries are needed: the browser's native `EventSource` API manages the connection and even reconnects automatically. For this use case that auto-reconnect is a trap, because a reconnect re-sends the same query and starts a new generation, so the client closes the source itself as soon as it sees `[DONE]` or an error event.\n\n```\n<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <title>LLM Stream Demo</title>\n    <style>\n        body { font-family: monospace; max-width: 800px; margin: 40px auto; padding: 0 20px; }\n        #output { white-space: pre-wrap; background: #f5f5f5; padding: 16px;\n                  border-radius: 4px; min-height: 60px; max-height: 60vh; overflow-y: auto; }\n        #status { color: #888; font-size: 0.85em; margin-top: 8px; }\n        button { margin-top: 12px; padding: 8px 16px; cursor: pointer; }\n        button:disabled { opacity: 0.5; cursor: not-allowed; }\n    </style>\n</head>\n<body>\n    <h2>LLM Streaming Demo</h2>\n    <input type=\"text\" id=\"query\" placeholder=\"Ask something...\" style=\"width:100%;padding:8px\">\n    <button id=\"btn\" onclick=\"startStream()\">Ask</button>\n    <button id=\"stop-btn\" onclick=\"stopStream()\" disabled>Stop</button>\n    <div id=\"output\"></div>\n    <div id=\"status\"></div>\n\n<script>\nlet currentSource = null;\n\nfunction startStream() {\n    const query = document.getElementById('query').value.trim();\n    if (!query) return;\n\n    // Clean up any existing stream\n    stopStream();\n\n    const output = document.getElementById('output');\n    const status = document.getElementById('status');\n    const btn = document.getElementById('btn');\n    const stopBtn = document.getElementById('stop-btn');\n\n    output.textContent = '';\n    status.textContent = 'Connecting...';\n    btn.disabled = true;\n    stopBtn.disabled = false;\n\n    const url = `/api/stream?q=${encodeURIComponent(query)}`;\n    const source = new EventSource(url);\n    currentSource = source;\n\n    let tokenCount = 0;\n    const startTime = Date.now();\n\n    source.onmessage = function(event) {\n        if (event.data === '[DONE]') {\n            const elapsed = ((Date.now() - startTime) / 1000).toFixed(1);\n            status.textContent = `Done: ${tokenCount} tokens in ${elapsed}s`;\n            // Close explicitly: otherwise EventSource reconnects when the server\n            // ends the response and re-sends the query\n            stopStream();\n            return;\n        }\n\n        try {\n            const parsed = JSON.parse(event.data);\n\n            if (parsed.error) {\n                status.textContent = `Error: ${parsed.error}`;\n                stopStream();\n                return;\n            }\n\n            if (parsed.token) {\n                output.textContent += parsed.token;\n                tokenCount++;\n                status.textContent = `Generating... (${tokenCount} tokens)`;\n                // Auto-scroll (#output has a max-height and overflow-y: auto)\n                output.scrollTop = output.scrollHeight;\n            }\n        } catch (e) {\n            console.error('Parse error:', e, 'Raw:', event.data);\n        }\n    };\n\n    source.onerror = function() {\n        // CLOSED: the browser gave up for good, e.g. on a 400 or 429 response\n        if (source.readyState === EventSource.CLOSED) {\n            status.textContent = 'Request failed.';\n            stopStream();\n            return;\n        }\n        // CONNECTING: the connection dropped and the browser will retry after a\n        // few seconds, re-sending the query. To disable retries, call stopStream() here.\n        status.textContent = 'Connection error. Retrying...';\n    };\n\n    source.onopen = function() {\n        status.textContent = 'Connected, waiting for first token...';\n    };\n}\n\nfunction stopStream() {\n    if (currentSource) {\n        currentSource.close();\n        currentSource = null;\n    }\n    cleanup();\n}\n\nfunction cleanup() {\n    document.getElementById('btn').disabled = false;\n    document.getElementById('stop-btn').disabled = true;\n    currentSource = null;\n}\n</script>\n</body>\n</html>\n```\n\nAdd this to your Nginx server block. Without `proxy_buffering off`, Nginx buffers the upstream response, so tokens arrive in bursts or only when the stream ends instead of one at a time.\n\n```\nlocation /api/stream {\n    proxy_pass         http://127.0.0.1:4001;\n    proxy_http_version 1.1;\n    proxy_set_header   Connection \"\";        # drop Nginx's default \"Connection: close\" to the upstream\n    proxy_buffering    off;                  # CRITICAL for SSE\n    proxy_cache        off;\n    proxy_read_timeout 90s;                  # max idle gap between upstream reads; above the handler's 60s timeout\n    proxy_set_header   X-Real-IP $remote_addr;\n}\n```\n\nThe `X-Accel-Buffering: no` header set in the Go handler achieves the same effect per response when Nginx honors it, but setting `proxy_buffering off` in the Nginx config is the belt-and-suspenders approach.\n\nThis is where SSE gets subtle. Once you've started writing the response body with `text/event-stream`, you cannot send an HTTP 500 status: the status line is already sent. With `SetBodyStreamWriter`, that happens as soon as the handler returns, before the first token exists, so validation errors must be returned before the stream writer is installed. Anything that fails later has to be reported in-band via a data event:\n\n```\n// Inside the stream writer: the LLM call fails after the stream has started\nerrPayload, _ := json.Marshal(map[string]string{\n    \"error\":   \"rate_limit_exceeded\",\n    \"message\": \"Please try again in a moment.\",\n})\n_ = sseEvent(w, string(errPayload))\nreturn // the HTTP layer never learns that an error occurred\n```\n\nOn the client side, check every event for an `error` field and handle it in `onmessage`, not just `onerror`. The `onerror` handler fires for connection-level problems (network drops, server restarts, non-200 responses such as the limiter's 429), not for application-level errors embedded in the stream.\n\nAt 1,000 concurrent users each holding an SSE connection, you're holding 1,000 goroutines open. Goroutines are cheap (they start with a 2 KB stack), so this is fine up to tens of thousands of connections on a modest server. The bottleneck is far more likely to be your LLM API's rate limits than the SSE infrastructure.\n\nUse the `context.WithTimeout` deadline to ensure goroutines don't leak if the LLM API hangs. 
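\n\nTo make the timeout's role concrete, here is a minimal, stdlib-only sketch that is separate from the handler: `hungLLM` is a hypothetical stand-in for an API that accepts the request but never sends a token, and `consume` is a simplified token loop that gives up when a 50 ms deadline passes instead of blocking forever.\n\n```\npackage main\n\nimport (\n    \"context\"\n    \"fmt\"\n    \"time\"\n)\n\n// hungLLM is a hypothetical stand-in for an LLM API that never produces a token.\n// Like a context-aware HTTP client, it gives up once ctx is cancelled or expires.\nfunc hungLLM(ctx context.Context, tokens chan<- string) {\n    <-ctx.Done()\n    close(tokens)\n}\n\n// consume reads tokens until the producer closes the channel or ctx expires,\n// returning how many tokens arrived and the context error, if any.\nfunc consume(ctx context.Context, tokens <-chan string) (int, error) {\n    n := 0\n    for {\n        select {\n        case _, ok := <-tokens:\n            if !ok {\n                return n, ctx.Err()\n            }\n            n++\n        case <-ctx.Done():\n            return n, ctx.Err()\n        }\n    }\n}\n\nfunc main() {\n    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)\n    defer cancel() // releases the timer even when the stream finishes early\n\n    tokens := make(chan string)\n    go hungLLM(ctx, tokens) // this goroutine exits when the deadline fires\n\n    n, err := consume(ctx, tokens)\n    fmt.Println(n, err) // 0 context deadline exceeded\n}\n```\n\nNeither goroutine outlives the deadline. Without the timeout, both would block forever, which is exactly the leak described above.\n\n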
The `defer cancel()` guarantees cleanup even if the client disconnects before `[DONE]`.\n\nThis pattern (SSE in Fiber, EventSource in the browser, a no-buffer Nginx config) is a solid production baseline and needs no dependencies beyond Fiber and an OpenAI-compatible SDK.", "url": "https://wpnews.pro/news/streaming-llm-responses-to-the-browser-in-go-server-sent-events", "canonical_source": "https://dev.to/ayinedjimi-consultants/streaming-llm-responses-to-the-browser-in-go-server-sent-events-11on", "published_at": "2026-05-25 22:00:00+00:00", "updated_at": "2026-05-25 22:34:01.313493+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "ai-tools", "ai-products", "ai-infrastructure"], "entities": ["Go", "Server-Sent Events", "Fiber", "OpenAI"], "alternates": {"html": "https://wpnews.pro/news/streaming-llm-responses-to-the-browser-in-go-server-sent-events", "markdown": "https://wpnews.pro/news/streaming-llm-responses-to-the-browser-in-go-server-sent-events.md", "text": "https://wpnews.pro/news/streaming-llm-responses-to-the-browser-in-go-server-sent-events.txt", "jsonld": "https://wpnews.pro/news/streaming-llm-responses-to-the-browser-in-go-server-sent-events.jsonld"}}