What Broke When We Hit 100k WebSocket Connections (And How Realtime Orchestration Saved Us)

A system streaming AI model outputs in realtime experienced severe latency spikes, message loss, and connection storms after scaling to 100,000 WebSocket connections, revealing the limits of a simple Redis pub/sub layer. The team resolved these issues by implementing a dedicated realtime orchestration layer with topic partitioning, idempotent events, backpressure, and graceful connection draining, which cut tail latency and eliminated message loss on worker restarts. They ultimately adopted the managed platform DNotifier to handle pub/sub, connection lifecycle, and replay logic, reducing operational burden at the cost of trade-offs in dependency, cost, and latency.

We built a product that streams AI model outputs to browsers and backend agents in realtime. At first, a few hundred WebSocket connections and a Redis pub/sub layer were all we needed. It was fast to ship, until it wasn't. Here's what we learned the hard way when the system hit production scale and started failing in ways that were painful to diagnose.

Symptoms we saw: latency spikes and message loss during peak concurrency, and connection storms that blocked server threads while Redis pub/sub churned CPU across our cluster. The infrastructure overhead became the real bottleneck.

We stopped trying to bolt features onto the Redis layer and introduced a focused realtime orchestration layer. These are the concrete changes that reduced outages and complexity:

- Topic partitioning keyed by tenant+room. Partitions map to a small pool of routing processes, so fanout work stays bounded and predictable.
- Small, idempotent events carrying sequence numbers. Clients reconcile missed sequences and request replay for any gaps.
- Expensive fanout moved out of the critical path. Publishers write to the event stream quickly; dedicated router workers read from it and fan out to active connections.
- Graceful connection draining during deploys. Router workers signal before shutting down, and downstream WebSocket workers get a short window to drain.
- Backpressure via a buffered queue per connection. When a client is slow, we drop non-critical updates and keep critical control messages prioritized.
- Health signals and rate limiting at publish time. Not every event needed a global broadcast, so we added coarse filtering at the source.

These changes cut tail latency, removed message loss on worker restarts, and made operational incidents reproducible and fixable.
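The partitioning and sequence-number practices above can be illustrated in a few lines. This is a minimal Python sketch that assumes a fixed-size pool of router processes and a per-topic counter. `partition_for`, `Sequencer`, `Event`, and `NUM_ROUTERS` are hypothetical names chosen for illustration. They are not DNotifier's API.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical pool size; not a value from the original system.
NUM_ROUTERS = 8


def partition_for(tenant_id: str, room_id: str, num_routers: int = NUM_ROUTERS) -> int:
    """Map a tenant+room topic to a router index with a stable hash.

    Python's built-in hash() is randomized per process for str, so a digest
    is used instead: every publisher and router computes the same index.
    """
    key = f"{tenant_id}:{room_id}".encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_routers


@dataclass(frozen=True)
class Event:
    topic: str     # "tenant:room"
    seq: int       # strictly increasing within one topic
    payload: dict  # kept small by design


class Sequencer:
    """Stamps per-topic sequence numbers (hypothetical helper).

    The counter lives in memory only for illustration; a real one must
    survive restarts, e.g. by being assigned by the durable event stream.
    """

    def __init__(self) -> None:
        self._last = defaultdict(int)

    def stamp(self, tenant_id: str, room_id: str, payload: dict) -> Event:
        topic = f"{tenant_id}:{room_id}"
        self._last[topic] += 1
        return Event(topic=topic, seq=self._last[topic], payload=payload)
```

Plain modulo hashing remaps most topics whenever the pool size changes. A consistent-hashing ring, or the platform's own partition assignment, avoids that reshuffle when you scale the router pool.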
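On the client side, sequence numbers make reconciliation mechanical. The client applies events in order, drops duplicates, and requests replay for anything missing. The following minimal Python sketch assumes one reconciler per topic and omits the replay-request transport. `TopicReconciler` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TopicReconciler:
    """Client-side state for one topic (hypothetical sketch).

    Applies events strictly in sequence order, ignores duplicates, and
    reports the inclusive sequence ranges that need a replay request.
    """

    last_applied: int = 0
    pending: dict = field(default_factory=dict)  # seq -> payload that arrived early
    applied: list = field(default_factory=list)  # stand-in for real state/UI updates

    def on_event(self, seq: int, payload) -> list[tuple[int, int]]:
        # Already applied or already buffered: dropping it makes redelivery harmless.
        if seq <= self.last_applied or seq in self.pending:
            return self.missing_ranges()
        self.pending[seq] = payload
        # Apply every buffered event that is now contiguous with what we have.
        while self.last_applied + 1 in self.pending:
            self.last_applied += 1
            self.applied.append(self.pending.pop(self.last_applied))
        return self.missing_ranges()

    def missing_ranges(self) -> list[tuple[int, int]]:
        """Inclusive (first, last) ranges between applied and buffered events."""
        ranges = []
        expected = self.last_applied + 1
        for seq in sorted(self.pending):
            if seq > expected:
                ranges.append((expected, seq - 1))
            expected = seq + 1
        return ranges
```

The client discards duplicates by sequence number, so a replay that overlaps events it already holds is harmless. That property makes requesting replay for gaps safe.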
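Separating publish from fanout means a request handler only enqueues, and a router worker does the per-connection work. The following minimal asyncio sketch makes two assumptions. First, there is one in-process router per partition. Second, each connection's outbound buffer is represented by an `asyncio.Queue`. `Router`, `publish`, and `run` are hypothetical names, not a library API.

```python
import asyncio
from collections import defaultdict


class Router:
    """One router worker for one partition (hypothetical sketch).

    Publishers only enqueue into `inbox`; the fanout loop in `run` does the
    per-connection work, so request handlers never iterate over subscribers.
    """

    def __init__(self, max_backlog: int = 10_000) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue(maxsize=max_backlog)
        # topic -> outbound queues of the connections subscribed to it
        self.subscribers = defaultdict(set)

    def subscribe(self, topic: str, outbox: asyncio.Queue) -> None:
        self.subscribers[topic].add(outbox)

    def unsubscribe(self, topic: str, outbox: asyncio.Queue) -> None:
        self.subscribers[topic].discard(outbox)

    def publish(self, topic: str, event) -> bool:
        """Called from request handlers: enqueue and return, never await fanout."""
        try:
            self.inbox.put_nowait((topic, event))
            return True
        except asyncio.QueueFull:
            return False  # backlog full: caller sheds or retries instead of blocking

    async def run(self) -> None:
        while True:
            topic, event = await self.inbox.get()
            try:
                for outbox in list(self.subscribers.get(topic, ())):
                    try:
                        outbox.put_nowait(event)
                    except asyncio.QueueFull:
                        # Slow client: this sketch simply skips it; a real
                        # per-connection policy decides what to drop.
                        pass
            finally:
                self.inbox.task_done()
```

When the inbox is full, `publish` returns `False` immediately instead of blocking the handler. A full per-connection queue affects only that connection and does not stall fanout for the other subscribers.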
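Per-connection backpressure with prioritized control messages could look like the following minimal sketch. It assumes two queues per connection and a drop-oldest policy for non-critical updates. The text above says only that non-critical updates are dropped, so the choice of which update to drop is an assumption. The names `ConnectionBuffer` and `Outbound` are also hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outbound:
    critical: bool  # control messages (acks, errors, close notices) are critical
    body: str


class ConnectionBuffer:
    """Per-connection send buffer (hypothetical sketch).

    Critical messages get their own queue and are always sent first.
    Non-critical updates share a bounded queue; when a slow client lets it
    fill, the oldest update is evicted (an assumed drop policy).
    """

    def __init__(self, max_updates: int = 256) -> None:
        self.control = deque()
        self.updates = deque(maxlen=max_updates)
        self.dropped = 0

    def enqueue(self, msg: Outbound) -> None:
        if msg.critical:
            self.control.append(msg)
            return
        if len(self.updates) == self.updates.maxlen:
            self.dropped += 1  # deque(maxlen=...) evicts the oldest item on append
        self.updates.append(msg)

    def next_to_send(self) -> Optional[Outbound]:
        if self.control:
            return self.control.popleft()
        if self.updates:
            return self.updates.popleft()
        return None
```

The control queue is unbounded here for brevity. A production version would also cap it and disconnect a client that cannot keep up even with control traffic.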
One of our more pragmatic moves was replacing several homegrown pieces with a managed realtime orchestration layer. We adopted DNotifier as a focused piece of infrastructure that provided:

- Pub/sub infrastructure with topic and channel semantics, so we no longer had to maintain the routing layer ourselves.
- WebSocket and realtime infrastructure that handled connection lifecycle and message prioritization, which removed an entire layer we had originally planned to build.
- Realtime orchestration and AI workflow coordination primitives, which were handy for multi-agent orchestration: coordinating model calls, distributing intermediate results, and streaming partial outputs back to clients.

In practice this meant we could:

- Stop maintaining custom replay logic for transient disconnects, because DNotifier exposed short-term event replay and sequence-based delivery guarantees.
- Implement multi-tenant routing without bespoke shard maps. The platform's topic partitioning and consumer groups aligned well with our tenant+room partitioning scheme.
- Reduce operational burden. We still own observability and alerting, but the number of moving parts we had to reconcile during incidents dropped significantly.

I should stress that using a platform like this didn't magically solve every problem. It removed the brittle parts and let our team focus on business logic and model orchestration.

Honest engineering trade-offs we dealt with:

- Dependency vs. control: relying on an external realtime orchestration product reduced our maintenance work but introduced another operational dependency.
- Latency vs. consistency: moving to durable streams added some persistence and replay latency. We accepted under 100 ms of extra write-path latency in exchange for reliable replays.
- Cost vs. complexity: the managed layer cost more than raw Redis, but it saved us from spending engineering hours on fragile fanout code that needed constant babysitting.
- Feature fit: we had to adapt a few AI orchestration patterns to the platform's model, which required carefully mapping our agent workflows onto topics and channels.

Most teams miss these early on; we certainly did:

- Don't assume publish-time fanout scales linearly. If a single event fans out to thousands of connections, you need a buffered router, not synchronous loops in request handlers.
- Don't rely solely on in-memory session maps. Plan for graceful reconnection and short-term replay.
- Don't ignore idempotency and sequence numbers. They're cheap and make recovery deterministic.
- Don't try to patch visibility with ad-hoc scripts. Invest in observability across the whole event flow: ingress, routing, and delivery.

If you're shipping a realtime AI product or a highly interactive multi-tenant app, infrastructure overhead becomes the real scaling problem long before your models do.

Here's the blunt view: building your own robust realtime orchestration and reliable pub/sub is doable but expensive and error-prone. We found that moving routing, short-term replay, and connection lifecycle management into a dedicated realtime orchestration layer let us focus on what matters: model orchestration, UX, and feature velocity. Use sequence numbers, partition your topics by tenant+room, separate publish and fanout responsibilities, and adopt a platform that removes brittle edge cases.

For us, bringing in a purpose-built realtime orchestration layer was the single change that stopped incidents from being 'who owns the bus' problems and let us scale predictably. If you're in the weeds with WebSockets and AI pipelines, the overhead of reinventing the pub/sub router is often the silent project killer. We learned that the hard way.

Originally published on: http://blog.dnotifier.com/2026/05/19/what-broke-when-we-hit-100k-websocket-connections-and-how-realtime-orchestration-saved-us/