Scaling QUIC Ingress: eBPF Socket Steering for HTTP/3 Connection Migration

The InstaTunnel engineering team developed an eBPF-based socket steering solution to preserve QUIC connection migration across load-balanced worker processes. By teaching the kernel to route packets based on QUIC Connection IDs instead of the 4-tuple hash, the approach prevents session drops when a client's IP changes. This enables reliable HTTP/3 ingress for high-frequency telemetry and mobile edge workloads.

InstaTunnel Team
Published by our engineering team

When a remote edge node drops off the network for a few hundred milliseconds and comes back with a new IP address, a naive UDP proxy deployment will silently kill the session that was supposed to survive exactly that kind of disruption. This article looks at why that happens, and how eBPF-based socket steering at the kernel layer fixes it, using the mechanisms Linux and Cloudflare actually ship rather than just the theory.

Why QUIC, and why it breaks naive load balancing

Real-time telemetry (industrial sensor networks, autonomous-vehicle sensor fusion, mobile edge workloads) has largely moved off TCP and onto HTTP/3's QUIC transport. TCP's strict in-order delivery means a single lost packet stalls every stream multiplexed on that connection (head-of-line blocking). QUIC avoids this by running its own loss recovery and stream multiplexing directly over UDP, so a dropped packet on one stream doesn't stall the others.

QUIC also supports 0-RTT, but it's worth being precise about what that means: 0-RTT lets a returning client resume a previous session and send application data immediately, using a pre-shared key from an earlier handshake. A brand-new client still needs a full 1-RTT TLS 1.3 handshake; 0-RTT is a resumption optimization, not a property of every QUIC handshake.
The feature that matters most for this article is connection migration. A TCP connection is pinned to a 4-tuple: source IP, source port, destination IP, destination port. Change any of those (a phone switching from Wi-Fi to 5G, a robot roaming between access points) and the connection is gone; the client has to renegotiate from scratch. QUIC decouples the session from the network path by identifying it with a Connection ID (CID) instead of the 4-tuple. Per RFC 9000, a CID can be up to 20 bytes and is opaque to the peer: the server picks it, hands it to the client, and can keep recognizing that client even after its IP and port change mid-session.

That's a huge win for a single client talking to a single server. It becomes a problem the moment the server side is actually a fleet of load-balanced worker processes.

The 4-tuple hash breaks under migration

Reverse proxies like NGINX, Envoy, and HAProxy scale across CPU cores by running multiple worker processes, each with its own socket bound to the same port via SO_REUSEPORT. For TCP, this is easy: the kernel handles the handshake and accept() hands a completed connection to exactly one worker, which the kernel then keeps routing to for the life of that connection.

UDP has no handshake and no persistent kernel-side connection state, so SO_REUSEPORT falls back to a much simpler mechanism: for every incoming datagram, the kernel hashes the 4-tuple and picks a socket from the reuseport group by that hash. As long as the 4-tuple stays fixed, every packet lands on the same worker.

The instant a client's IP changes (the entire point of QUIC connection migration), the 4-tuple changes, the hash changes, and the kernel routes the packet to a different worker that has never seen this client, holds no TLS keys for it, and has no choice but to drop the packet. QUIC's headline feature is neutralized by a load-balancing mechanism that predates it.
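To make the failure mode concrete, here is a minimal userspace model in Python of per-datagram socket selection by 4-tuple hash. It is a sketch, not the kernel's implementation: the real reuseport code uses its own internal hash, and `pick_worker`, the SHA-256 stand-in, and the example addresses are all assumptions made for illustration.

```python
import hashlib

def pick_worker(four_tuple, num_workers):
    """Model of default reuseport selection: hash the 4-tuple, index the group.

    SHA-256 is only a stand-in for the kernel's internal hash (assumption).
    """
    src_ip, src_port, dst_ip, dst_port = four_tuple
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_workers

NUM_WORKERS = 4
SERVER = ("203.0.113.10", 443)            # documentation-range address

on_wifi = ("198.51.100.7", 51000, *SERVER)
on_5g = ("192.0.2.44", 38211, *SERVER)    # same QUIC connection, new path

# Same 4-tuple: every datagram is steered to the same worker.
assert pick_worker(on_wifi, NUM_WORKERS) == pick_worker(on_wifi, NUM_WORKERS)

# After migration the choice is recomputed from the new 4-tuple, so it has
# no relationship to the worker that actually holds the connection state.
print("before migration:", pick_worker(on_wifi, NUM_WORKERS))
print("after migration: ", pick_worker(on_5g, NUM_WORKERS))
```

In this model, with N workers and a well-mixed hash, a migrated client lands back on its original worker only about one time in N, which is why migration fails more often as the worker count grows.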
Teaching the kernel about QUIC with eBPF

Rather than hard-coding QUIC awareness into the kernel, Linux lets you attach a custom eBPF program to a reuseport group and have it make the socket-selection decision instead of the default hash. This capability is BPF_PROG_TYPE_SK_REUSEPORT, added by Martin KaFai Lau in Linux 4.19, and it pairs with the bpf_sk_select_reuseport helper, which assigns an incoming packet to a specific socket in a BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map (and, since Linux 5.8, SOCKHASH/SOCKMAP maps as well). If the program returns SK_PASS without having selected a socket (for example, because the index was out of range and the helper failed), the kernel silently falls back to the default 4-tuple hash, so the mechanism degrades safely.

This lets you replace "hash the 4-tuple" with "read the QUIC Connection ID out of the packet and route on that instead", entirely in kernel space, before the packet ever reaches a userspace socket buffer.

The steering pipeline

- Worker embeds its identity in the CID. During the very first handshake packet, before any migration has happened, the default hash is harmless: there's no established state yet to misroute. The worker that lands the handshake (say, Worker 2) generates the Server Connection ID it hands back to the client, and encodes its own worker index somewhere inside those bytes alongside cryptographic entropy.
- The eBPF program parses the QUIC header in-kernel. On every subsequent packet, the sk_reuseport program inspects the raw payload via struct sk_reuseport_md, distinguishes QUIC's long header (handshake packets) from the short header (steady-state 1-RTT packets), and extracts the Destination Connection ID field.
- Worker ID lookup, not a hash-table scan. Because the worker ID is embedded directly in the CID rather than requiring a lookup in a table mapping millions of CIDs to sockets, the eBPF program just masks out the relevant bits to recover the integer.
- bpf_sk_select_reuseport does the routing. The extracted worker ID is used as the index into the socket array, and the kernel delivers the datagram straight to that worker's socket, regardless of what the client's current IP address is.

This "encode routing info directly in the CID" idea isn't just a bespoke trick. It's exactly the problem the IETF's draft-ietf-quic-load-balancers (QUIC-LB) spec set out to standardize, with a defined octet layout: a reserved first octet for config-rotation/self-encoded-length bits, with the server/worker ID starting at the second octet, followed by an encrypted or obfuscated nonce. It's important to be accurate about its status, though: QUIC-LB never advanced past Internet-Draft status and is now listed as expired/inactive by the IETF datatracker. It never became an RFC. That doesn't make the technique fictional, since plenty of real load balancers and proxies implement their own variant of the same idea, but it's not an adopted standard, just a well-documented, unofficial convention.

eBPF isn't a general-purpose scripting environment

It's worth being concrete about why the eBPF program has to be this narrow and cheap, rather than hand-waving about "restrictions." The in-kernel verifier statically proves a program will terminate and stay memory-safe before it's ever allowed to load:

- Each program is capped at 512 bytes of stack space.
- Unbounded loops were rejected outright until Linux 5.3 introduced provably-terminating "bounded loops"; before that, loops had to be unrolled at compile time.
- The verifier enforces an overall complexity budget (on the order of a million simulated instruction-states per program), and a hot-path program blows past it quickly if it contains unbounded-looking loops or excessive branching.
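The steering pipeline described in this section can be modelled in userspace to show how little work the in-kernel program has to do: fixed offsets, a couple of bounds checks, and no loops. The Python below is a sketch of the logic only, not an eBPF program. The 8-byte CID length, the one-octet worker ID at the second CID octet, the zeroed first octet, and the names `make_cid`, `extract_dcid`, and `select_worker` are all assumptions for illustration; the header offsets follow the QUIC v1 packet layout in RFC 9000.

```python
import secrets
from typing import Optional

LONG_HEADER_BIT = 0x80    # RFC 9000: high bit of byte 0 set means long header
MAX_CID_LEN = 20          # RFC 9000 (QUIC v1) upper bound on CID length
CID_LEN = 8               # assumption: this deployment issues 8-byte CIDs
WORKER_ID_OFFSET = 1      # assumption: worker ID is the second CID octet

def make_cid(worker_id: int) -> bytes:
    """Worker side: build a CID that carries the worker's own index."""
    if not 0 <= worker_id <= 0xFF:
        raise ValueError("worker_id must fit in one octet in this sketch")
    first_octet = b"\x00"  # placeholder for the reserved config/length octet
    nonce = secrets.token_bytes(CID_LEN - 2)
    return first_octet + bytes([worker_id]) + nonce

def extract_dcid(datagram: bytes) -> Optional[bytes]:
    """Return the Destination Connection ID, or None if the packet is too short."""
    if len(datagram) < 1:
        return None
    if datagram[0] & LONG_HEADER_BIT:
        # Long header: flags(1) version(4) dcid_len(1) dcid(dcid_len) ...
        if len(datagram) < 6:
            return None
        dcid_len = datagram[5]
        if dcid_len > MAX_CID_LEN or len(datagram) < 6 + dcid_len:
            return None
        return datagram[6:6 + dcid_len]
    # Short header: flags(1) dcid(CID_LEN); the length is not on the wire,
    # so the parser relies on the length this deployment always issues.
    if len(datagram) < 1 + CID_LEN:
        return None
    return datagram[1:1 + CID_LEN]

def select_worker(datagram: bytes, num_workers: int) -> Optional[int]:
    """Steering decision: a worker index, or None for 'fall back to the hash'."""
    dcid = extract_dcid(datagram)
    if dcid is None or len(dcid) <= WORKER_ID_OFFSET:
        return None
    worker = dcid[WORKER_ID_OFFSET]
    return worker if worker < num_workers else None

cid = make_cid(worker_id=2)
short_pkt = bytes([0x40]) + cid + b"encrypted payload"
long_pkt = bytes([0xC0]) + (1).to_bytes(4, "big") + bytes([len(cid)]) + cid + b"rest"
assert select_worker(short_pkt, 4) == 2   # steady-state 1-RTT packet
assert select_worker(long_pkt, 4) == 2    # handshake packet, same worker
```

Note that the client's source address never appears in `select_worker`, which is what makes the decision stable across migration. The sketch leaves the worker ID in the clear; the QUIC-LB layout mentioned above pairs it with an encrypted or obfuscated nonce instead.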
None of these verifier limits is exotic for a header-parsing task like CID extraction, but they do explain why the CID-encoding scheme is deliberately simple (a few bytes, masked out directly) rather than something that needs a real data structure to resolve.

Handling restarts: what actually ships in production

Describing this problem as "socket generations, similar to Cloudflare's approach" undersells how concrete this already is in production. Cloudflare shipped exactly this as an open-source project called udpgrm (UDP Graceful Restart Marshal), described in a May 2025 engineering blog post, and it's worth walking through because it resolves the upgrade problem more rigorously than a hand-rolled generation counter would.

The core issue: when you restart or reload a QUIC-terminating proxy, you get two sets of SO_REUSEPORT sockets in the same group: one from the old binary, draining its existing connections, and one from the new binary, accepting new ones. A naive CID-based eBPF router would just extract "Worker 2" and blindly hand the packet to the new Worker 2, breaking every in-flight connection that belonged to the old Worker 2.

udpgrm's model:

- A socket generation is the set of reuseport-group sockets belonging to one logical instance of the server (i.e., one deployment).
- A working generation pointer tells the eBPF program which generation should receive brand-new flows.
- A flow dissector decides, per packet, whether it belongs to a new flow (for QUIC, an Initial packet) or an established one, and if established, which specific socket generation originally owns it, even if that's an older, draining generation.
- Flow state and socket references live in a SOCKHASH map that the daemon populates and keeps in sync from userspace, decoupling that bookkeeping from the application itself.
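The generation model above can be illustrated with a toy router. This is a hedged Python sketch of the idea only: it does not reflect udpgrm's actual map layout, cookie format, or API, and `GenerationRouter`, its methods, and the socket labels are hypothetical names. It assumes the flow dissector has already recovered a (generation, worker index) pair from an established flow's packet.

```python
from typing import Dict, Optional, Tuple

class GenerationRouter:
    """Toy model of generation-aware steering; not udpgrm's real data layout."""

    def __init__(self) -> None:
        # (generation, worker index) -> socket handle.
        # Stands in for the SOCKHASH map kept in sync by the daemon.
        self.sockets: Dict[Tuple[int, int], str] = {}
        # Which generation should receive brand-new flows.
        self.working_gen = 0

    def register(self, gen: int, idx: int, sock: str) -> None:
        self.sockets[(gen, idx)] = sock

    def route_new_flow(self, flow_hash: int) -> Optional[str]:
        """New flows are spread across the working generation only."""
        members = sorted(k for k in self.sockets if k[0] == self.working_gen)
        if not members:
            return None
        return self.sockets[members[flow_hash % len(members)]]

    def route_established(self, gen: int, idx: int) -> Optional[str]:
        """Established flows go back to the exact socket that owns them."""
        return self.sockets.get((gen, idx))

router = GenerationRouter()
router.register(0, 2, "old-worker-2")
router.register(1, 2, "new-worker-2")   # the reload adds a second generation
router.working_gen = 1                  # new flows now go to the new binary

# An in-flight connection owned by the old Worker 2 keeps reaching it,
# while a brand-new flow is handed to the new generation.
assert router.route_established(gen=0, idx=2) == "old-worker-2"
assert router.route_new_flow(flow_hash=12345) == "new-worker-2"
```

The design point is that the worker index alone is ambiguous during a reload; keying on the generation as well is what lets the old sockets drain without losing packets to their replacements.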
udpgrm ships three built-in dissector modes plus a "bespoke" template: a FLOW dissector that tracks a fixed-size 4-tuple hash table (useful for protocols with no native connection identifier), a CBPF cookie-based dissector where the routing identifier is embedded directly in the packet (exactly the QUIC-CID scheme described above, which Cloudflare calls a "udpgrm cookie"), and a NOOP mode for stateless protocols like DNS that don't need any of this. The daemon integrates with systemd via a small setsockopt/getsockopt-based control protocol and a "decoy" process trick to work around systemd's assumption that only one instance of a service runs at a time.

The practical takeaway for anyone building this themselves: don't reinvent generation tracking and flow dissection from scratch unless you have a very specific reason to. udpgrm or a similar production-tested reuseport-eBPF daemon already solves the graceful-restart half of this problem, which is genuinely the harder half to get right.

Where this leaves enterprise HTTP/3 ingress

The shift from TCP to QUIC solves a real, longstanding transport-layer problem, but it exposes an assumption baked deep into how Linux load-balances UDP: that a "flow" is defined by its 4-tuple. QUIC explicitly rejects that assumption, and the kernel's default SO_REUSEPORT behavior hasn't caught up on its own. BPF_PROG_TYPE_SK_REUSEPORT and bpf_sk_select_reuseport are the real, current mechanisms for closing that gap; QUIC-LB is the now-lapsed standardization attempt for the CID encoding convention; and udpgrm is a concrete, open-source example of what a production-grade version of the full pipeline (migration-aware routing and zero-downtime restarts) actually looks like today.
Sources

- RFC 9000, "QUIC: A UDP-Based Multiplexed and Secure Transport"
- IETF draft-ietf-quic-load-balancers, "QUIC-LB: Generating Routable QUIC Connection IDs" (expired Internet-Draft)
- Cloudflare Blog, "QUIC restarts, slow problems: udpgrm to the rescue", Marek Majkowski, May 7, 2025
- udpgrm GitHub repository
- eBPF Docs, Program Type BPF_PROG_TYPE_SK_REUSEPORT
- eBPF Docs, Helper Function bpf_sk_select_reuseport
- eBPF Docs, Loops
- Linux kernel commit, "bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT", Martin KaFai Lau
- Vincent Bernat, "Using eBPF to load-balance traffic across UDP sockets with Go"

Changelog

Metadata removed:
- Stripped the SEO-style title/hook-line pairing and the unverified trailing "presentation" blurb that read like leftover CMS metadata rather than sourced content.

Corrections:
- Clarified that QUIC's 0-RTT applies to session resumption with a pre-shared key, not to every handshake; a first-time connection still requires a full 1-RTT handshake.
- Corrected the CID worker-ID encoding example: the original draft said the worker ID sits in "the first two bytes" of the CID. The actual convention this mirrors (IETF QUIC-LB) reserves the first octet for config-rotation/length bits and starts the server/worker ID at the second octet.
- Added the accurate standardization status of that CID-encoding scheme: draft-ietf-quic-load-balancers never progressed to RFC and is currently listed as an expired Internet-Draft. It's a well-known convention, not an adopted standard.
- Replaced the vague "similar to Cloudflare's udpgrm framework" aside with a verified, detailed description of udpgrm's actual mechanics (working generation, flow dissectors, SOCKHASH-based state, systemd integration), sourced directly from Cloudflare's engineering blog and the project's public README.
- Confirmed and kept: BPF_PROG_TYPE_SK_REUSEPORT, bpf_sk_select_reuseport, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the 20-byte QUIC CID limit, and the general 4-tuple-hash-breaks-under-migration mechanism, all verified against RFC 9000, the eBPF documentation project, and the original 2018 kernel commit.

Extensions:
- Added sourced, concrete detail on eBPF verifier constraints (512-byte stack limit, pre-5.3 unbounded-loop rejection, complexity budget) to explain why the steering program has to stay minimal, rather than asserting it without support.
- Added a full section on udpgrm's dissector modes (FLOW, CBPF, NOOP, BESPOKE) and its systemd integration approach, since this is the actual production implementation of the "socket generations" concept the original draft only gestured at.
- Added a Sources section with direct links to every primary source used (RFC, IETF draft, Cloudflare engineering blog, eBPF docs, kernel commit).