{"slug": "zeroserve-a-zero-config-web-server-you-can-script-with-ebpf", "title": "Zeroserve: A zero-config web server you can script with eBPF", "summary": "Zeroserve, a new zero-configuration HTTPS server, serves entire websites from a single tarball over HTTP/2 and TLS 1.3 while allowing users to script request handling with eBPF programs that run as sandboxed, JIT-compiled middleware in userspace. The server collapses traditional declarative configuration and separate scripting layers into one eBPF program that handles routing, authentication, rate limiting, and proxying for every request. Designed as an alternative to nginx and Caddy, zeroserve aims to simplify deployment and eliminate configuration complexity by treating the eBPF program itself as the entire configuration.", "body_md": "*Disclaimer: This article is co-authored with GPT-5.5 and Claude Opus 4.8.*\n\n[zeroserve](https://github.com/losfair/zeroserve) is a small, fast, zero-config HTTPS server. You hand it a tarball of a website and it serves it - over HTTP/2 and TLS 1.3, with hot reload and a tiny resident footprint. 
The twist is that you can drop eBPF programs into the tarball and they run on every request, in userspace, as sandboxed middleware - rewriting, authenticating, and rate-limiting requests, or reverse-proxying them to a backend when you want it to act as a gateway in front of your app.\n\nIn short:\n\n- **Fast**: on one core it beats nginx across most workloads - small and large static files, scripted middleware, and small-response proxying, all over HTTPS.\n- **Efficient eBPF scripting**: scripts are JIT-compiled to native code and sandboxed in userspace, cheap enough to run on every request.\n- **Program-as-configuration**: your eBPF program is the whole configuration, deciding what happens to each request.\n- **`io_uring` throughout**: every network and disk operation is submitted through `io_uring`.\n- **Modern TLS in the box**: TLS 1.3, HTTP/2, Encrypted Client Hello, SNI certificate selection, and JA4 fingerprinting.\n- **Simple to operate**: serve a whole site from one tarball and hot-reload it (and the TLS material) with a `SIGHUP`.\n\nIt's meant to be an alternative to nginx and Caddy, and the design bet is about *configuration*. Those servers give you a declarative config language - `location` blocks, `rewrite` rules, `map` directives, `try_files` - and then, once the declarative language hits its limits, an optional scripting runtime bolted on the side (Lua, or Caddy's plugins). Behavior ends up split across two layers: directives that quietly grow their own control flow, plus scripts that run somewhere in the request lifecycle you have to keep in your head.\n\nzeroserve collapses that into one thing. There is no config file. The eBPF program *is* the configuration - a single, ordinary, sandboxed program that sees every request and decides what happens: routing, headers, auth, rate limiting, proxying. I want the whole request path in one program I can read top to bottom.\n\n## One tarball, served in place\n\nThe whole site is a single `tar` file. 
zeroserve indexes it on load - building a `path -> byte-range` map - and then serves files by issuing byte-range reads against the tarball itself. Nothing is ever unpacked to disk. The site lives entirely in that one file, so there's no document root for a stray `location` rule to expose, and a deploy is a single atomic file swap. To package a directory:\n\n```\nzeroserve --pack ./public > site.tar\nzeroserve --addr 0.0.0.0:8080 site.tar\n```\n\nDeploying a new version is \"replace the tarball and send `SIGHUP`\". The reload swaps the site, the scripts, and the TLS material atomically, in the same process, with no dropped connections:\n\n```\nkillall -SIGHUP zeroserve\n```\n\nAll network and disk I/O goes through `io_uring` (via the [monoio](https://github.com/bytedance/monoio) runtime). Each instance is a single-threaded event loop. That sounds like a limitation, and per-process it is - but it's the right shape when your scaling unit is \"more processes\", and it's why many of them coexist happily on one box.\n\n## Scripting with eBPF, in userspace\n\nThis is the part I find most fun. Any `.c` file you put under `.zeroserve/scripts/` gets compiled to an eBPF object at pack time (with `clang` and `llc`) and runs on every request. The eBPF runs entirely in userspace: zeroserve loads the bytecode into a runtime ([async-ebpf](https://crates.io/crates/async-ebpf)) inside its own ordinary, unprivileged process, so the kernel's BPF subsystem and `CAP_BPF` stay out of it. 
async-ebpf JIT-compiles the bytecode to native machine code (it vendors [uBPF](https://github.com/iovisor/ubpf)), so your \"config\" runs as native x86-64.\n\nA *pointer cage* does the job the kernel verifier normally would, keeping the program from reading or writing memory it shouldn't: every memory access in the JIT-compiled code is masked into the program's own arena, so a stray access stays confined to the script's own memory.\n\nThe script runs directly on zeroserve's single event loop. To keep one slow script from stalling every other connection, the runtime is fully preemptible: a timer can interrupt JIT-compiled native code mid-execution and hand control back to the event loop.\n\nThe programming model is a chain of scripts, run in sorted filename order, sharing a per-request metadata map. If a script calls `zs_respond` or `zs_reverse_proxy`, the chain short-circuits. Here's a script that runs first and enriches every request:\n\n```\n#include <zeroserve.h>\n\nZS_ENTRY\nzs_u64 entry(void) {\n  char peer[64];\n  if (zs_req_peer(peer, sizeof(peer)) <= 0) zs_strcpy(peer, \"unknown\");\n\n  // publish values for the HTML template pass\n  zs_meta_set(ZS_STR(\"visitor\"), ZS_STR(peer));\n  // attach a header to *every* response: static files, zs_respond, proxied\n  zs_meta_set(ZS_STR(\"zs.response.header.x-served-by\"), ZS_STR(\"zeroserve-ebpf\"));\n  return 0;\n}\n```\n\nThe metadata it sets does two things. Keys under `zs.response.header.*` become response headers on everything. And other keys feed a tiny template pass: a `<zs-meta>visitor</zs-meta>` placeholder in an HTML file gets substituted on the way out. 
So you get dynamic-ish static pages without a template engine.\n\nThe [helper surface](https://github.com/losfair/zeroserve/blob/main/sdk/zeroserve.h) a script can call is broad:\n\n- **Request inspection and mutation**: read the method, path, query params, headers, and peer address; rewrite the URI or set and remove headers before the response goes out.\n- **Crypto and encoding**: SHA-256, HMAC-SHA256, base64, hex, and `getrandom`.\n- **JSON**: parse a request body, build and mutate a document tree, and reply with `zs_json_respond`.\n- **Rate limiting**: per-key token buckets keyed on anything from a peer IP to an API key, with state that survives hot reloads.\n- **AWS SigV4**: signed `Authorization` headers and presigned URLs for talking to S3 and other AWS services.\n- **OIDC login**: a complete relying-party flow (Authorization Code + PKCE) that carries the entire login session in sealed XChaCha20-Poly1305 cookies, so you can gate a static site behind \"log in with Google\" while the server stays stateless.\n\nA dynamic endpoint is just a script that responds:\n\n```\nZS_ENTRY\nzs_u64 entry(void) {\n  char path[64];\n  zs_req_path(path, sizeof(path));\n  if (zs_strcmp(path, \"/health\") != 0) return 0;\n\n  zs_meta_set(ZS_STR(\"zs.response.header.content-type\"), ZS_STR(\"application/json\"));\n  zs_respond(200, ZS_STR(\"{\\\"status\\\":\\\"ok\\\"}\\n\"));\n  return 0;\n}\n```\n\nEach script runs under a memory-footprint cap (256 KB by default), the runtime time-slices long-running scripts off the executor and throttles the runaways, and scripts can even call each other (`zs_call`) up to a bounded depth. 
A script that spins forever stalls only its own request - the preemption timer interrupts it and the server keeps serving everyone else.\n\nThe TLS story underneath is more complete than the zero-config framing suggests: TLS 1.3 only, terminated by BoringSSL, with native **Encrypted Client Hello** (so the real SNI never appears in cleartext), SNI certificate selection from a directory, JA4 client fingerprinting exposed to scripts, and a transparent ECH relay mode that byte-for-byte forwards undecryptable handshakes to a real upstream so a protected name blends in behind a public one. That's a lot of transport security to ship in a single zero-config binary.\n\n## How fast is it?\n\nI benchmarked zeroserve against nginx 1.26 and Caddy 2.11 over HTTPS on an 8-core Ryzen 7 3700X, each serving the same content with the same self-signed certificate. Because a zeroserve instance is single-threaded by design, the only fair comparison is *per core*: I pinned every server to one CPU with `taskset` (and held nginx to `worker_processes 1` and Caddy to `GOMAXPROCS=1`; zeroserve is single-threaded already) and drove load with `wrk -t4 -c100` from other cores, taking the median of three 10-second runs. `wrk` speaks HTTP/1.1, so these are HTTP/1.1-over-TLS-1.3 numbers with the handshake amortized across long-lived keep-alive connections: the steady-state cost of serving an already-open HTTPS connection.\n\n**Small static file (174 B)** - the bread and butter of static sites:\n\n| server | req/s | p99 |\n|---|---|---|\n| zeroserve | 36,681 | 5.4 ms |\n| nginx | 31,226 | 7.8 ms |\n| Caddy | 12,830 | 22 ms |\n\nzeroserve serves small files about 17% faster than nginx on a single core, with a tighter tail. 
HTML pages, small JSON, CSS - this is the case zeroserve is tuned for.\n\n**Large static file (100 KB):**\n\n| server | req/s | throughput | p99 |\n|---|---|---|---|\n| zeroserve | 8,000 | 782 MB/s | 22 ms |\n| nginx | 7,600 | 773 MB/s | 28 ms |\n| Caddy | 6,084 | 590 MB/s | 44 ms |\n\nAll three are close here, with zeroserve a hair ahead at around 780 MB/s on one core. nginx's usual trump card for large files is `sendfile()`, which splices file pages from the page cache to the socket with zero userspace copies. Under TLS that path goes unused: the bytes have to be encrypted in userspace anyway (short of kernel TLS, which all three leave off), so every server is bound by the same encrypt-and-write loop, and zeroserve's `io_uring` read-and-write path is a touch faster at it.\n\n## eBPF vs Lua\n\nThe obvious comparison for the scripting is nginx + LuaJIT (`ngx_http_lua_module`), the usual way to run fast code inside a web server. So I wrote the equivalent Lua for two cases and put them head to head.\n\nOne tuning knob matters a lot here. zeroserve ships with a conservative default: it arms the script-preemption timer every **2 ms**. Fine granularity makes it quick to throttle a misbehaving script, but it taxes every well-behaved one - at the default, eBPF trails nginx Lua on a fully dynamic response (about 32k req/s against 41k). 
Bumping `--preempt-timer-interval-ms` to 10 recovers ~40% of scripting throughput and turns that around:\n\n**Per-request header-injection middleware** (script runs, static file is still served):\n\n| engine | req/s | p99 |\n|---|---|---|\n| zeroserve eBPF (10 ms) | 43,709 | 5.1 ms |\n| zeroserve eBPF (2 ms default) | 31,334 | 6.7 ms |\n| nginx Lua (`header_filter`) | 28,653 | 8.4 ms |\n\n**Fully dynamic JSON response:**\n\n| engine | req/s | p99 |\n|---|---|---|\n| zeroserve eBPF (10 ms) | 46,945 | 4.5 ms |\n| nginx Lua (`content_by_lua`) | 41,231 | 6.4 ms |\n| zeroserve eBPF (2 ms default) | 32,393 | 6.7 ms |\n\nAt the 10 ms interval, tuned eBPF wins both cases. On the middleware case - a script shaping an otherwise-static response - it beats nginx Lua by about 50%, with a tighter tail. On the fully synthetic response it edges nginx's heavily-tuned `content_by_lua` too (47k against 41k). Both engines compile to native code (LuaJIT is a tracing JIT; async-ebpf JITs the eBPF through uBPF), and with TLS encryption as a shared per-request cost, the tuned eBPF path comes out ahead on throughput. At the 2 ms default, eBPF keeps the middleware win but gives up the synthetic-response lead, so I'd run production scripts at 10 ms.\n\n## As a reverse proxy\n\nServing files is half the job; the other half is proxying to a backend, which is the main reason most people reach for nginx or Caddy in the first place. zeroserve does it from a script - `zs_reverse_proxy(\"http://127.0.0.1:9000\")` - and keeps a pool of upstream connections (up to 128 per backend, 30 s idle) and reuses them across requests.\n\nGetting a fair fight here takes care: nginx's famous default closes upstream connections after each request, so keep-alive is enabled explicitly (`keepalive 128`, `proxy_http_version 1.1`, and a cleared `Connection` header), with Caddy reusing connections as it does by default. 
Each proxy terminates TLS on a single core and forwards to a shared plaintext backend, a separate 2-core server that sustains 100k req/s on its own, so the measurement isolates the proxy's own overhead.\n\nProxying a small (174 B) response:\n\n| proxy | req/s | p50 | p99 |\n|---|---|---|---|\n| zeroserve | 26,486 | 3.3 ms | 8 ms |\n| nginx | 21,761 | 4.2 ms | 10.5 ms |\n| Caddy | 7,683 | 10.3 ms | 33 ms |\n\nzeroserve's pooled `io_uring` proxy leads here, about 22% ahead of nginx (26.5k against 21.8k) and roughly 3.4× Caddy. For the typical proxy workload - forwarding API calls, small JSON, an app server's HTML - zeroserve terminates TLS and shuttles the request to the backend faster than the reference implementation.\n\nLarge bodies tip the balance back. Proxying a 100 KB response:\n\n| proxy | req/s | throughput |\n|---|---|---|\n| nginx | 5,882 | 585 MB/s |\n| Caddy | 4,285 | 406 MB/s |\n| zeroserve | 3,631 | 359 MB/s |\n\nOnce the proxied body is large, nginx's buffering moves bytes more efficiently and pulls ahead, with Caddy slotting in between and zeroserve trailing. If your proxied responses are large, nginx is the better tool; if they're small and numerous, zeroserve is faster.\n\n## Memory\n\nIdle, a single zeroserve instance sits around 15 MB PSS - more than nginx's ~6 MB, less than Caddy's ~60 MB. On its own that's unremarkable. 
What makes it matter is that the unit is a whole process: when you run a copy per core, they all map the same binary, so the code pages are shared, and each extra process adds little beyond its own working set.\n\nzeroserve is [open source on GitHub](https://github.com/losfair/zeroserve) - try it yourself!", "url": "https://wpnews.pro/news/zeroserve-a-zero-config-web-server-you-can-script-with-ebpf", "canonical_source": "https://su3.io/posts/introducing-zeroserve", "published_at": "2026-06-06 14:59:43+00:00", "updated_at": "2026-06-06 15:18:38.300755+00:00", "lang": "en", "topics": ["ai-infrastructure", "ai-tools", "ai-products"], "entities": ["Zeroserve", "nginx", "Caddy", "eBPF", "io_uring", "TLS 1.3", "HTTP/2", "GPT-5.5"], "alternates": {"html": "https://wpnews.pro/news/zeroserve-a-zero-config-web-server-you-can-script-with-ebpf", "markdown": "https://wpnews.pro/news/zeroserve-a-zero-config-web-server-you-can-script-with-ebpf.md", "text": "https://wpnews.pro/news/zeroserve-a-zero-config-web-server-you-can-script-with-ebpf.txt", "jsonld": "https://wpnews.pro/news/zeroserve-a-zero-config-web-server-you-can-script-with-ebpf.jsonld"}}