{"slug": "show-hn-aquifer-a-control-plane-for-agentic-api-traffic", "title": "Show HN: Aquifer – a control plane for agentic API traffic", "summary": "Aquifer, a self-hosted API request queue, has been released to control the pace of inbound and outbound agentic traffic and prevent partial outages from cascading. The tool absorbs traffic bursts by durably queuing requests to SQLite and releasing them at a configurable rate, allowing backends and upstream APIs to dictate their own traffic pace. Aquifer supports both inbound protection for a user's own API and outbound rate limiting for external services like OpenAI and Stripe, with automatic backoff based on upstream response headers.", "body_md": "**Self-hosted API request queue. Controls the pace of inbound and outbound traffic so partial outages don't cascade.**\n\nAPIs get hit in bursts by agents, schedulers, or high-volume clients. On inbound, your backend gets overwhelmed. On outbound, your app gets 429s. One slow dependency takes everything else down with it.\n\nAquifer absorbs the burst, queues requests durably to SQLite, and releases them at the rate you configure. Your backend decides the pace. The upstream decides the pace. Whoever needs to slow things down wins.\n\n**Inbound: protect your API**\n\n```\nagents / clients  →  POST /jobs to Aquifer  →  your backend (at controlled RPS)\n```\n\nAgents hammering your API? Aquifer queues their requests and drains them to your backend at a pace it can handle. Your backend returns `X-Aquifer-Rps` headers to signal how fast it wants traffic in real time.\n\n**Outbound: respect external APIs**\n\n```\nyour app  →  POST /jobs to Aquifer  →  OpenAI / Stripe / any API (at controlled RPS)\n```\n\nCalling a rate-limited upstream? Aquifer queues the calls and dispatches them at your configured rate. 
If the upstream signals a slowdown via headers, Aquifer backs off automatically.\n\nIn both cases, **the upstream response headers have the final say on pace.** Your config sets the ceiling. Headers can only lower the rate below it, never raise it above. When pressure clears, the rate recovers gradually back to your ceiling.\n\n- Client POSTs a job (target URL, method, headers, body, webhook URL) and moves on\n- Aquifer persists it to SQLite, so it survives crashes and is re-dispatched on restart\n- A per-upstream worker dispatches at your configured RPS with jitter\n- On completion, Aquifer POSTs your webhook with the response body and status\n- The upstream can adjust the rate live via `X-Aquifer-*` response headers\n\n**Binary**\n\n```\ngo install github.com/rjpruitt16/aquifer@latest\naquifer\n```\n\n**Docker**\n\n```\ndocker run -p 8080:8080 -v $(pwd)/data:/data \\\n  -e DB_PATH=/data/aquifer.db \\\n  ghcr.io/rjpruitt16/aquifer\n```\n\n**Fly.io**\n\n```\ngit clone https://github.com/rjpruitt16/aquifer\ncd aquifer\nflyctl launch --name my-aquifer --no-deploy\nflyctl volumes create aquifer_data --size 1 --region iad\nflyctl deploy\n```\n\nSet `CONFIG_PATH` to a YAML file to configure rate limits per upstream hostname:\n\n```\n# aquifer.yml: copy from aquifer.example.yml\ndefaults:\n  rps: 2\n  max_concurrent: 1\n\nupstreams:\n  api.openai.com:\n    rps: 10\n    max_concurrent: 3\n  api.stripe.com:\n    rps: 20\n    max_concurrent: 5\n  your-backend.internal:\n    rps: 50\n    max_concurrent: 10\n```\n\n| Env var | Default | Description |\n|---|---|---|\n| `PORT` | `8080` | HTTP listen port |\n| `DB_PATH` | `aquifer.db` | SQLite database path |\n| `CONFIG_PATH` | (none) | Path to the rate-limit config YAML |\n\nSubmit a job with `POST /jobs`:\n\n```\n{\n  \"user_id\":        \"user-123\",\n  \"idempotent_key\": \"invoice-42-notify\",\n  \"url\":            \"https://api.openai.com/v1/chat/completions\",\n  \"method\":         \"POST\",\n  \"headers\":        { \"Authorization\": \"Bearer sk-...\" },\n  \"body\":           \"{\\\"model\\\":\\\"gpt-4o\\\",\\\"messages\\\":[...]}\",\n  \"webhook_url\":    \"https://yourapp.com/webhooks/aquifer\"\n}\n```\n\nIdempotency: a duplicate `idempotent_key` for the same `user_id` returns the existing job.\n\nResponses: **201** when a new job is queued; **200** with `\"duplicate\": true` when the job already exists.\n\n```\n{\n  \"job_id\":     \"a3f9...\",\n  \"status\":     \"queued | in_flight | completed | failed\",\n  \"url\":        \"https://api.openai.com/v1/chat/completions\",\n  \"method\":     \"POST\",\n  \"created_at\": 1715000000000\n}\n```\n\nServer-Sent Events stream for live job updates:\n\n```\ncurl -N http://localhost:8080/jobs/<id>/stream\n```\n\n```\nevent: queued\ndata: {\"job_id\":\"a3f9...\",\"status\":\"queued\"}\n\nevent: dispatching\ndata: {\"job_id\":\"a3f9...\"}\n\nevent: completed\ndata: {\"job_id\":\"a3f9...\",\"response_status\":200,\"body\":\"...\"}\n```\n\nOn failure, the stream instead emits `event: failed` with `{\"job_id\":\"...\",\"reason\":\"...\"}`.\n\n**Position updates:** while the job waits in the queue, a `position` event is broadcast every 2 seconds:\n\n```\nevent: position\ndata: {\"job_id\":\"a3f9...\",\"position\":4}\n```\n\nConnecting late is safe: you receive synthetic `queued` and `dispatching` catch-up events for the states you missed.\n\n**The Aqueduct Protocol:** SSE is the live view; the webhook is the guaranteed delivery. Both always fire, whether or not the stream was open. Think of it like a phone call with voicemail: stay on the line (SSE) for real-time updates, or hang up and the result goes to voicemail (webhook). 
You never lose the result.\n\n```\n{ \"status\": \"ok\" }\n```\n\n**Completed**\n\n```\n{\n  \"job_id\":          \"a3f9...\",\n  \"status\":          \"completed\",\n  \"response_status\": 200,\n  \"body\":            \"...\"\n}\n```\n\n**Failed** (after 4 retries with exponential backoff)\n\n```\n{\n  \"job_id\": \"a3f9...\",\n  \"status\": \"failed\",\n  \"reason\": \"connection refused\"\n}\n```\n\nWebhook delivery retries 4 times: 1 s · 2 s · 4 s · 8 s.\n\nTraditional webhook security requires sharing a secret between sender and receiver and storing it in a database on both sides. Aquifer implements **L8 v0.1**, a lightweight challenge-response protocol that eliminates shared secrets entirely.\n\n**The attack surface problem L8 solves:** A shared HMAC secret can be stolen, accidentally logged, left unrotated, or compromised on either side. A stolen secret lets anyone forge webhook deliveries indefinitely. L8 replaces the shared secret with public-key cryptography, so there is no secret in a database to steal.\n\n**How it works:**\n\n- The receiver publishes a public key at `GET /.well-known/l8`.\n- Before the first delivery, Aquifer challenges the receiver to prove it holds the corresponding private key (a one-time handshake).\n- Trust is cached to disk as `l8-trust/{domain}.json`, so the handshake never runs again for that domain.\n- Every webhook delivery carries `X-L8-Signature` headers, which the receiver verifies locally with no database lookup and no round-trip to any authority.\n\n**Why this keeps things fast:** Verification is a single local Ed25519 `verify()` call against a cached public key. No database query, no HTTP call, no shared state. Microseconds.\n\n**Key management:**\n\nSet `L8_PRIVATE_KEY` (a base64-encoded Ed25519 private key) for a stable identity across restarts. Without it, Aquifer auto-generates a key on first start and saves it to `.l8-key`.\n\nTo revoke trust with a domain, delete `l8-trust/{domain}.json`. 
The handshake re-runs on the next delivery.\n\n**Aquifer exposes:**\n\n| Endpoint | Purpose |\n|---|---|\n| `GET /.well-known/l8` | Aquifer's public key and capabilities; receivers discover Aquifer here |\n| `POST /l8/challenge` | Handles incoming challenges from receivers verifying Aquifer's identity |\n| `GET /l8-spec` | The full L8 protocol spec, served on any running Aquifer instance |\n\n**Protocol version:** `0.1`. The version is advertised in `/.well-known/l8` and `GET /health` so agents can detect which capabilities are available. Future versions will add payload encryption (0.2) and formalized key rotation (0.3).\n\nThe full protocol spec and verification examples are in [L8-SPEC.md](https://github.com/rjpruitt16/aquifer/blob/main/L8-SPEC.md), also browsable at `GET /l8-spec` on any running instance. The spec documents the receiver-side endpoints any service must implement to receive signed webhooks.\n\nSee `tests/l8_receiver.py` for a complete reference implementation of the receiver side, and `tests/test_l8.py` for end-to-end tests that verify the handshake, signed delivery, and cryptographic signature validation.\n\nThe upstream controls pace at runtime via response headers:\n\n| Header | Effect |\n|---|---|\n| `X-Aquifer-Rps` | Reduce the dispatch rate to this value |\n| `X-Aquifer-Max-Concurrent` | Reduce the maximum number of in-flight requests |\n| `X-Aquifer-Account-Queue` | `enabled`: isolate each tenant's queue |\n\nWith `X-Aquifer-Account-Queue: enabled`, each `(user_id, api_key)` pair gets its own independently paced queue. 
One tenant's burst can't slow down another.\n\nAquifer sends machine load data as headers on every outgoing request to your service:\n\n| Header | Value |\n|---|---|\n| `X-Aquifer-Total-Jobs` | Total jobs on this machine right now |\n| `X-Aquifer-Queue-Depth` | Jobs waiting to be dispatched |\n| `X-Aquifer-Flow-Rate` | Current dispatch rate (RPS) for this queue |\n\nYour service can read these headers and call your autoscaler when the queue is growing:\n\n```\n# request is your web framework's incoming request object (e.g. Flask's)\ntotal_jobs = int(request.headers.get(\"X-Aquifer-Total-Jobs\", 0))\n\nif total_jobs > 500:\n    scale_up()  # your own hook: call Fly.io, AWS ASG, k8s HPA, etc.\n```\n\nThis keeps the autoscaling decision in your hands: Aquifer exposes the signal, and your service acts on it however fits your infrastructure.\n\n- **Durable queue:** jobs persist to SQLite on every write.\n- **Crash recovery:** queued jobs are re-dispatched automatically on restart.\n- **In-flight tracking:** jobs are marked `in_flight` before dispatch and recovered immediately on panic, without waiting for a full restart.\n- **Stale job safety net:** in-flight jobs older than 5 minutes are automatically reset to `queued`.\n- **Per-job panic isolation:** a panic in one job marks it failed and delivers the webhook; the worker keeps running.\n\n| Status | TTL |\n|---|---|\n| `queued` | 24 h |\n| `completed` | 30 min |\n| `failed` | 2 h |\n\nAquifer is designed as a **sidecar on a single machine**: one instance per app server, with SQLite on a local persistent volume. No external database, no coordination overhead.\n\nRunning multiple instances against the same upstream without partitioning will multiply your request rate, because each instance paces its own queue independently. 
If you scale horizontally, partition by upstream domain or tenant so each instance owns a distinct key space.\n\n**License:** MIT", "url": "https://wpnews.pro/news/show-hn-aquifer-a-control-plane-for-agentic-api-traffic", "canonical_source": "https://github.com/rjpruitt16/aquifer", "published_at": "2026-05-25 19:23:02+00:00", "updated_at": "2026-05-25 19:37:53.285998+00:00", "lang": "en", "topics": ["ai-infrastructure", "ai-tools", "ai-agents", "mlops"], "entities": ["Aquifer", "OpenAI", "Stripe", "SQLite"], "alternates": {"html": "https://wpnews.pro/news/show-hn-aquifer-a-control-plane-for-agentic-api-traffic", "markdown": "https://wpnews.pro/news/show-hn-aquifer-a-control-plane-for-agentic-api-traffic.md", "text": "https://wpnews.pro/news/show-hn-aquifer-a-control-plane-for-agentic-api-traffic.txt", "jsonld": "https://wpnews.pro/news/show-hn-aquifer-a-control-plane-for-agentic-api-traffic.jsonld"}}