# Show HN: Nenya – A lightweight, highly secure AI API Gateway/Proxy written in Go

> Source: <https://github.com/gumieri/nenya>
> Published: 2026-06-12 17:32:40+00:00

A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration — all with transparent SSE streaming. Security-hardened: non-root execution, mlock for secrets, seccomp + no-new-privileges.

**Compatible with any provider that implements the OpenAI Chat Completions API or the Anthropic Messages API.** For 23 providers we ship built-in adapters with specialized handling.

```
+----------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.)    |
| OpenAI-compatible request                    |
| POST /v1/chat/completions + Bearer token     |
| or                                           |
| Anthropic Messages API request               |
| POST /v1/messages + x-api-key                |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Nenya Gateway                                |
| - auth check + RBAC enforcement              |
| - parse JSON + extract model                 |
| - resolve agent/provider                     |
| - optional cache (HIT => replay SSE)         |
| - optional MCP context/tool injection        |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Interceptor Chain (pluggable, best-effort)   |
| - RedactInterceptor  (regex patterns)        |
| - EntropyInterceptor (high-entropy strings)  |
| - TFIDFInterceptor   (relevance scoring)     |
| - BouncerInterceptor (engine summarization)  |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Token Budget Trimming (if payload > hard     |
| limit) drops oldest non-system messages and  |
| applies token-aware middle-out truncation    |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Routing                                      |
|  A) Standard forwarding                      |
|     - fallback chain + circuit breaker + RL  |
|  B) MCP multi-turn tool loop (if enabled)    |
|     - buffer SSE, execute MCP tools, re-send |
|  C) Context-limit retry                      |
|     - on upstream 413/context_exceeded,      |
|       summarize payload, retry with fallback |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Upstream LLM Providers                       |
| Anthropic | Gemini | DeepSeek | Mistral | ...|
+----------------------------------------------+
                        |
                        |  SSE stream
                        v
+----------------------------------------------+
| Nenya SSE Pipeline                           |
| - adapter response transforms                |
| - (optional) OpenAI→Anthropic conversion     |
| - usage accounting + stream filter           |
| - flush + (optional) cache capture           |
| - (optional) MCP auto-save                   |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Client receives transparent SSE output       |
+----------------------------------------------+
```
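To make the token budget trimming stage in the diagram more concrete, here is a minimal Go sketch of its first half: dropping the oldest non-system messages until the payload fits. The `Message` type, the `estimateTokens` heuristic (roughly four bytes per token), and `trimToBudget` are hypothetical names chosen for illustration, not Nenya's actual code, and middle-out truncation is left out.

```go
package main

import "fmt"

// Message is a hypothetical chat message type; Nenya's real types may differ.
type Message struct {
	Role    string
	Content string
}

// estimateTokens is a crude stand-in for a tokenizer: about 4 bytes per token.
func estimateTokens(s string) int {
	return (len(s) + 3) / 4
}

// totalTokens sums the estimate over all messages.
func totalTokens(msgs []Message) int {
	n := 0
	for _, m := range msgs {
		n += estimateTokens(m.Content)
	}
	return n
}

// trimToBudget drops the oldest non-system messages until the estimated
// total fits within budget or only system messages remain.
// The input slice is copied, so the caller's messages are left untouched.
func trimToBudget(msgs []Message, budget int) []Message {
	out := append([]Message(nil), msgs...)
	for totalTokens(out) > budget {
		idx := -1
		for i, m := range out {
			if m.Role != "system" {
				idx = i
				break
			}
		}
		if idx < 0 {
			break // nothing left to drop
		}
		out = append(out[:idx], out[idx+1:]...)
	}
	return out
}

func main() {
	msgs := []Message{
		{Role: "system", Content: "You are a helpful assistant."},
		{Role: "user", Content: "first question, fairly long text here"},
		{Role: "assistant", Content: "first answer"},
		{Role: "user", Content: "latest question"},
	}
	for _, m := range trimToBudget(msgs, 15) {
		fmt.Println(m.Role+":", m.Content)
	}
}
```

System messages are never dropped in this sketch, so a budget smaller than the system prompt simply leaves the system prompt in place.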

Flow notes:

- `/v1/*` endpoints require client bearer auth; `/healthz`, `/statsz`, and `/metrics` do not.
- Pipeline failures degrade gracefully and forward the request instead of returning a 500.
- MCP-enabled agents can run local/remote tools without exposing MCP complexity to the client.
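
The best-effort behavior described in the flow notes can be sketched roughly as follows. This is an illustrative Go sketch, not Nenya's implementation: the `Interceptor` signature, `runChain`, and `errStageFailed` are assumed names. A stage that fails is skipped and the payload from before that stage is carried forward.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Interceptor is a hypothetical signature for one pipeline stage.
type Interceptor func(payload string) (string, error)

// errStageFailed is a placeholder error used by the demo below.
var errStageFailed = errors.New("stage failed")

// runChain applies each interceptor in order. A failing stage is skipped:
// its error is collected and the previous payload is kept, so the request
// can still be forwarded upstream.
func runChain(payload string, chain []Interceptor) (string, []error) {
	var errs []error
	for _, stage := range chain {
		next, err := stage(payload)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		payload = next
	}
	return payload, errs
}

func main() {
	chain := []Interceptor{
		func(p string) (string, error) { return strings.TrimSpace(p), nil },
		func(p string) (string, error) { return "", errStageFailed },
		func(p string) (string, error) { return strings.ToLower(p), nil },
	}
	out, errs := runChain("  Hello Upstream  ", chain)
	fmt.Printf("forwarding %q after %d skipped stage(s)\n", out, len(errs))
}
```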

- **Config-driven provider registry**: add providers via JSON, zero code changes
- **23 built-in providers** with specialized adapters for wire format differences
- **Dynamic model discovery**: fetches live model catalogs from providers at startup and on reload
- **Model registry**: reference models by string shorthand with automatic provider/context resolution
- **Multi-provider model resolution**: when a model exists in multiple providers, all are added to the agent's fallback chain
- **Three-tier model resolution**: config overrides > discovered models > static registry
- **Per-model wire format**: models from multi-format gateways (OpenCode Zen) auto-convert between OpenAI, Anthropic, and Gemini wire formats based on the model's `format` attribute
- **Agent fallback chains**: round-robin or sequential with circuit breaker and automatic failover
- **Latency-aware routing**: auto-reorder targets by historical median response time with ±5% jitter to prevent thundering herd
- **Per-agent system prompts**: inline or file-based
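
One way to picture latency-aware routing with ±5% jitter is the Go sketch below. It is an assumption-laden illustration rather than Nenya's code: `Target`, `orderByLatency`, and the provider names are hypothetical, and the jitter source is passed in as a function so the ordering can be made deterministic when needed.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Target is a hypothetical routing candidate with its historical median latency.
type Target struct {
	Name     string
	MedianMS float64
}

// orderByLatency returns targets sorted by jittered median latency.
// jitter must return values in [-1, 1]; each score is perturbed by up to ±5%,
// so targets with near-identical medians trade places between requests.
func orderByLatency(targets []Target, jitter func() float64) []Target {
	type scored struct {
		target Target
		score  float64
	}
	scoredTargets := make([]scored, len(targets))
	for i, t := range targets {
		scoredTargets[i] = scored{t, t.MedianMS * (1 + 0.05*jitter())}
	}
	sort.SliceStable(scoredTargets, func(i, j int) bool {
		return scoredTargets[i].score < scoredTargets[j].score
	})
	out := make([]Target, len(targets))
	for i, s := range scoredTargets {
		out[i] = s.target
	}
	return out
}

func main() {
	targets := []Target{
		{Name: "provider-a", MedianMS: 420},
		{Name: "provider-b", MedianMS: 400},
		{Name: "provider-c", MedianMS: 900},
	}
	jitter := func() float64 { return rand.Float64()*2 - 1 }
	for _, t := range orderByLatency(targets, jitter) {
		fmt.Println(t.Name)
	}
}
```

With these numbers, `provider-a` and `provider-b` are within 5% of each other and can swap order from call to call, while `provider-c` stays last.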

- **Tier-0 regex secret filter**: always-on redaction of AWS keys, GitHub tokens, passwords, etc.
- **3-Tier content pipeline**: pluggable interceptor chain: regex redaction, entropy filtering, TF-IDF relevance scoring, engine summarization
- **Context window compaction**: sliding window summarization with configurable engine
- **Stale tool call pruning**: compact old assistant+tool response pairs to save tokens
- **Thought pruning**: strip reasoning blocks from assistant message history
- **Input validation**: strict body limits, JSON sanitization, header filtering
- **Graceful degradation**: never blocks requests due to engine or pipeline failures
- **Role-Based Access Control (RBAC)**: per-API key roles (admin, user, read-only) with agent and endpoint restrictions
- **Secure memory**: mlock-protected token storage, read-only sealing, core dump prevention
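
As a rough idea of how regex redaction and entropy filtering can work together, consider the Go sketch below. Everything in it is an assumption for illustration: the two patterns are commonly cited shapes for AWS access key IDs and classic GitHub personal access tokens, not Nenya's actual pattern set, and the length and entropy thresholds are arbitrary.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strings"
)

// secretPatterns holds illustrative patterns only.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),    // AWS access key ID shape
	regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`), // GitHub classic token shape
}

// redactPatterns replaces every pattern match with a fixed marker.
func redactPatterns(s string) string {
	for _, re := range secretPatterns {
		s = re.ReplaceAllString(s, "[REDACTED]")
	}
	return s
}

// shannonEntropy returns the entropy of s in bits per byte.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// redactHighEntropy replaces whitespace-separated words that are at least
// minLen bytes long and whose entropy reaches threshold.
// Note: runs of whitespace are collapsed to single spaces in this sketch.
func redactHighEntropy(s string, minLen int, threshold float64) string {
	words := strings.Fields(s)
	for i, w := range words {
		if len(w) >= minLen && shannonEntropy(w) >= threshold {
			words[i] = "[REDACTED]"
		}
	}
	return strings.Join(words, " ")
}

func main() {
	in := "aws=AKIAIOSFODNN7EXAMPLE session 9fK2xQ7bLm4Zp8Rt done"
	fmt.Println(redactHighEntropy(redactPatterns(in), 16, 3.5))
}
```

Running the regex pass first keeps known key formats from depending on the entropy threshold, which is tuned here only to catch long random-looking strings.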

- **Secure memory (default)**: all tokens stored in mlock-protected RAM, sealed read-only after init, core dumps disabled
- **Non-root execution**: runs as UID 65532 with dropped capabilities
- **Memory protection**: `LimitMEMLOCK=infinity` and `LimitCORE=0` in systemd
- **Read-only filesystem**: immutable root + private `/tmp`
- **Seccomp + no-new-privileges**: restricted syscalls, prevents privilege escalation
- **Zero-trust secrets**: loaded via systemd credentials or container mounts, never to disk
- **Socket activation**: seamless restarts with zero dropped connections

- **Zero external dependencies**: Go standard library only
- **Hot reload**: `systemctl reload nenya` for zero-downtime config changes
- **Circuit breaker**: per agent+provider+model with automatic failover, exponential backoff, and semantic error classification
- **Rate limiting**: per upstream host (RPM/TPM) with per-provider overrides
- **Response cache**: in-memory LRU with SHA-256 fingerprinting and optional semantic similarity search
- **Graceful shutdown**: 5s grace period for in-flight requests, MCP client cleanup
- **Context-limit auto-retry**: upstream context-length errors trigger summarization and retry
- **Local engine lifecycle**: pre-load and manage local Ollama models with LRU eviction
- **Structured errors**: all error responses include `error_kind` field for programmatic diagnostics
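
For the response cache, SHA-256 fingerprinting could look roughly like the Go sketch below. Which request fields Nenya actually hashes is not stated here, so the `Request` and `Message` types and the choice of hashing the model plus messages are assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Message and Request are hypothetical, trimmed-down request types.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type Request struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
}

// fingerprint derives a cache key from the request. Struct fields marshal in
// declaration order, so two equal requests always produce the same key.
func fingerprint(r Request) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	req := Request{
		Model:    "gemini-2.5-flash",
		Messages: []Message{{Role: "user", Content: "Hello"}},
	}
	key, err := fingerprint(req)
	if err != nil {
		fmt.Println("fingerprint:", err)
		return
	}
	fmt.Println("cache key:", key)
}
```

The resulting hex string can serve as the key of an LRU map; an exact-match key like this only hits on byte-identical requests, which is presumably why semantic similarity search is offered as an optional extra.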

- **Tool discovery**: connect to MCP servers for automatic tool injection
- **Multi-turn execution**: intercept tool calls, execute against MCP servers, forward results
- **Auto-search**: pre-fetch relevant context from MCP servers before forwarding
- **Auto-save**: persist assistant responses to MCP memory servers

Create minimal config and secrets:

```
mkdir -p config secrets
cat > config/config.json << 'EOF'
{
  "server": { "listen_addr": ":8080" },
  "agents": {
    "default": {
      "strategy": "fallback",
      "models": ["gemini-2.5-flash"]
    }
  }
}
EOF

cat > secrets/provider_keys.json << 'EOF'
{
  "provider_keys": {
    "gemini": "AIza..."
  }
}
EOF

# Unquoted delimiter here so the shell expands $(openssl ...) into a random token
cat > secrets/client.json << EOF
{
  "client_token": "nk-$(openssl rand -hex 32)"
}
EOF
```

Run the container:

```
podman run -d \
  --name nenya \
  -p 8080:8080 \
  -v ./config:/etc/nenya:ro \
  -v ./secrets:/run/secrets/nenya:ro \
  -e NENYA_SECRETS_DIR=/run/secrets/nenya \
  --cap-drop=ALL \
  --cap-add=IPC_LOCK \
  --security-opt=no-new-privileges:true \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64M \
  ghcr.io/gumieri/nenya:latest
```

Test it:

```
curl -H "Authorization: Bearer $(jq -r '.client_token' secrets/client.json)" \
  http://localhost:8080/healthz
```

Nenya provides native packages for major Linux distributions and community package managers:

| Distribution | Command |
|---|---|
| Debian/Ubuntu (.deb) | Download `nenya_<version>_linux_amd64.deb` from the release page and run `sudo dpkg -i` |
| Fedora/RHEL (.rpm) | Download `nenya-<version>.x86_64.rpm` from the release page and run `sudo rpm -i` |
| Arch Linux (.pkg.tar.zst) | Download `nenya-<version>-x86_64.pkg.tar.zst` from the release page and run `sudo pacman -U` |
| Arch Linux (AUR) | `yay -S nenya-bin` (or your preferred AUR helper) |
| Nix/NixOS | Add `gumieri/nur-packages` to your NUR registry and use `nenya` |

All packages install the binary to `/usr/bin/nenya` and include systemd service and socket units. After install, enable and start:

```
sudo systemctl enable --now nenya.socket
sudo systemctl enable --now nenya.service
```

Nenya supports standard environment variables for deployment portability:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | Listening port (overrides `server.listen_addr`) |
| `HOST` | (none) | Optional bind address (e.g. `127.0.0.1`). Only used when combined with `PORT` |
| `NENYA_CONFIG_DIR` | `/etc/nenya/` | Configuration directory path |
| `NENYA_CONFIG_FILE` | (none) | Single config file path (takes precedence over `NENYA_CONFIG_DIR`) |
| `NENYA_SECRETS_DIR` | (none) | Secrets directory (overrides `CREDENTIALS_DIRECTORY`) |

Example usage:

```
PORT=9090 HOST=127.0.0.1 ./nenya --config /path/to/config.json
```

Or in Docker:

```
docker run -e PORT=9090 -p 9090:9090 ghcr.io/gumieri/nenya:latest
```
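
The interaction between `PORT`, `HOST`, and `server.listen_addr` described in the table above could be resolved along these lines. This Go sketch is an interpretation, not Nenya's code: `resolveListenAddr` is a hypothetical helper, and the exact precedence when `PORT` is unset (config value first, then `:8080`) is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// resolveListenAddr picks the listen address.
// Assumed precedence: PORT (optionally with HOST) > config value > ":8080".
// HOST on its own is ignored, matching the "only used with PORT" note.
func resolveListenAddr(configAddr string, getenv func(string) string) string {
	if port := getenv("PORT"); port != "" {
		// With an empty HOST this yields ":<port>", i.e. all interfaces.
		return net.JoinHostPort(getenv("HOST"), port)
	}
	if configAddr != "" {
		return configAddr
	}
	return ":8080"
}

func main() {
	// ":8080" stands in for server.listen_addr read from the config file.
	fmt.Println("listening on", resolveListenAddr(":8080", os.Getenv))
}
```

Passing the environment lookup as a function keeps the helper easy to exercise with a fake environment.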

- [Deploy Bare Metal (systemd)](/gumieri/nenya/blob/main/docs/DEPLOY_BAREMETAL.md): direct binary install, socket activation, hot reload
- [Deploy Container (Podman/Docker Compose)](/gumieri/nenya/blob/main/docs/DEPLOY_CONTAINER.md): compose.yml, image verification, security hardening
- [Deploy Kubernetes (Helm)](/gumieri/nenya/blob/main/docs/DEPLOY_KUBERNETES.md): Helm chart, ConfigMap/Secret, ingress setup

All `/v1/*` endpoints require `Authorization: Bearer <client_token>` or `Bearer <api_key_token>`.
API keys support **RBAC enforcement** — agent scoping, endpoint allowlists, role-based permissions (admin bypasses all checks).
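
A minimal Go sketch of such an RBAC check, under stated assumptions, is shown here. The `APIKey` fields, the `authorize` helper, and the rule that an empty allowlist means "unrestricted" are all hypothetical; only the admin bypass and the agent/endpoint restrictions come from the description above.

```go
package main

import "fmt"

// APIKey is a hypothetical view of one configured key.
type APIKey struct {
	Role      string   // "admin", "user", or "read-only"
	Agents    []string // allowed agents; empty is treated as unrestricted (assumption)
	Endpoints []string // allowed endpoint paths; empty is treated as unrestricted (assumption)
}

// contains reports whether v is present in list.
func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// authorize applies the admin bypass first, then agent scoping and the
// endpoint allowlist. Both restrictions must pass.
func authorize(k APIKey, agent, endpoint string) bool {
	if k.Role == "admin" {
		return true
	}
	agentOK := len(k.Agents) == 0 || contains(k.Agents, agent)
	endpointOK := len(k.Endpoints) == 0 || contains(k.Endpoints, endpoint)
	return agentOK && endpointOK
}

func main() {
	key := APIKey{
		Role:      "user",
		Agents:    []string{"default"},
		Endpoints: []string{"/v1/chat/completions"},
	}
	fmt.Println(authorize(key, "default", "/v1/chat/completions"))
	fmt.Println(authorize(key, "default", "/v1/embeddings"))
}
```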

| Endpoint | Auth | Description |
|---|---|---|
| `POST /v1/chat/completions` | Bearer + RBAC | OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn |
| `POST /v1/messages` | Bearer + RBAC | Anthropic Messages API with bidirectional format conversion |
| `GET /v1/models` | Bearer + RBAC | Live model catalog from discovered providers + static registry (context window, max tokens) |
| `POST /v1/embeddings` | Bearer + RBAC | Passthrough proxy |
| `POST /v1/responses` | Bearer + RBAC | Passthrough proxy |
| `POST /v1/images/generations` | Bearer + RBAC | Image generation (OpenAI-compatible) |
| `POST /v1/audio/transcriptions` | Bearer + RBAC | Audio transcription (Whisper-compatible, multipart support) |
| `POST /v1/audio/speech` | Bearer + RBAC | Text-to-speech synthesis (OpenAI-compatible) |
| `POST /v1/moderations` | Bearer + RBAC | Content moderation (OpenAI-compatible) |
| `POST /v1/rerank` | Bearer + RBAC | Re-ranking API (Cohere/Jina/Voyage-compatible) |
| `POST /v1/a2a` | Bearer + RBAC | Agent-to-Agent protocol (Google A2A) |
| `GET /v1/files` | Bearer + RBAC | File listing, upload, retrieval, deletion |
| `POST /v1/batches` | Bearer + RBAC | Batch API operations |
| `POST /proxy/{provider}/*` | Bearer + RBAC | Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming) |
| `GET /healthz` | None | Engine health probe |
| `GET /statsz` | None | Token usage, circuit breaker state, MCP server status |
| `GET /metrics` | None | Prometheus-compatible metrics |
| `GET /debug/pprof/*` | Bearer | Go profiling endpoints (disabled by default, see `debug.pprof_enabled`) |

See [docs/PASSTHROUGH_PROXY.md](/gumieri/nenya/blob/main/docs/PASSTHROUGH_PROXY.md) for detailed passthrough proxy usage.
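
For a sense of what a client call to `POST /v1/chat/completions` looks like from Go, here is a small sketch that only builds the request. The helper name `newChatRequest`, the placeholder token, and the use of the agent name `default` as the `model` value are assumptions for illustration; the request shape follows the OpenAI-compatible convention named in the table.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest model the minimal OpenAI-style request body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// newChatRequest builds a streaming chat request with bearer auth.
func newChatRequest(baseURL, token, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   true,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// "nk-example" is a placeholder; use the client_token from your secrets.
	req, err := newChatRequest("http://localhost:8080", "nk-example", "default", "Hello")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it, call http.DefaultClient.Do(req) and read the SSE body line by line.
}
```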

- [Configuration](/gumieri/nenya/blob/main/docs/CONFIGURATION.md)
- [Deploy Bare Metal](/gumieri/nenya/blob/main/docs/DEPLOY_BAREMETAL.md)
- [Deploy Container](/gumieri/nenya/blob/main/docs/DEPLOY_CONTAINER.md)
- [Deploy Kubernetes](/gumieri/nenya/blob/main/docs/DEPLOY_KUBERNETES.md)
- [Passthrough Proxy](/gumieri/nenya/blob/main/docs/PASSTHROUGH_PROXY.md)
- [Architecture](/gumieri/nenya/blob/main/docs/ARCHITECTURE.md)
- [MCP Integration](/gumieri/nenya/blob/main/docs/MCP_INTEGRATION.md)
- [Adapters](/gumieri/nenya/blob/main/docs/ADAPTERS.md)
- [Secrets Format](/gumieri/nenya/blob/main/docs/SECRETS_FORMAT.md)
- [Security](/gumieri/nenya/blob/main/docs/SECURITY.md)

Apache 2.0. See [LICENSE](/gumieri/nenya/blob/main/LICENSE).
