[ARTICLE · art-25469] src=github.com pub= topic=ai-tools verified=true sentiment=↑ positive

Show HN: Nenya – A lightweight, highly secure AI API Gateway/Proxy written in Go

A new open-source AI API gateway called Nenya, written in Go, provides a lightweight, zero-dependency proxy that sits between AI coding clients and upstream large language model providers. The gateway adds security features including secret redaction, context management, agent routing, and MCP tool integration with transparent SSE streaming, while enforcing non-root execution, mlock for secrets, and seccomp with no-new-privileges. Nenya supports any provider implementing OpenAI or Anthropic Chat Completions APIs, ships with 23 built-in adapters, and offers config-driven provider registration without code changes.

7 min read · published Jun 12, 2026

A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration — all with transparent SSE streaming. Security-hardened: non-root execution, mlock for secrets, seccomp + no-new-privileges.

Compatible with any provider that implements the OpenAI or Anthropic Chat Completions API. We ship built-in adapters with specialized handling for 23 providers.

+----------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.)    |
| OpenAI-compatible request                    |
| POST /v1/chat/completions + Bearer token     |
| or                                           |
| Anthropic Messages API request               |
| POST /v1/messages + x-api-key                |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Nenya Gateway                                |
| - auth check + RBAC enforcement              |
| - parse JSON + extract model                 |
| - resolve agent/provider                     |
| - optional cache (HIT => replay SSE)         |
| - optional MCP context/tool injection        |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Interceptor Chain (pluggable, best-effort)   |
| - RedactInterceptor  (regex patterns)        |
| - EntropyInterceptor (high-entropy strings)  |
| - TFIDFInterceptor   (relevance scoring)     |
| - BouncerInterceptor (engine summarization)  |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Token Budget Trimming (if payload > hard     |
| limit) drops oldest non-system messages and  |
| applies token-aware middle-out truncation    |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Routing                                      |
|  A) Standard forwarding                      |
|     - fallback chain + circuit breaker + RL  |
|  B) MCP multi-turn tool loop (if enabled)    |
|     - buffer SSE, execute MCP tools, re-send |
|  C) Context-limit retry                      |
|     - on upstream 413/context_exceeded,      |
|       summarize payload, retry with fallback |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Upstream LLM Providers                       |
| Anthropic | Gemini | DeepSeek | Mistral | ...|
+----------------------------------------------+
                        |
                        |  SSE stream
                        v
+----------------------------------------------+
| Nenya SSE Pipeline                           |
| - adapter response transforms                |
| - (optional) OpenAI→Anthropic conversion     |
| - usage accounting + stream filter           |
| - flush + (optional) cache capture           |
| - (optional) MCP auto-save                   |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Client receives transparent SSE output       |
+----------------------------------------------+
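The Token Budget Trimming stage in the diagram can be pictured with a small sketch. This is not Nenya's code: the `Message` type, the 4-bytes-per-token estimate, and the function names are assumptions made for illustration, and a real gateway would presumably use a proper tokenizer.

```go
package main

import "fmt"

// Message is a hypothetical chat message; Nenya's real types may differ.
type Message struct {
	Role    string
	Content string
}

// estimateTokens is a crude stand-in for a tokenizer (assumption: ~4 bytes per token).
func estimateTokens(m Message) int { return len(m.Content)/4 + 1 }

// trimToBudget drops the oldest non-system messages until the estimated
// total fits the budget. System messages are always kept.
func trimToBudget(msgs []Message, budget int) []Message {
	total := 0
	for _, m := range msgs {
		total += estimateTokens(m)
	}
	out := make([]Message, 0, len(msgs))
	for _, m := range msgs { // msgs is ordered oldest first
		if total > budget && m.Role != "system" {
			total -= estimateTokens(m)
			continue // drop this message
		}
		out = append(out, m)
	}
	return out
}

// middleOut keeps the head and tail of a string and cuts the middle,
// so the result has at most limit runes.
func middleOut(s string, limit int) string {
	r := []rune(s)
	if len(r) <= limit {
		return s
	}
	head := limit / 2
	tail := limit - head
	return string(r[:head]) + string(r[len(r)-tail:])
}

// demoMessages returns a small conversation used by main.
func demoMessages() []Message {
	return []Message{
		{Role: "system", Content: "s"},
		{Role: "user", Content: "aaaaaaaa"},
		{Role: "assistant", Content: "bbbbbbbb"},
		{Role: "user", Content: "cccc"},
	}
}

func main() {
	for _, m := range trimToBudget(demoMessages(), 6) {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
	fmt.Println(middleOut("abcdefghij", 4))
}
```

Dropping from the oldest end first preserves the most recent turns, which are usually the ones the model needs to answer the current request.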

Flow notes:

- /v1/* endpoints require client bearer auth; /healthz, /statsz, and /metrics do not.
- Pipeline failures degrade gracefully and forward the request instead of returning a 500.
- MCP-enabled agents can run local/remote tools without exposing MCP complexity to the client.
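The best-effort behaviour described in the flow notes can be sketched as a chain that skips a failing stage and carries the last good payload forward. The `Interceptor` interface and every name below are hypothetical; the source does not show Nenya's actual interfaces.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Interceptor is a hypothetical stage interface, not Nenya's real API.
type Interceptor interface {
	Name() string
	Apply(payload string) (string, error)
}

// funcInterceptor adapts a plain function to the Interceptor interface.
type funcInterceptor struct {
	name string
	fn   func(string) (string, error)
}

func (f funcInterceptor) Name() string                   { return f.name }
func (f funcInterceptor) Apply(p string) (string, error) { return f.fn(p) }

// runChain applies each interceptor in order. A failing stage is skipped
// and the payload from the last successful stage is forwarded, so a
// pipeline error never blocks the request.
func runChain(payload string, chain []Interceptor) (string, []string) {
	var skipped []string
	for _, ic := range chain {
		next, err := ic.Apply(payload)
		if err != nil {
			skipped = append(skipped, ic.Name())
			continue
		}
		payload = next
	}
	return payload, skipped
}

// demoChain builds a three-stage chain whose middle stage always fails.
func demoChain() []Interceptor {
	return []Interceptor{
		funcInterceptor{"upper", func(s string) (string, error) { return strings.ToUpper(s), nil }},
		funcInterceptor{"broken", func(s string) (string, error) { return "", errors.New("engine unavailable") }},
		funcInterceptor{"suffix", func(s string) (string, error) { return s + "!", nil }},
	}
}

func main() {
	out, skipped := runChain("hello", demoChain())
	fmt.Println(out, skipped)
}
```

Returning the names of skipped stages lets the caller log or count degraded requests without failing them.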

- Config-driven provider registry: add providers via JSON, zero code changes
- 23 built-in providers with specialized adapters for wire format differences
- Dynamic model discovery: fetches live model catalogs from providers at startup and on reload
- Model registry: reference models by string shorthand with automatic provider/context resolution
- Multi-provider model resolution: when a model exists in multiple providers, all are added to the agent's fallback chain
- Three-tier model resolution: config overrides > discovered models > static registry
- Per-model wire format: models from multi-format gateways (OpenCode Zen) auto-convert between OpenAI, Anthropic, and Gemini wire formats based on the model's format attribute
- Agent fallback chains: round-robin or sequential with circuit breaker and automatic failover
- Latency-aware routing: auto-reorder targets by historical median response time with ±5% jitter to prevent thundering herd
- Per-agent system prompts: inline or file-based

- Tier-0 regex secret filter: always-on redaction of AWS keys, GitHub tokens, passwords, etc.
- 3-Tier content pipeline: pluggable interceptor chain (regex redaction, entropy filtering, TF-IDF relevance scoring, engine summarization)
- Context window compaction: sliding window summarization with configurable engine
- Stale tool call pruning: compact old assistant+tool response pairs to save tokens
- Thought pruning: strip reasoning blocks from assistant message history
- Input validation: strict body limits, JSON sanitization, header filtering
- Graceful degradation: never blocks requests due to engine or pipeline failures
- Role-Based Access Control (RBAC): per-API-key roles (admin, user, read-only) with agent and endpoint restrictions
- Secure memory: mlock-protected token storage, read-only sealing, core dump prevention
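The first two pipeline tiers, regex redaction and entropy filtering, can be illustrated with a short sketch. The two patterns below match the publicly documented shapes of AWS access key IDs and classic GitHub personal access tokens; the pattern set and the `[REDACTED]` placeholder are assumptions, not Nenya's actual configuration.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// secretPatterns is an illustrative pattern set, not Nenya's real list.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),    // AWS access key ID
	regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`), // GitHub personal access token (classic)
}

// redact replaces every match of a known secret pattern with a placeholder.
func redact(s string) string {
	for _, re := range secretPatterns {
		s = re.ReplaceAllString(s, "[REDACTED]")
	}
	return s
}

// shannonEntropy returns the entropy of s in bits per byte. Random-looking
// strings such as keys and tokens score noticeably higher than ordinary
// identifiers or prose, which is what an entropy filter relies on.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	fmt.Println(redact("key=AKIAIOSFODNN7EXAMPLE end"))
	fmt.Printf("%.2f\n", shannonEntropy("aaaa"))
	fmt.Printf("%.2f\n", shannonEntropy("abcd"))
}
```

Regex redaction catches secrets with a known shape; the entropy score is a fallback for unknown formats, at the cost of a threshold that has to be tuned against false positives.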

- Secure memory (default): all tokens stored in mlock-protected RAM, sealed read-only after init, core dumps disabled
- Non-root execution: runs as UID 65532 with dropped capabilities
- Memory protection: LimitMEMLOCK=infinity and LimitCORE=0 in systemd
- Read-only filesystem: immutable root + private /tmp
- Seccomp + no-new-privileges: restricted syscalls, prevents privilege escalation
- Zero-trust secrets: loaded via systemd credentials or container mounts, never written to disk
- Socket activation: seamless restarts with zero dropped connections

- Zero external dependencies: Go standard library only
- Hot reload: systemctl reload nenya for zero-downtime config changes
- Circuit breaker: per agent+provider+model with automatic failover, exponential backoff, and semantic error classification
- Rate limiting: per upstream host (RPM/TPM) with per-provider overrides
- Response cache: in-memory LRU with SHA-256 fingerprinting and optional semantic similarity search
- Graceful shutdown: 5s grace period for in-flight requests, MCP client cleanup
- Context-limit auto-retry: upstream context-length errors trigger summarization and retry
- Local engine lifecycle: pre-load and manage local Ollama models with LRU eviction
- Structured errors: all error responses include an error_kind field for programmatic diagnostics

- Tool discovery: connect to MCP servers for automatic tool injection
- Multi-turn execution: intercept tool calls, execute against MCP servers, forward results
- Auto-search: pre-fetch relevant context from MCP servers before forwarding
- Auto-save: persist assistant responses to MCP memory servers

Create minimal config and secrets:

mkdir -p config secrets
cat > config/config.json << 'EOF'
{
  "server": { "listen_addr": ":8080" },
  "agents": {
    "default": {
      "strategy": "fallback",
      "models": ["gemini-2.5-flash"]
    }
  }
}
EOF

cat > secrets/provider_keys.json << 'EOF'
{
  "provider_keys": {
    "gemini": "AIza..."
  }
}
EOF

# Unquoted EOF here, so the shell expands $(openssl ...) into a real random token
cat > secrets/client.json << EOF
{
  "client_token": "nk-$(openssl rand -hex 32)"
}
EOF

Run the container:

podman run -d \
  --name nenya \
  -p 8080:8080 \
  -v ./config:/etc/nenya:ro \
  -v ./secrets:/run/secrets/nenya:ro \
  -e NENYA_SECRETS_DIR=/run/secrets/nenya \
  --cap-drop=ALL \
  --cap-add=IPC_LOCK \
  --security-opt=no-new-privileges:true \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64M \
  ghcr.io/gumieri/nenya:latest

Test it:

curl -H "Authorization: Bearer $(jq -r '.client_token' secrets/client.json)" \
  http://localhost:8080/healthz
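Beyond the health probe, a client would normally call the OpenAI-compatible chat endpoint. The sketch below only builds such a request from Go; the token value, the prompt, and the choice to pass `gemini-2.5-flash` as the model are assumptions based on the minimal config above, and the payload follows the general OpenAI Chat Completions shape rather than anything Nenya-specific.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest mirror the basic OpenAI Chat Completions payload.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Stream   bool          `json:"stream"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest builds a streaming chat request with bearer authentication.
func newChatRequest(baseURL, token, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Stream:   true,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "text/event-stream")
	return req, nil
}

func main() {
	// "nk-example" is a placeholder; use the client_token from secrets/client.json.
	req, err := newChatRequest("http://localhost:8080", "nk-example", "gemini-2.5-flash", "Say hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(req.Header.Get("Authorization"))
}
```

Passing the request to `http.DefaultClient.Do` and reading the response body line by line would then yield the SSE stream.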

Nenya provides native packages for major Linux distributions and community package managers:

| Distribution | Command |
|---|---|
| Debian/Ubuntu (.deb) | Download nenya_<version>_linux_amd64.deb from the release page and run sudo dpkg -i |
| Fedora/RHEL (.rpm) | Download nenya-<version>.x86_64.rpm from the release page and run sudo rpm -i |
| Arch Linux (.pkg.tar.zst) | Download nenya-<version>-x86_64.pkg.tar.zst from the release page and run sudo pacman -U |
| Arch Linux (AUR) | yay -S nenya-bin (or your preferred AUR helper) |
| Nix/NixOS | Add gumieri/nur-packages to your NUR registry and use nenya |

All packages install the binary to /usr/bin/nenya and include systemd service and socket units. After install, enable and start:

sudo systemctl enable --now nenya.socket
sudo systemctl enable --now nenya.service

Nenya supports standard environment variables for deployment portability:

| Variable | Default | Description |
|---|---|---|
| PORT | 8080 | Listening port (overrides server.listen_addr) |
| HOST | | Optional bind address (e.g. 127.0.0.1). Only used when combined with PORT |
| NENYA_CONFIG_DIR | /etc/nenya/ | Configuration directory path |
| NENYA_CONFIG_FILE | | Single config file path (takes precedence over NENYA_CONFIG_DIR) |
| NENYA_SECRETS_DIR | | Secrets directory (overrides CREDENTIALS_DIRECTORY) |

Example usage:

PORT=9090 HOST=127.0.0.1 ./nenya --config /path/to/config.json

Or in Docker:

docker run -e PORT=9090 -p 9090:9090 ghcr.io/gumieri/nenya:latest
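The precedence between PORT, HOST, and server.listen_addr can be sketched as a small resolver. This is a guess at the behaviour described in the table, not Nenya's code: the function name is hypothetical, and falling back to `:8080` when nothing is configured is an assumption drawn from the documented default.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// resolveListenAddr picks the listen address. PORT overrides the configured
// address; HOST is only honoured together with PORT, as the table states.
func resolveListenAddr(configured string, getenv func(string) string) string {
	port := getenv("PORT")
	if port == "" {
		if configured == "" {
			return ":8080" // assumed default when nothing is set
		}
		return configured // HOST alone is ignored
	}
	// An empty host yields ":<port>", which binds all interfaces.
	return net.JoinHostPort(getenv("HOST"), port)
}

func main() {
	fmt.Println(resolveListenAddr(":8080", os.Getenv))
}
```

Taking `getenv` as a parameter keeps the resolver testable without mutating the process environment.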

- Deploy Bare Metal (systemd): direct binary install, socket activation, hot reload
- Deploy Container (Podman/Docker Compose): compose.yml, image verification, security hardening
- Deploy Kubernetes (Helm): Helm chart, ConfigMap/Secret, ingress setup

All /v1/* endpoints require Authorization: Bearer <client_token> or Bearer <api_key_token>. API keys support RBAC enforcement: agent scoping, endpoint allowlists, and role-based permissions (admin bypasses all checks).

| Endpoint | Auth | Description |
|---|---|---|
| POST /v1/chat/completions | Bearer + RBAC | OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn |
| POST /v1/messages | Bearer + RBAC | Anthropic Messages API with bidirectional format conversion |
| GET /v1/models | Bearer + RBAC | Live model catalog from discovered providers + static registry (context window, max tokens) |
| POST /v1/embeddings | Bearer + RBAC | Passthrough proxy |
| POST /v1/responses | Bearer + RBAC | Passthrough proxy |
| POST /v1/images/generations | Bearer + RBAC | Image generation (OpenAI-compatible) |
| POST /v1/audio/transcriptions | Bearer + RBAC | Audio transcription (Whisper-compatible, multipart support) |
| POST /v1/audio/speech | Bearer + RBAC | Text-to-speech synthesis (OpenAI-compatible) |
| POST /v1/moderations | Bearer + RBAC | Content moderation (OpenAI-compatible) |
| POST /v1/rerank | Bearer + RBAC | Re-ranking API (Cohere/Jina/Voyage-compatible) |
| POST /v1/a2a | Bearer + RBAC | Agent-to-Agent protocol (Google A2A) |
| GET /v1/files | Bearer + RBAC | File listing, upload, retrieval, deletion |
| POST /v1/batches | Bearer + RBAC | Batch API operations |
| POST /proxy/{provider}/* | Bearer + RBAC | Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming) |
| GET /healthz | None | Engine health probe |
| GET /statsz | None | Token usage, circuit breaker state, MCP server status |
| GET /metrics | None | Prometheus-compatible metrics |
| GET /debug/pprof/* | Bearer | Go profiling endpoints (disabled by default, see debug.pprof_enabled) |

See docs/PASSTHROUGH_PROXY.md for detailed passthrough proxy usage.
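The RBAC rules summarized above (agent scoping, endpoint allowlists, admin bypass) can be sketched as a single authorization check. The `APIKey` fields, the treatment of empty lists as "no restriction", and the idea that a read-only role may only issue GET requests are all assumptions for illustration; the source does not define these semantics.

```go
package main

import "fmt"

// APIKey is a hypothetical key record; field names are illustrative.
type APIKey struct {
	Role      string   // "admin", "user", or "read-only"
	Agents    []string // allowed agents; empty means no restriction (assumption)
	Endpoints []string // allowed endpoint paths; empty means no restriction (assumption)
}

// contains reports whether list includes v.
func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// authorize decides whether a key may call an endpoint for a given agent.
func authorize(k APIKey, method, endpoint, agent string) bool {
	if k.Role == "admin" {
		return true // admin bypasses all checks
	}
	if k.Role == "read-only" && method != "GET" {
		return false // assumption: read-only keys may only issue GET requests
	}
	if len(k.Agents) > 0 && agent != "" && !contains(k.Agents, agent) {
		return false
	}
	if len(k.Endpoints) > 0 && !contains(k.Endpoints, endpoint) {
		return false
	}
	return true
}

func main() {
	user := APIKey{Role: "user", Agents: []string{"default"}, Endpoints: []string{"/v1/chat/completions"}}
	fmt.Println(authorize(user, "POST", "/v1/chat/completions", "default"))
	fmt.Println(authorize(user, "POST", "/v1/embeddings", "default"))
}
```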

Documentation:

- Configuration
- Deploy Bare Metal
- Deploy Container
- Deploy Kubernetes
- Passthrough Proxy
- Architecture
- MCP Integration
- Adapters
- Secrets Format
- Security

Apache 2.0. See LICENSE.
