Show HN: Nenya – A lightweight, highly secure AI API Gateway/Proxy written in Go

A new open-source AI API gateway called Nenya, written in Go, provides a lightweight, zero-dependency proxy that sits between AI coding clients and upstream large language model providers. The gateway adds security features including secret redaction, context management, agent routing, and MCP tool integration with transparent SSE streaming, while enforcing non-root execution, mlock for secrets, and seccomp with no-new-privileges. Nenya supports any provider implementing OpenAI or Anthropic Chat Completions APIs, ships with 23 built-in adapters, and offers config-driven provider registration without code changes.

A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration, all with transparent SSE streaming. Security-hardened: non-root execution, mlock for secrets, seccomp + no-new-privileges.

Compatible with any provider that implements the OpenAI or Anthropic Chat Completions API. We ship built-in adapters with specialized handling for 23 providers.

```
+----------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.)    |
| OpenAI-compatible request                    |
| POST /v1/chat/completions + Bearer token     |
| or                                           |
| Anthropic Messages API request               |
| POST /v1/messages + x-api-key                |
+----------------------------------------------+
                       |
                       v
+----------------------------------------------+
| Nenya Gateway                                |
| - auth check + RBAC enforcement              |
| - parse JSON + extract model                 |
| - resolve agent/provider                     |
| - optional cache (HIT = replay SSE)          |
| - optional MCP context/tool injection        |
+----------------------------------------------+
                       |
                       v
+----------------------------------------------+
| Interceptor Chain (pluggable, best-effort)   |
| - RedactInterceptor: regex patterns          |
| - EntropyInterceptor: high-entropy strings   |
| - TFIDFInterceptor: relevance scoring        |
| - BouncerInterceptor: engine summarization   |
+----------------------------------------------+
                       |
                       v
+----------------------------------------------+
| Token Budget Trimming (if payload > hard     |
| limit) drops oldest non-system messages and  |
| applies token-aware middle-out truncation    |
+----------------------------------------------+
                       |
                       v
+----------------------------------------------+
| Routing                                      |
| A) Standard forwarding                       |
|    - fallback chain + circuit breaker + RL   |
| B) MCP multi-turn tool loop (if enabled)     |
|    - buffer SSE, execute MCP tools, re-send  |
| C) Context-limit retry                       |
|    - on upstream 413/context exceeded,       |
|      summarize payload, retry with fallback  |
+----------------------------------------------+
                       |
                       v
+----------------------------------------------+
| Upstream LLM Providers                       |
| Anthropic | Gemini | DeepSeek | Mistral | ...|
+----------------------------------------------+
                       |
                       | SSE stream
                       v
+----------------------------------------------+
| Nenya SSE Pipeline                           |
| - adapter response transforms                |
| - optional OpenAI→Anthropic conversion       |
| - usage accounting + stream filter           |
| - flush + optional cache capture             |
| - optional MCP auto-save                     |
+----------------------------------------------+
                       |
                       v
+----------------------------------------------+
| Client receives transparent SSE output       |
+----------------------------------------------+
```

Flow notes:

- `/v1/` endpoints require client bearer auth; `/healthz`, `/statsz`, `/metrics` do not.
- Pipeline failures degrade gracefully and forward the request instead of returning a 500.
- MCP-enabled agents can run local/remote tools without exposing MCP complexity to the client.

- Config-driven provider registry: add providers via JSON, zero code changes
- 23 built-in providers with specialized adapters for wire format differences
- Dynamic model discovery: fetches live model catalogs from providers at startup and on reload
- Model registry: reference models by string shorthand with automatic provider/context resolution
- Multi-provider model resolution: when a model exists in multiple providers, all are added to the agent's fallback chain
- Three-tier model resolution: config overrides > discovered models > static registry
- Per-model wire format: models from multi-format gateways (OpenCode Zen) auto-convert between OpenAI, Anthropic, and Gemini wire formats based on the model's `format` attribute
- Agent fallback chains: round-robin or sequential with circuit breaker and automatic failover
- Latency-aware routing: auto-reorder targets by historical median response time with ±5% jitter to prevent thundering herd
- Per-agent system prompts: inline or file-based
- Tier-0 regex secret filter: always-on
redaction of AWS keys, GitHub tokens, passwords, etc.
- 3-Tier content pipeline: pluggable interceptor chain with regex redaction, entropy filtering, TF-IDF relevance scoring, and engine summarization
- Context window compaction: sliding window summarization with configurable engine
- Stale tool call pruning: compact old assistant+tool response pairs to save tokens
- Thought pruning: strip reasoning blocks from assistant message history
- Input validation: strict body limits, JSON sanitization, header filtering
- Graceful degradation: never blocks requests due to engine or pipeline failures
- Role-Based Access Control (RBAC): per-API-key roles (admin, user, read-only) with agent and endpoint restrictions
- Secure memory: mlock-protected token storage, read-only sealing, core dump prevention
- Secure memory (default): all tokens stored in mlock-protected RAM, sealed read-only after init, core dumps disabled
- Non-root execution: runs as UID 65532 with dropped capabilities
- Memory protection: `LimitMEMLOCK=infinity` and `LimitCORE=0` in systemd
- Read-only filesystem: immutable root + private `/tmp`
- Seccomp + no-new-privileges: restricted syscalls, prevents privilege escalation
- Zero-trust secrets: loaded via systemd credentials or container mounts, never written to disk
- Socket activation: seamless restarts with zero dropped connections
- Zero external dependencies: Go standard library only
- Hot reload: `systemctl reload nenya` for zero-downtime config changes
- Circuit breaker: per agent+provider+model with automatic failover, exponential backoff, and semantic error classification
- Rate limiting: per upstream host RPM/TPM with per-provider overrides
- Response cache: in-memory LRU with SHA-256 fingerprinting and optional semantic similarity search
- Graceful shutdown: 5s grace period for in-flight requests, MCP client cleanup
- Context-limit auto-retry: upstream context-length errors trigger summarization and retry
- Local engine lifecycle: pre-load and manage local Ollama models with LRU eviction
- Structured
errors: all error responses include an `error_kind` field for programmatic diagnostics
- Tool discovery: connect to MCP servers for automatic tool injection
- Multi-turn execution: intercept tool calls, execute against MCP servers, forward results
- Auto-search: pre-fetch relevant context from MCP servers before forwarding
- Auto-save: persist assistant responses to MCP memory servers

Create minimal config and secrets:

```bash
mkdir -p config secrets

cat > config/config.json << 'EOF'
{
  "server": { "listen_addr": ":8080" },
  "agents": {
    "default": { "strategy": "fallback", "models": ["gemini-2.5-flash"] }
  }
}
EOF

cat > secrets/provider_keys.json << 'EOF'
{ "provider_keys": { "gemini": "AIza..." } }
EOF

# Unquoted EOF here so the shell expands $(openssl ...) into a random token.
cat > secrets/client.json << EOF
{ "client_token": "nk-$(openssl rand -hex 32)" }
EOF
```

Run the container:

```bash
podman run -d \
  --name nenya \
  -p 8080:8080 \
  -v ./config:/etc/nenya:ro \
  -v ./secrets:/run/secrets/nenya:ro \
  -e NENYA_SECRETS_DIR=/run/secrets/nenya \
  --cap-drop=ALL \
  --cap-add=IPC_LOCK \
  --security-opt=no-new-privileges:true \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64M \
  ghcr.io/gumieri/nenya:latest
```

Test it:

```bash
curl -H "Authorization: Bearer $(jq -r '.client_token' secrets/client.json)" \
  http://localhost:8080/healthz
```

Nenya provides native packages for major Linux distributions and community package managers:

| Distribution | Command |
|---|---|
| Debian/Ubuntu (.deb) | Download `nenya_<version>_linux_amd64.deb` from the release page and run `sudo dpkg -i` |
| Fedora/RHEL (.rpm) | Download `nenya-<version>.x86_64.rpm` from the release page and run `sudo rpm -i` |
| Arch Linux (.pkg.tar.zst) | Download `nenya-<version>-x86_64.pkg.tar.zst` from the release page and run `sudo pacman -U` |
| Arch Linux (AUR) | `yay -S nenya-bin` (or your preferred AUR helper) |
| Nix/NixOS | Add `gumieri/nur-packages` to your NUR registry and use `nenya` |

All packages install the binary to `/usr/bin/nenya` and include systemd service and socket units.
After install, enable and start:

```bash
sudo systemctl enable --now nenya.socket
sudo systemctl enable --now nenya.service
```

Nenya supports standard environment variables for deployment portability:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | Listening port (overrides `server.listen_addr`) |
| `HOST` | none | Optional bind address (e.g. `127.0.0.1`). Only used when combined with `PORT` |
| `NENYA_CONFIG_DIR` | `/etc/nenya/` | Configuration directory path |
| `NENYA_CONFIG_FILE` | none | Single config file path (takes precedence over `NENYA_CONFIG_DIR`) |
| `NENYA_SECRETS_DIR` | none | Secrets directory (overrides `CREDENTIALS_DIRECTORY`) |

Example usage:

```bash
PORT=9090 HOST=127.0.0.1 ./nenya --config /path/to/config.json
```

Or in Docker:

```bash
docker run -e PORT=9090 -p 9090:9090 ghcr.io/gumieri/nenya:latest
```

- [Deploy Bare Metal (systemd)](/gumieri/nenya/blob/main/docs/DEPLOY_BAREMETAL.md): direct binary install, socket activation, hot reload
- [Deploy Container (Podman/Docker Compose)](/gumieri/nenya/blob/main/docs/DEPLOY_CONTAINER.md): compose.yml, image verification, security hardening
- [Deploy Kubernetes (Helm)](/gumieri/nenya/blob/main/docs/DEPLOY_KUBERNETES.md): Helm chart, ConfigMap/Secret, ingress setup

All `/v1/` endpoints require `Authorization: Bearer <client_token>` or `Bearer <api_key_token>`. API keys support RBAC enforcement: agent scoping, endpoint allowlists, and role-based permissions (`admin` bypasses all checks).
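To illustrate the bearer-token requirement, here is a minimal Go sketch of a client building a streaming request for `POST /v1/chat/completions`. It assumes the standard OpenAI-compatible body shape (`model`, `messages`, `stream`); the helper name `newChatRequest` and the struct types are hypothetical and not part of Nenya. The `Accept` header is also an assumption about what an SSE client would typically send.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// chatMessage and chatRequest mirror the OpenAI-compatible request body.
// They are illustrative types, not taken from the Nenya codebase.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// newChatRequest builds an authenticated streaming chat request against
// a gateway reachable at baseURL (for example "http://localhost:8080").
func newChatRequest(baseURL, token, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   true,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// The client token goes in the standard bearer header.
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "text/event-stream")
	return req, nil
}
```

The returned request can be passed to `http.DefaultClient.Do`; the token would be the `client_token` value from `secrets/client.json` or an API key.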
| Endpoint | Auth | Description |
|---|---|---|
| `POST /v1/chat/completions` | Bearer + RBAC | OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn |
| `POST /v1/messages` | Bearer + RBAC | Anthropic Messages API with bidirectional format conversion |
| `GET /v1/models` | Bearer + RBAC | Live model catalog from discovered providers + static registry (context window, max tokens) |
| `POST /v1/embeddings` | Bearer + RBAC | Passthrough proxy |
| `POST /v1/responses` | Bearer + RBAC | Passthrough proxy |
| `POST /v1/images/generations` | Bearer + RBAC | Image generation (OpenAI-compatible) |
| `POST /v1/audio/transcriptions` | Bearer + RBAC | Audio transcription (Whisper-compatible, multipart support) |
| `POST /v1/audio/speech` | Bearer + RBAC | Text-to-speech synthesis (OpenAI-compatible) |
| `POST /v1/moderations` | Bearer + RBAC | Content moderation (OpenAI-compatible) |
| `POST /v1/rerank` | Bearer + RBAC | Re-ranking API (Cohere/Jina/Voyage-compatible) |
| `POST /v1/a2a` | Bearer + RBAC | Agent-to-Agent protocol (Google A2A) |
| `GET /v1/files` | Bearer + RBAC | File listing, upload, retrieval, deletion |
| `POST /v1/batches` | Bearer + RBAC | Batch API operations |
| `POST /proxy/{provider}/` | Bearer + RBAC | Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming) |
| `GET /healthz` | None | Engine health probe |
| `GET /statsz` | None | Token usage, circuit breaker state, MCP server status |
| `GET /metrics` | None | Prometheus-compatible metrics |
| `GET /debug/pprof/` | Bearer | Go profiling endpoints (disabled by default, see `debug.pprof_enabled`) |

See [docs/PASSTHROUGH_PROXY.md](/gumieri/nenya/blob/main/docs/PASSTHROUGH_PROXY.md) for detailed passthrough proxy usage.
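Since the chat endpoint streams its response as SSE, a client needs to read `data:` lines as they arrive. The following Go sketch shows one way to do that. It assumes the upstream OpenAI streaming convention, where each event is a `data: <json>` line and the stream ends with `data: [DONE]`; whether every Nenya adapter emits exactly this framing is not confirmed by this README. The function names `readSSEData` and `parseSSEString` are hypothetical.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// readSSEData collects the payload of every "data:" line in an SSE stream
// and stops at the "[DONE]" sentinel (an assumed OpenAI-style terminator).
// Note: bufio.Scanner rejects lines longer than 64 KiB by default.
func readSSEData(r io.Reader) ([]string, error) {
	var events []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // skip blank separators, comments, and other SSE fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		events = append(events, payload)
	}
	return events, sc.Err()
}

// parseSSEString is a convenience wrapper for small in-memory inputs.
func parseSSEString(body string) ([]string, error) {
	return readSSEData(strings.NewReader(body))
}
```

In a real client, `readSSEData` would be given the `Body` of the HTTP response, and each payload would then be decoded as a JSON chunk.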
Documentation:

- [Configuration](/gumieri/nenya/blob/main/docs/CONFIGURATION.md)
- [Deploy Bare Metal](/gumieri/nenya/blob/main/docs/DEPLOY_BAREMETAL.md)
- [Deploy Container](/gumieri/nenya/blob/main/docs/DEPLOY_CONTAINER.md)
- [Deploy Kubernetes](/gumieri/nenya/blob/main/docs/DEPLOY_KUBERNETES.md)
- [Passthrough Proxy](/gumieri/nenya/blob/main/docs/PASSTHROUGH_PROXY.md)
- [Architecture](/gumieri/nenya/blob/main/docs/ARCHITECTURE.md)
- [MCP Integration](/gumieri/nenya/blob/main/docs/MCP_INTEGRATION.md)
- [Adapters](/gumieri/nenya/blob/main/docs/ADAPTERS.md)
- [Secrets Format](/gumieri/nenya/blob/main/docs/SECRETS_FORMAT.md)
- [Security](/gumieri/nenya/blob/main/docs/SECURITY.md)

Apache 2.0. See [LICENSE](/gumieri/nenya/blob/main/LICENSE).