LLM Proxy Projects with Granular API Key Access Control: A Comprehensive Survey

A new research report analyzing 38+ open-source and managed LLM proxy projects finds the market rapidly maturing from simple protocol translators into comprehensive governance platforms. LiteLLM leads with the most complete feature set including virtual keys per user and team, while newer entrants like Bifrost and TensorZero offer lower latency and enterprise-grade access controls in their open-source tiers. The landscape is bifurcating between self-hostable proxies and managed SaaS platforms, with implications for data sovereignty and compliance.

A research report examining 38+ open-source and managed LLM proxy/gateway projects that provide API key management and granular access control, from LiteLLM and Bifrost to TensorZero and TrueFoundry, comparing their architectures, features, and trade-offs.

Executive Summary

This report identifies and analyzes 38+ open-source and managed LLM proxy/gateway projects that provide API key management and granular access control capabilities. The market has matured rapidly from simple protocol-translating proxies into comprehensive governance platforms. The leading project, LiteLLM (~40K GitHub stars), offers the most complete feature set, with virtual keys scoped per user, team, and budget, but it faces performance limitations (the Python GIL) and suffered a supply-chain security breach in March 2026. Newer projects like Bifrost (5.6K stars, Go-based) claim 54× lower latency and deliver enterprise governance features in their open-source tier without paywalling RBAC or SSO.

This edition expands significantly on the prior survey and includes several notable new entrants:

- TensorZero (~11.4K stars, Rust-based, <1ms P99 latency) brings an ML-optimized gateway with tag-based rate limits
- TrueFoundry offers enterprise-grade virtual accounts, RBAC, policy-as-code (Cedar/OPA), and SOC 2/HIPAA/ITAR compliance with on-prem deployment
- RelayPlane (npm-native, Node.js) provides a local-first cost-intelligence proxy with MCP support and zero network-hop overhead
- OpenZiti LLM Gateway (Go, Apache 2.0) delivers virtual API keys with model-level glob restrictions and zero-trust networking via the zrok overlay

Additionally, managed-only platforms like Kilo Gateway (500+ models, BYOK), nexos.ai (founded by Nord Security creators, $35M funding), and Braintrust Gateway (organization-scoped and project-scoped API keys with AES-GCM caching) are tracked separately from self-hostable projects.
The landscape spans five architectural categories: Python libraries (LiteLLM, LM-Proxy), Go binaries (Bifrost, Instawork llm-proxy, VoidLLM, OpenZiti), TypeScript services (Portkey, Helicone, LLM Gateway, OmniRoute), Rust gateways (TensorZero, Helicone’s Rust rewrite), and enterprise-grade platforms built on Envoy (WSO2, Kong). A significant trend is the emergence of MCP (Model Context Protocol) gateway capabilities as a new governance frontier: Bifrost, Portkey, VoidLLM, WSO2, RelayPlane, and TrueFoundry all now offer MCP tool management with access control.

Critically, the market is bifurcating between self-hostable open-source proxies (LiteLLM, Bifrost, TensorZero, VoidLLM, etc.) and managed SaaS-only platforms (OpenRouter, Kilo Gateway, nexos.ai, Cloudflare AI Gateway). This distinction has direct implications for data sovereignty, compliance, and operational burden, a theme explored throughout this report.

Key Findings on API Key Granularity

The deepest analysis in this report examines how granular access control actually is across projects:

- Per-model scoping: LiteLLM, Bifrost, TensorZero, OpenZiti, LLM Security Gateway, and TrueFoundry all support restricting keys to specific models or model patterns.
- IP allowlisting: Only Portkey (geography/IP inbound rules) and lazy-llm-proxy (per-key IP allowlists) provide this natively. Most projects rely on network-level controls (VPC, firewall).
- Webhook-based validation: LM-Proxy (Nayjest) supports an external HTTP service or a Python function for custom key validation.
- Multi-tenancy isolation: Bifrost (hierarchical org CRUD), TrueFoundry (Virtual Accounts, per-provider RBAC), and LiteLLM (team_id with PostgreSQL-backed logical separation) lead in multi-tenant depth. Physical/infra-level isolation requires self-hosting on-prem or in a VPC (TrueFoundry, WSO2).

1. Background and Context

1.1 What is an LLM Proxy?

An LLM proxy (also called an AI gateway, LLM router, or LLM middleware) sits between client applications and LLM inference providers.
Its core functions are:

- Protocol normalization: Translating between different provider API formats (OpenAI’s /v1/chat/completions, Anthropic’s /v1/messages, Google’s Vertex AI, AWS Bedrock’s SigV4-signed requests) into a single unified interface
- Access control: Managing which clients/teams/users can access which models and with what limits
- Cost governance: Tracking token usage, enforcing budgets, and providing cost attribution
- Reliability: Automatic failover, retry, load balancing, and circuit breaking
- Observability: Logging, metrics, tracing

1.2 Why API Key Granularity Matters

In production environments, a single shared provider API key (e.g., one OpenAI sk-... key used across all services) creates several problems:

- No cost attribution: You cannot determine which team or service drove spending
- No isolation: A runaway script in one service burns the entire budget
- No audit trail: You cannot attribute a specific request to a user or application
- Security risk: If one key is compromised, all services are exposed
- No rate management: You cannot enforce per-service or per-user limits

Virtual API keys solve this by providing a layer of indirection: the proxy accepts client-specific keys and maps them to upstream provider credentials internally.

1.3 Market Context: The LiteLLM Breach

In March 2026, two significant events shook confidence in the LLM proxy ecosystem:

- A supply-chain attack via compromised PyPI versions (1.82.7, 1.82.8) of LiteLLM that deployed a credential-stealing payload through a poisoned trivy-action GitHub Action [1]
- The revelation that LiteLLM’s SOC 2 certification was based on fabricated compliance reports from Delve, a YC-backed startup later exposed for producing 500+ structurally identical audit reports [2]

These events accelerated evaluation of alternatives with better security postures (compiled languages, auditable builds) and verified compliance credentials.

2. Detailed Project Analysis

2.1 LiteLLM (BerriAI): The Incumbent

| Attribute | Value |
|---|---|
| GitHub | github.com/BerriAI/litellm |
| Stars | ~40,000 |
| Language | Python |
| License | MIT |
| Self-hosted | Yes (Docker, PyPI) |
| Providers | 100+ |

API Key / Access Control Features:

- Virtual keys: Create verification tokens that act as client-facing API keys. Each key can be scoped to a specific user or team
- Budget management: Per-key personal budgets, per-team shared budgets, and per-team-member individual limits within a team’s shared budget [3]
- Rate limiting: Configurable RPM (requests per minute) and TPM (tokens per minute) per key
- Admin UI: Web dashboard for managing models, keys, teams, and budgets
- Team management: Keys can be assigned to teams with team_id, enabling hierarchical budget structures
- Cost tracking: Automatic mapping of model-specific token pricing; cost data exposed at key, user, and team level [4]
- Guardrails: Input/output content filtering (enterprise tier)
- Load balancing: Distribute across multiple deployments of the same model

Limitations:

- Python GIL limits concurrency under high traffic
- PostgreSQL-backed logging degrades after ~1M records
- Enterprise features (SSO, RBAC, team budgets) gated behind paid license
- Supply-chain vulnerability in March 2026

2.2 Bifrost (Maxim AI): The High-Performance Challenger

| Attribute | Value |
|---|---|
| GitHub | github.com/maximhq/bifrost |
| Stars | ~5,600 |
| Language | Go (75%), TypeScript (17%) |
| License | Apache 2.0 |
| Self-hosted | Yes (Docker, npx, Helm) |
| Providers | 23+ |

API Key / Access Control Features:

- Virtual Keys: Create separate keys for different applications with independent budgets, rate limits, and access controls [5]
- Hierarchical Budgets: Four-tier hierarchy of Customer → Team → Virtual Key → Provider Config.
  Per-key rate limits, model restrictions, and spend caps enforced at the proxy layer
- SSO Integration: Google and GitHub SSO available in open-source version (not paywalled)
- RBAC: Role-based access control for admin, team, and user roles
- Vault Support: HashiCorp Vault integration for secure API key management
- OIDC User Provisioning: OAuth 2.0 / OIDC login with background directory sync for teams, roles, and business units [6]
- Per-provider budgets: Budgets scoped per virtual key (top level) and per provider, wired from model configs
- Business unit CRUD: Full create/read/update/delete for organizational units

Performance:

- 11µs overhead at 5,000 RPS vs. LiteLLM’s ~8ms P95 at 1,000 RPS
- 54× faster at P99 latency

2.3 Portkey AI Gateway

| Attribute | Value |
|---|---|
| GitHub | github.com/Portkey-AI/gateway |
| Stars | ~12,000 |
| Language | TypeScript (96%) |
| License | MIT (open-source gateway) |
| Self-hosted | Yes (Docker, Node.js, Cloudflare Workers) |
| Providers | 1,600+ |

API Key / Access Control Features:

- Virtual Keys: On-the-fly virtual key generation for secure key management
- Role-Based Access Control (RBAC): Granular access control for users, workspaces, and API keys [7]
- Secure Key Vault: Store LLM provider keys in Portkey’s vault; manage access with virtual keys
- Secret References: Reference secrets stored in AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault; the gateway fetches credentials at runtime without storing them [8]
- MCP Gateway: Centralized control plane for MCP servers with authentication, access control (per-team and per-user), identity forwarding (email, team, roles), and observability
- Access Control & Inbound Rules: Control which IPs and geographies can connect to deployments
- PII Redaction: Automatically remove sensitive data from requests

Enterprise Features:

- SOC2, HIPAA, GDPR, CCPA compliance
- Professional support with feature prioritization

2.4 Helicone

| Attribute | Value |
|---|---|
| GitHub | github.com/Helicone/helicone |
| Stars | ~5,800 |
| Language | TypeScript (91%) |
| License | Apache 2.0 |
| Self-hosted | Yes (Docker, Helm) |
| Providers | 100+ |

API Key / Access Control Features:

- AI Gateway: Single endpoint (https://ai-gateway.helicone.ai) accepting a Helicone API key, routing to 100+ models
- Unified API Key: One key provides access across all providers
- Self-hosted deployment: Docker Compose and Helm charts available
- Observability-first: Request logging, cost tracking, latency monitoring; the proxy is primarily an observability layer with light gateway features

Limitations:

- Lighter on routing and governance compared to full-featured gateways
- Self-hosting noted as “not recommended” for manual deployment
- Enterprise features (compliance, governance) remain thinner than enterprise-focused alternatives

2.5 VoidLLM

| Attribute | Value |
|---|---|
| GitHub | github.com/voidmind-io/voidllm |
| Stars | ~104 |
| Language | Go (80%) |
| License | BSL 1.1 (source-available) |
| Self-hosted | Yes (Docker, Helm, binary) |
| Providers | OpenAI, Anthropic, Azure, Ollama, vLLM, custom |

API Key / Access Control Features:

- Virtual Keys: Organization-wide access control with org/team/user scoping and RBAC [9]
- RBAC Hierarchy: Org → Team → User → Key hierarchy with 4 roles
- Rate Limits: Per-key, per-team, per-org request limits (RPM/RPD), most-restrictive-wins across levels
- Token Budgets: Daily/monthly token budgets with real-time enforcement
- Usage Tracking: Tokens, cost, duration, TTFT (time-to-first-token) per request
- Model Aliases: Clients call default, proxy routes anywhere; decouples client code from provider
- MCP Gateway: Proxy external MCP servers with access control and session management; Code Mode (WASM-sandboxed JS) for multi-tool orchestration
- Zero-Knowledge Architecture: By design, never stores or logs prompt/response content.
  Only metadata (who, what model, how many tokens) is tracked

Pricing Model:

- Free tier: Core features
- Pro ($49/mo): Cost reports, usage export, extended retention
- Enterprise ($149/mo): SSO/OIDC, per-org SSO, auto-provisioning, audit logs, OpenTelemetry

2.6 OmniRoute

| Attribute | Value |
|---|---|
| GitHub | github.com/diegosouzapw/OmniRoute |
| Stars | ~5,700 |
| Language | TypeScript (100%) |
| License | MIT |
| Self-hosted | Yes (Docker, npm) |
| Providers | 160+ |

API Key / Access Control Features:

- Dedicated API Key Manager: /dashboard/api-manager page for managing API keys with create, delete, and permissions management [10]
- Encryption at Rest: Credentials encrypted with AES-256-GCM
- Authentication Methods: OAuth, API Key, or Web Cookie
- Free Providers: 50+ providers with free tiers aggregated
- Multi-modal APIs: Text, image, audio support

Key Differentiator:

- Completely free and open-source with no cloud dependency: “No OmniRoute cloud sits in the request path”

2.7 LLM-API-Key-Proxy (Mirrowel)

| Attribute | Value |
|---|---|
| GitHub | github.com/Mirrowel/LLM-API-Key-Proxy |
| Stars | ~507 |
| Language | Python (100%) |
| License | MIT (proxy) + LGPL-3.0 (resilience library) |
| Self-hosted | Yes (Docker, binary, source) |
| Providers | Via LiteLLM fallback |

API Key / Access Control Features:

- Single PROXY_API_KEY: One API key for all clients; configured via environment variable or TUI
- Multi-provider Key Rotation: Automatic rotation across multiple provider keys with intelligent cooldowns
- Usage Tracking: Per-provider usage statistics persisted to disk
- Quota Viewer: Alpha feature for viewing quota windows and fair-cycle status
- Credential Management: Interactive TUI for managing API keys and OAuth credentials

Architecture:

- Two components: FastAPI proxy application + standalone Python resilience library (rotator_library)
- The resilience library is independently usable for intelligent key selection, deadline-driven requests, and automatic failover

2.8 Instawork llm-proxy

| Attribute | Value |
|---|---|
| GitHub | github.com/Instawork/llm-proxy |
| Stars | ~31 |
| Language | Go (96%) |
| License | MIT |
| Self-hosted | Yes (Docker, binary) |
| Providers | OpenAI, Anthropic, Gemini, AWS Bedrock |

API Key / Access Control Features:

- Per-user/API Key Rate Limiting: Experimental feature for request/token-based limits per user/API key/model/provider
- Token Estimation: Provisional token estimation with post-response reconciliation using X-LLM-Input-Tokens
- Circuit Breaker: Per-key circuit breaker that classifies upstream failures, retries transient errors, and emits a degraded-signal response
- Per-provider Rollup: Detects wholesale outages across multiple keys for the same provider
- Bypass Safety Valve: Callers without fallback can opt out of fast-fail via header

Key Differentiator:

- Minimalist design: “without all the extra stuff you don’t need”
- AWS Bedrock transparent SigV4 passthrough (clients sign with their own credentials)
- Comprehensive circuit breaker with per-model keying and provider rollup

2.9 LLM Gateway (theopenco)

| Attribute | Value |
|---|---|
| GitHub | github.com/theopenco/llmgateway |
| Stars | ~1,300 |
| Language | TypeScript (95%) |
| License | AGPLv3 (open-source) + Enterprise |
| Self-hosted | Yes (Docker, unified container) |
| Providers | 210+ |

API Key / Access Control Features:

- API Key Management: Unified API interface with authentication
- Usage Analytics: Track requests, tokens used, response times, and costs
- Team and Organization Management: Enterprise features (paid)
- Custom Provider Key Configurations: Enterprise tier

Architecture:

- Monorepo with separate apps: UI (Next.js), API (Hono), Gateway (routing), Playground, Admin
- PostgreSQL + Redis for data persistence

2.10 llm-budget-proxy (InkByteStudio)

| Attribute | Value |
|---|---|
| GitHub | github.com/InkByteStudio/llm-budget-proxy |
| Stars | ~0 |
| Language | TypeScript (89%) |
| License | MIT |
| Self-hosted | Yes (Docker) |
| Providers | OpenAI only (MVP) |
API Key / Access Control Features:

- Per-Key Token Budgets: Daily and monthly USD budgets per API key
- Rate Limiting: RPM and TPM limits with overrides by key pattern
- Model Downgrade: Automatic downgrade to cheaper models when approaching budget thresholds (opt-in)
- Cost Dashboards: Single-page Chart.js dashboard showing cost by key, over time, and budget status
- Alert Webhooks: Slack/Discord webhook notifications at 80% (warn), 95% (downgrade), 100% (block)
- Response Headers: Every response includes X-Request-Cost, X-Estimated-Cost, X-Budget-Remaining, X-Budget-Warning

Key Differentiator:

- Deliberately simpler than LiteLLM: single SQLite database, single Docker container, ~5-minute setup vs. ~30 minutes for LiteLLM

2.11 LM-Proxy (Nayjest)

| Attribute | Value |
|---|---|
| GitHub | github.com/Nayjest/lm-proxy |
| Stars | ~134 |
| Language | Python (99%) |
| License | MIT |
| Self-hosted | Yes (pip, source) |
| Providers | OpenAI, Anthropic, Google AI, local PyTorch |

API Key / Access Control Features:

- Virtual API Key Management: Proxy-level keys separate from upstream provider keys
- User Groups: Configurable groups with api_keys lists and allowed_connections restrictions
- OIDC Integration: Validate tokens from OpenID Connect providers (Keycloak, Auth0, Okta) as virtual API keys
- Custom API Key Validation: Extensible validator functions for custom authentication logic
- Rate Limiter Handler: Sliding window rate limiting scoped per api_key, ip, connection, group, or global
- Extensible Middleware: Before/request handlers for auditing, header forwarding, and custom logic

Configuration:

- TOML/YAML/JSON/Python config files
- api_key_check can reference a Python function or external HTTP service

2.12 LLM Security Gateway (TerminalsandCoffee)

| Attribute | Value |
|---|---|
| GitHub | github.com/TerminalsandCoffee/llm-security-gateway |
| Stars | ~1 |
| Language | Python (93%) |
| License | Not specified |
| Self-hosted | Yes (Docker, AWS Lambda) |
| Providers | OpenAI, AWS Bedrock |

API Key / Access Control Features:

- Per-Client Authentication: X-API-Key header with constant-time comparison (hmac.compare_digest)
- Per-Client Configuration: JSON file or DynamoDB backend for per-client settings
- Rate Limiting: Sliding window counter, per-client RPM, returns X-RateLimit-* headers
- Model Allowlist: Per-client model restrictions (empty = all allowed)
- AWS Lambda Deployment: Terraform-managed infrastructure with CI/CD

Security Pipeline:

1. Authentication → 2. Rate Limiting → 3. Model Allowlist → 4. Injection Detection (20 patterns, 4 categories) → 5. PII Detection (SSN, CC, email, phone, IPv4) → 6. Forward → 7. Response Scan

2.13 Paperclip (paperclipai)

| Attribute | Value |
|---|---|
| GitHub | github.com/paperclipai/paperclip |
| Stars | ~69,700 |
| Language | TypeScript (98%) |
| License | MIT |
| Self-hosted | Yes (npx, Docker) |
| Scope | AI agent orchestration (not LLM proxy) |

API Key / Access Control Features:

- Agent API Keys: Short-lived run JWTs for agent execution
- Per-Agent Monthly Budgets: Token and cost tracking by company, agent, project, goal, issue, provider, and model
- Budget Hard Stops: Overspend pauses agents and cancels queued work
- Org Chart Governance: Board approval workflows, execution policies with review/approval stages
- Multi-Company Isolation: Complete data isolation between organizations

Note: Paperclip is not an LLM proxy; it’s an orchestration/control plane for AI agents. It manages who works on what and how spend is capped, but delegates the actual LLM routing to underlying tools (Claude Code, Codex, HTTP adapters).
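Looking back at section 2.12, the first three stages of the LLM Security Gateway pipeline (authentication, rate limiting, model allowlist) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the `CLIENTS` table, the function names, and the status codes are hypothetical, and the rate limiter keeps a log of timestamps rather than the counter variant the project describes.

```python
import hmac
import time
from collections import defaultdict, deque

# Hypothetical per-client settings; the real project loads these from JSON or DynamoDB.
CLIENTS = {
    "team-a": {"api_key": "key-a", "rpm": 2, "models": ["gpt-4o"]},
    "team-b": {"api_key": "key-b", "rpm": 60, "models": []},  # empty list = all models allowed
}

_request_times = defaultdict(deque)  # client id -> timestamps of accepted requests


def authenticate(presented_key):
    """Return the matching client id, comparing keys in constant time."""
    matched = None
    for client_id, cfg in CLIENTS.items():
        # hmac.compare_digest avoids leaking key contents through timing differences.
        if hmac.compare_digest(presented_key.encode(), cfg["api_key"].encode()):
            matched = client_id
    return matched


def within_rate_limit(client_id, now=None, window=60.0):
    """Sliding-window check: at most `rpm` accepted requests in the last `window` seconds."""
    now = time.monotonic() if now is None else now
    times = _request_times[client_id]
    while times and now - times[0] >= window:
        times.popleft()  # drop timestamps that have left the window
    if len(times) >= CLIENTS[client_id]["rpm"]:
        return False
    times.append(now)
    return True


def model_allowed(client_id, model):
    allowed = CLIENTS[client_id]["models"]
    return not allowed or model in allowed


def check_request(presented_key, model, now=None):
    """Run the three stages in order and return (status_code, detail)."""
    client_id = authenticate(presented_key)
    if client_id is None:
        return 401, "invalid API key"
    if not within_rate_limit(client_id, now):
        return 429, "rate limit exceeded"
    if not model_allowed(client_id, model):
        return 403, "model not allowed"
    return 200, client_id
```

Running the cheap checks first means a request with a bad key or an exhausted quota is rejected before any injection or PII scanning is attempted.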
2.14 WSO2 AI Gateway

| Attribute | Value |
|---|---|
| GitHub | github.com/wso2/wso2-envoy-ai-gateway |
| Language | Go (89%) |
| License | Apache 2.0 |
| Self-hosted | Yes (Docker, Kubernetes) |
| Providers | OpenAI, Anthropic, Google Vertex, Azure AI, AWS Bedrock, Mistral |

API Key / Access Control Features:

- Token-Based Rate Limiting: Calibrated to how LLMs actually charge (per-token, not per-request)
- MCP Governance: Convert REST APIs into MCP-compatible servers, proxy external MCP servers with centralized policy enforcement
- PII Masking: Scrub sensitive data before prompts leave the network
- SOC 2 Type 2 + ISO 27001: Verified compliance credentials

Key Differentiator:

- Built on Envoy Proxy, with established Kubernetes-native deployment patterns
- Unbundled adoption: can deploy just the AI Gateway without the full platform

2.15 Kong AI Gateway

| Attribute | Value |
|---|---|
| Platform | Kong API Management Platform |
| Language | Go (core) + Lua plugins |
| License | Partial (open-source core) |
| Self-hosted | Yes |
| Providers | Via plugin architecture |

API Key / Access Control Features:

- Token-Based Rate Limiting: Enterprise tier
- RBAC & Audit Logs: Enterprise API governance
- AI MCP Proxy Plugin: Dedicated MCP traffic governance
- OAuth2 Plugins: For MCP authentication

Key Differentiator:

- Existing enterprise API management penetration; a natural adoption path for teams already running Kong

2.16 TensorZero: The ML-Optimized Rust Gateway

| Attribute | Value |
|---|---|
| GitHub | github.com/tensorzero/tensorzero |
| Stars | ~11,400 |
| Language | Rust |
| License | Apache 2.0 |
| Self-hosted | Yes (Docker, K8s/Helm examples) |
| Providers | 15+ direct, any OpenAI-compatible via extension |

API Key / Access Control Features:

- Custom API Keys: Create and manage custom API keys for different clients or services (tensorzero.com/docs/operations/set-up-auth-for-tensorzero)
- Tag-based rate limits: Granular scopes; rate limits apply per tag (e.g., per-project, per-team, per-environment)
- Usage/cost tracking: Per-tag attribution for cost and usage analytics
- Structured inference with schema validation: Enforces input/output schemas, data used for downstream optimization
- GitOps orchestration: Prompts, models, parameters, tools, experiments managed via version-controlled config

Performance:

- <1ms P99 latency at 10,000+ QPS (Rust)
- LiteLLM @ 100 QPS adds 25-100x more latency than TensorZero @ 10,000 QPS (tensorzero.com/docs/gateway)

Differentiator:

- Combines gateway + observability + evaluation + optimization + A/B testing in one platform
- “Autopilot” feature: automated AI engineer that analyzes observability data, sets up evals, optimizes prompts/models, runs A/B tests
- Team: includes Rust compiler maintainer, J.P. Morgan AI Research VP, Columbia postdoc

2.17 TrueFoundry AI Gateway: Enterprise Control Plane

| Attribute | Value |
|---|---|
| URL | truefoundry.com/ai-gateway |
| Stars | N/A (enterprise SaaS + self-hosted) |
| Language | Go (internal) |
| License | Proprietary (self-hosted available) |
| Self-hosted | Yes (SaaS, VPC, on-prem, air-gapped) |
| Providers | 250+ models |

API Key / Access Control Features:

- Virtual Accounts (VAT): Non-human production identity; gateway-managed keys that map to real provider credentials centrally (truefoundry.com/blog/ai-governance-audit-enterprise-llm-gateway)
- Personal Access Tokens (PATs): For development workflows
- RBAC: Scoped per provider account; policy-as-code with Cedar and OPA engines at the MCP-tool boundary
- Rate-limit & budget rules: Expressed as YAML with per-user/per-team/per-model/per-metadata scopes; first-match-wins evaluation
- Sliding-window enforcement: Twelve 5-second buckets summed across a 60-second window; bursty but strict (truefoundry.com/docs/ai-gateway/ratelimiting)
- Audit-grade traces: Every request, rate-limit decision, guardrail outcome, and fallback hop lands on the same trace ID (x-tfy-trace-id), exportable via OpenTelemetry to SIEM
- Data residency routing: Region-aware routing
  keeps regulated data within jurisdiction; provider restrictions block data classes from certain providers

Compliance:

- SOC 2 Type 2, HIPAA, ITAR certified
- Recognized in Gartner Market Guide for AI Gateways 2026 (truefoundry.com/ai-gateway)

Differentiator:

- Most complete governance feature set: virtual keys + RBAC + policy-as-code + compliance-grade audit logs + residency routing
- Deployment flexibility: SaaS, VPC, on-prem, air-gapped
- GPU orchestration and fractional GPU support built in

2.18 RelayPlane: The npm-Native Cost-Intelligence Proxy

| Attribute | Value |
|---|---|
| GitHub | github.com/RelayPlane/proxy |
| Stars | ~200 |
| Language | Node.js / TypeScript |
| License | MIT |
| Self-hosted | Yes (npm install -g) |
| Providers | 11+ providers + Ollama |

API Key / Access Control Features:

- Not a virtual-key system: Uses your own provider keys directly (Anthropic, OpenAI, Google, xAI, Moonshot)
- Cost intelligence proxy: Classifies tasks using heuristics (token count, prompt patterns, keyword matching) and routes to the cheapest capable model
- Budget enforcement: In free tier; configurable cascade fallback when models hit limits
- Dashboard: Tracks every request, shows where money goes

Key Differentiator:

- npm-native: npm install -g @relayplane/proxy takes 30 seconds, with no Docker, no Python env, no Go toolchain
- Local-first architecture: runs in-process with your app; zero network-hop overhead
- MCP server support shipped in v1.0.0
- Designed for Claude Code / Cursor / coding agent workflows

2.19 OpenZiti LLM Gateway: Zero-Trust Proxy

| Attribute | Value |
|---|---|
| GitHub | github.com/openziti/llm-gateway |
| Stars | ~65 |
| Language | Go (100%) |
| License | Apache 2.0 |
| Self-hosted | Yes (single binary, no DB) |
| Providers | OpenAI, Anthropic, Ollama, vLLM, llama-server, SGLang |

API Key / Access Control Features:

- Virtual API Keys: Generated via llm-gateway genkey; stored in config YAML
- Model-level restrictions: Keys can be restricted to specific models using
glob patterns allowed models: " " or "gpt-4o" github.com/openziti/llm-gateway Client authentication : Authorization: Bearer <key header; /health and /metrics endpoints remain unauthenticated Unique Features: Zero-trust networking via zrok/OpenZiti overlay : Expose gateway or reach backends across NAT, air-gapped networks, or cloud boundaries without firewall rules Semantic routing : Three-layer cascade — keyword heuristics → embedding similarity → LLM classifier — to automatically select best model when model field is omitted Multi-endpoint load balancing : Weighted round-robin with health checks and passive failover across inference backends Architecture: - Single binary, zero infrastructure — one YAML config, no database, no message queue, no sidecar - Prometheus metrics endpoint for request counts, latency histograms, token counters 2.20 lazy-llm-proxy Xu-pixel — Per-Key Security Proxy | Attribute | Value | |---|---| | GitHub | github.com/Xu-pixel/lazy-llm-proxy | | Stars | ~100 | | Language | TypeScript | | License | MIT | | Self-hosted | Yes npm | API Key / Access Control Features: Per-key token budgets : Daily/monthly limits per API key IP allowlists : Per-key IP whitelisting — rare among LLM proxies Model allowlists : Per-key model restrictions System prompts : Optional forced or non-forced system prompts per key Usage breakdowns : Per-key and per-provider usage tracking Differentiator: - One of the few projects to offer per-key IP allowlisting natively - Cursor skill integration for agent-side deployment 2.21 Other Notable Projects Self-Hostable | Project | URL | Stars | Key Access Control Features | |---|---|---|---| OpenZiti LLM Gateway | github.com/openziti/llm-gateway | ~65 | Virtual API keys with model glob restrictions, zrok zero-trust overlay | TensorZero | github.com/tensorzero/tensorzero | ~11.4K | Custom API keys, tag-based granular rate limits | TrueFoundry AI Gateway | truefoundry.com/ai-gateway | N/A | Virtual Accounts, PATs, RBAC, 
policy-as-code Cedar/OPA , compliance-grade audit logs | RelayPlane | github.com/RelayPlane/proxy | ~200 | Provider-key-based no virtual keys ; cost-intelligence proxy; MCP support | lazy-llm-proxy | github.com/Xu-pixel/lazy-llm-proxy | ~100 | Per-key budgets, IP allowlists, model allowlists | InferXgate | inferxgate.com | New | Rust-based gateway with caching, analytics, cost optimization | Barbacane | Dev.to article | — | Rust-native; composable plugins for AI proxy | OpenClaw Gateway | openclaw.ai | — | Authentication, rate limiting, routing — open source | API7 API7.ai | api7.ai | API Gateway | LLM request proxying, ai-proxy plugin with key validation and rate limiting | llm-proxy-server | pypi.org/project/llm-proxy-server/ | — | Virtual API keys, user groups, OIDC integration | 2.22 Other Notable Projects Managed SaaS Only | Project | URL | Type | Key Access Control Features | |---|---|---|---| OpenRouter | openrouter.ai | Marketplace | Per-user API keys with credit limits, workspace guardrails 2026 , budget enforcement, zero data retention | Kilo Gateway | kilo.ai/gateway | SaaS | One API key for 500+ models, BYOK support, organization-level access controls | nexos.ai | nexos.ai | SaaS | Unified API to 200+ LLMs, governance/observability; founded by Nord Security creators $35M funding | Cloudflare AI Gateway | cloudflare.com | Edge CDN | Basic analytics, caching, rate limiting at edge network; no self-host option | n1n.ai | n1n.ai | SaaS | Single unified key for 500+ models | AIMLAPI | aimlapi.com | SaaS | One API for 500+ models | FreeLLMAPI | github.com/tashfeenahmed/freellmapi | SaaS | Unified API key, auto-selects best free provider | Datawiza Access Proxy | datawiza.com | SaaS | LLM API key management, identity-aware rate limiting, virtual tokens, SSO/mTLS | Requesty | requesty.ai | SaaS | API keys, routing policies, MCP gateway | Braintrust Gateway | braintrust.dev | SaaS beta | Organization-scoped sk- and project-scoped API keys; service tokens bt-st- 
; AES-GCM per-user caching |
| LiteRouter | literouter.com | SaaS | Unlimited API creations, concurrent requests |

2.23 Projects Not Primarily LLM Proxies

| Project | URL | Scope | Notes |
|---|---|---|---|
| Vellum | vellum.ai | LLMOps/prompt management | “Deployments” are thin API proxies for versioned prompts; not an LLM proxy in the LiteLLM sense |
| Paperclip | github.com/paperclipai/paperclip | AI agent orchestration | Agent API keys, short-lived JWTs, per-agent budgets, org chart governance; delegates LLM routing to underlying tools |
| Unify AI | unify.ai | Unified inference layer | Single API key management and access control; dynamic intent-aware model routing |

3. Comparison Matrix

3.1 Core Feature Comparison

| Project | Virtual Keys | Per-Key Budgets | Rate Limiting | RBAC | SSO/OIDC | MCP Support | Self-Hosted | Open Source |
|---|---|---|---|---|---|---|---|---|
| LiteLLM | ✅ | ✅ | ✅ | Partial | Paid only | ❌ | ✅ | MIT |
| Bifrost | ✅ | ✅ | ✅ | ✅ | ✅ Google/GitHub | ✅ | ✅ | Apache 2.0 |
| Portkey | ✅ | ✅ | ✅ | ✅ | Enterprise | ✅ | ✅ | MIT |
| Helicone | ✅ | Limited | Limited | ❌ | Enterprise | ❌ | ✅ | Apache 2.0 |
| VoidLLM | ✅ | ✅ | ✅ | ✅ | Enterprise OIDC | ✅ | ✅ | BSL 1.1 |
| OmniRoute | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | MIT |
| LLM-API-Key-Proxy | Single key | ❌ | Partial | ❌ | ❌ | ❌ | ✅ | MIT+LGPL |
| Instawork llm-proxy | Per-key limits | ❌ | ✅ exp. | ❌ | ❌ | ❌ | ✅ | MIT |
| LLM Gateway | ✅ | Enterprise | Enterprise | Enterprise | Enterprise | ❌ | ✅ | AGPLv3 |
| llm-budget-proxy | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | MIT |
| LM-Proxy | ✅ | ❌ | ✅ | Groups | ✅ OIDC | ❌ | ✅ | MIT |
| LLM Security Gateway | Per-client key | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | - |
| WSO2 AI Gateway | ✅ | ✅ token-based | ✅ token-based | ✅ | Enterprise | ✅ | ✅ | Apache 2.0 |
| Kong AI Gateway | ✅ | Enterprise | Enterprise tier | ✅ | Enterprise | ✅ plugin | ✅ | Partial |
| TensorZero | ✅ | ✅ tags | ✅ granular | ❌ | ❌ | ❌ | ✅ | Apache 2.0 |
| TrueFoundry | ✅ VAT/PAT | ✅ YAML rules | ✅ per-user/team/model/metadata | ✅ | Enterprise | ✅ Cedar/OPA | ✅ SaaS/VPC/on-prem | Proprietary |
| RelayPlane | ❌ | ✅ free tier | ✅ | ❌ | ❌ | ✅ v1.0.0 | ✅ | MIT |
| OpenZiti LLM Gateway | ✅ glob models | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | Apache 2.0 |
| lazy-llm-proxy | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | MIT |

LiteLLM’s RBAC and SSO are gated behind the enterprise license.

3.4 Managed vs Self-Hosted Classification

| Category | Projects |
|---|---|
| Self-hostable open-source | LiteLLM, Bifrost, Portkey (gateway), Helicone, VoidLLM, OmniRoute, LLM-API-Key-Proxy, Instawork llm-proxy, LLM Gateway (theopenco), llm-budget-proxy, LM-Proxy, LLM Security Gateway, TensorZero, RelayPlane, OpenZiti LLM Gateway, WSO2 AI Gateway, Kong |
| Managed SaaS only | OpenRouter, Kilo Gateway, nexos.ai, Cloudflare AI Gateway, Braintrust Gateway (beta, hosted), n1n.ai, AIMLAPI, FreeLLMAPI, LiteRouter, Requesty, Datawiza, Unify AI |
| Hybrid (SaaS + self-host on enterprise) | Portkey (gateway MIT + managed platform), LLM Gateway (AGPLv3 + enterprise), Braintrust (self-host on enterprise plan), TrueFoundry (SaaS + VPC/on-prem) |

3.2 Performance Comparison

| Project | Language | Overhead | Throughput | Notes |
|---|---|---|---|---|
| Bifrost | Go | ~8µs P50, ~11µs P95 | 5,000+ RPS | Fastest measured; compiled binary (TECHSY) |
| TensorZero | Rust | <1ms P99 | 10,000+ QPS | LiteLLM @ 100 QPS adds 25-100x more latency (tensorzero.com/docs/gateway) |
| Instawork llm-proxy | Go | Not published | - | Minimalist design |
| VoidLLM | Go | Sub-2ms | - | Zero-knowledge architecture |
| WSO2 AI Gateway | Go (Envoy) | Envoy-native | - | Established proxy infrastructure |
| Helicone (Rust) | Rust | ~5ms P50, ~8ms P95 | ~3,000 RPS | Cloudflare Workers for logging (TECHSY) |
| LiteLLM | Python | ~4ms P50, ~8ms P95 | ~1,000 RPS | GIL-bound under high concurrency (TECHSY) |
| Portkey | TypeScript | ~5ms P50, ~12ms P95 | ~2,000 RPS | Claimed <1ms; independent benchmarks not found (TECHSY) |
| Kong AI Gateway | Lua/Go | ~3ms P50, ~8ms P95 | ~3,000 RPS | Enterprise API management backbone (TECHSY) |
| RelayPlane | Node.js | ~0ms local | - | In-process with app; no network hop (relayplane.com) |

Source notes:
- Bifrost benchmarks from DEV Community article by Pranay Batta (Jan 2026), with open-source benchmarking suite at github.com/maximhq/bifrost-benchmarking. On t3.medium @ 500 RPS: Bifrost p99=1.68s vs LiteLLM p99=90.72s (54x faster); throughput 424/s vs 44.84/s (9.4x higher). On t3.xlarge @ 5,000 RPS: Bifrost mean overhead=11µs vs LiteLLM ~500µs (45x higher). (DEV Community, Jan 16 2026)
- TECHSY comparison table from “Stop Juggling LLM APIs: 8 Gateway Tools Ranked for 2026” (Jun 6, 2026) (TECHSY)
- TensorZero benchmarks from official docs (tensorzero.com/docs/gateway)

Important caveat: All benchmarks above are self-reported by project authors. Independent third-party benchmarks across a common test harness do not yet exist for this market.
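The multipliers quoted in the Bifrost source note can be re-derived from the raw figures it reports. The short check below is only an arithmetic sanity pass over those self-reported numbers; the variable names are illustrative and nothing here is re-measured.

```python
# Figures copied from the Bifrost source note; they are self-reported by the
# benchmark authors and are not re-measured here.
bifrost_p99_s, litellm_p99_s = 1.68, 90.72              # t3.medium @ 500 RPS
bifrost_rps, litellm_rps = 424.0, 44.84                 # sustained throughput
bifrost_overhead_us, litellm_overhead_us = 11.0, 500.0  # t3.xlarge @ 5,000 RPS

# Each ratio is "slower system / faster system" (or higher / lower throughput).
latency_ratio = litellm_p99_s / bifrost_p99_s
throughput_ratio = bifrost_rps / litellm_rps
overhead_ratio = litellm_overhead_us / bifrost_overhead_us

print(f"p99 latency ratio:   {latency_ratio:.1f}x")
print(f"throughput ratio:    {throughput_ratio:.2f}x")
print(f"mean overhead ratio: {overhead_ratio:.1f}x")
```

The ratios come out at about 54.0, 9.46, and 45.5, which agrees with the quoted 54x, 9.4x, and 45x once the last two are truncated rather than rounded.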
3.3 Provider Coverage

| Project | Provider Count | Notable Providers |
|---|---|---|
| Portkey | 1,600+ | Largest catalog |
| LiteLLM | 100+ | Bedrock, Azure, Vertex, Cohere, Sagemaker, HuggingFace, NVIDIA NIM |
| Helicone | 100+ | OpenAI, Anthropic, Ollama, AWS Bedrock, Gemini |
| OmniRoute | 160+ | 50+ free providers |
| LLM Gateway | 210+ | OpenAI, Anthropic, Google Vertex AI |
| Bifrost | 23+ | OpenAI, Anthropic, AWS Bedrock, Google Vertex, Azure, Cerebras |
| LM-Proxy | 4+ | OpenAI, Anthropic, Google AI, local PyTorch |
| VoidLLM | 5+ | OpenAI, Anthropic, Azure, Ollama, vLLM |
| Instawork llm-proxy | 4 | OpenAI, Anthropic, Gemini, AWS Bedrock |
| TensorZero | 15+ direct, any OpenAI-compatible | OpenAI, Anthropic, AWS Bedrock, Azure, Groq, Mistral, vLLM, xAI |
| OpenZiti LLM Gateway | 3+ | OpenAI, Anthropic, Ollama, vLLM, llama-server, SGLang |
| RelayPlane | 11+ | Anthropic, OpenAI, Google, xAI, Moonshot, Ollama |
| nexos.ai | 200+ | Via managed SaaS platform |
| Kilo Gateway | 500+ | Anthropic, OpenAI, Mistral, and more |

4. API Key Architecture Patterns

4.1 Pattern A: Simple Proxy Key (Single Shared Key)

Example: LLM-API-Key-Proxy

A single PROXY API KEY is shared by all clients. The proxy validates this key and forwards requests to upstream providers with their own keys (rotated automatically).

Pros: Simplest setup; one credential to manage
Cons: No per-client attribution; no granular limits; if compromised, entire deployment exposed

4.2 Pattern B: Virtual Keys (Indirection Layer)

Examples: LiteLLM, Bifrost, Portkey, VoidLLM, llm-budget-proxy

The proxy accepts client-specific keys (e.g., sk-virtual-...) and maps them internally to upstream provider credentials. Each virtual key can have its own budget, rate limits, and access scope.
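The indirection that Pattern B describes can be sketched in a few lines. This is a minimal illustration, not the implementation of any surveyed project: the `VirtualKey` class, the `authorize` function, the in-memory dictionaries, and the key strings are all hypothetical, and a real proxy would persist this state and track spend from actual usage.

```python
from dataclasses import dataclass


@dataclass
class VirtualKey:
    """Hypothetical per-client record held by the proxy."""
    owner: str
    provider: str                 # which upstream credential this key maps to
    budget_usd: float             # spend cap for this key
    allowed_models: frozenset     # access scope
    spent_usd: float = 0.0


# Upstream provider credentials never leave the proxy (placeholder values).
UPSTREAM_KEYS = {"openai": "sk-upstream-placeholder"}

# Client-facing virtual keys, each with its own budget and scope.
VIRTUAL_KEYS = {
    "sk-virtual-team-a": VirtualKey(
        owner="team-a",
        provider="openai",
        budget_usd=50.0,
        allowed_models=frozenset({"gpt-4o-mini"}),
    ),
}


def authorize(virtual_key: str, model: str, estimated_cost_usd: float) -> str:
    """Validate a virtual key and return the upstream credential to use."""
    record = VIRTUAL_KEYS.get(virtual_key)
    if record is None:
        raise PermissionError("unknown virtual key")
    if model not in record.allowed_models:
        raise PermissionError(f"model {model!r} not allowed for {record.owner}")
    if record.spent_usd + estimated_cost_usd > record.budget_usd:
        raise PermissionError(f"budget exceeded for {record.owner}")
    record.spent_usd += estimated_cost_usd   # attribute spend to this client
    return UPSTREAM_KEYS[record.provider]
```

Calling `authorize("sk-virtual-team-a", "gpt-4o-mini", 0.02)` returns the upstream credential and records the spend against team-a, while a request for a model outside the key's scope raises `PermissionError` before any upstream call is made.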
Pros: Full per-client attribution; granular budgets; isolation between services
Cons: More configuration; requires key management workflow

4.3 Pattern C: Per-Client JSON/DynamoDB Config

Examples: LLM Security Gateway, LM-Proxy

Each client is configured with a set of API keys and settings stored in JSON or DynamoDB. The proxy validates against this store.

Pros: Flexible per-client configuration; can include model allowlists, provider selection
Cons: Requires external store; no built-in dashboard

4.4 Pattern D: OIDC / Identity Provider Integration

Examples: LM-Proxy, Bifrost, VoidLLM (Enterprise), Portkey (Enterprise)

Client API keys are validated against an OIDC provider (Keycloak, Auth0, Okta). The token becomes the virtual API key.

Pros: Enterprise identity integration; SSO experience; existing IAM infrastructure
Cons: Requires OIDC infrastructure; adds latency for validation

4.5 Pattern E: Hierarchical Organization

Examples: Bifrost, VoidLLM, Paperclip, LiteLLM (paid)

Keys are scoped within an organizational hierarchy: Customer → Team → User → Key. Each level can have its own budget and limits.

Pros: Enterprise-grade governance; cost allocation by org unit
Cons: Complex configuration; often gated behind enterprise license

5. Granular Access Control: Deep Feature Analysis

This section goes beyond cataloging which projects support “virtual keys” to analyze how granular the access control actually is. The user’s core question, how to create API keys that split access control into more granular levels, requires examining per-model scoping, IP allowlisting, key expiration/rotation, webhook-based validation, and multi-tenancy isolation.

5.1 Per-Model / Per-Endpoint Scoping

Can you create a key that only accesses GPT-4 but not Claude? Only /chat/completions but not /embeddings?

| Project | Per-Model Scoping | Per-Endpoint Scoping | Mechanism |
|---|---|---|---|
| LiteLLM | ✅ | ❌ | model list config; per-key TPM/RPM limits (docs.litellm.ai/docs/proxy/users) |
| Bifrost | ✅ | ❌ | Per-key model restrictions, spend caps at proxy layer; hierarchical Customer → Team → Virtual Key → Provider Config |
| TensorZero | ✅ via tags | ❌ | Custom API keys with tag-based scoping; rate limits by tag scopes (tensorzero.com/docs/operations/set-up-auth-for-tensorzero) |
| OpenZiti LLM Gateway | ✅ glob | ❌ | Keys restricted to models via glob patterns in config (allowed models: "gpt-4o") (github.com/openziti/llm-gateway) |
| LLM Security Gateway | ✅ | ❌ | Per-client model allowlists stored in JSON or DynamoDB |
| TrueFoundry | ✅ | ❌ | RBAC scoped per provider account and model; policy-as-code with Cedar/OPA (truefoundry.com/blog/ai-governance-audit-enterprise-llm-gateway) |
| Portkey | Partial | ❌ | Workspace-level access control; inbound rules for IPs/geographies but not fine-grained per-model at key level |
| lazy-llm-proxy | ✅ | ❌ | Per-key model allowlists |

Assessment: LiteLLM, Bifrost, and TrueFoundry offer the most mature per-model scoping. OpenZiti uses glob patterns, which are flexible but less precise than tag-based systems. None of the surveyed projects supports true per-endpoint scoping (e.g., a key that can only call /chat/completions but not /embeddings). This is a gap in the market.

5.2 IP Allowlisting and CORS Restrictions

| Project | IP Allowlisting | Geographic Restrictions | CORS Control |
|---|---|---|---|
| Portkey | ✅ | ✅ geography | ❌ |
| lazy-llm-proxy | ✅ | ❌ | ❌ |
| Most others | ❌ | ❌ | ❌ |

Assessment: IP allowlisting is notably absent from most LLM proxies. This is a significant gap for teams that need to restrict which networks can reach their proxy. Portkey is the only major project with built-in IP/geo controls. Others require deploying behind a network firewall or VPC.
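For proxies without built-in IP controls, a per-key allowlist check is small enough to add in front of the key lookup. The sketch below uses Python's standard `ipaddress` module; the `KEY_ALLOWLISTS` mapping, the key name, and the `ip_allowed` function are hypothetical and do not reflect how Portkey or lazy-llm-proxy implement the feature.

```python
import ipaddress

# Hypothetical per-key CIDR allowlists; a real proxy would load these from
# its key store rather than a module-level dict.
KEY_ALLOWLISTS = {
    "sk-virtual-ci": ["10.0.0.0/8", "203.0.113.7/32"],
}


def ip_allowed(virtual_key: str, client_ip: str) -> bool:
    """Return True only if client_ip falls inside a CIDR listed for the key."""
    cidrs = KEY_ALLOWLISTS.get(virtual_key)
    if cidrs is None:
        return False              # unknown key: deny by default
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False              # malformed address: deny
    # Membership is False when the address and network versions differ.
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)
```

The check denies by default, so a key with no allowlist entry is rejected rather than left open. Behind a load balancer the client address would have to come from a trusted forwarding header, which is outside this sketch.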
5.3 Key Expiration and Rotation Policies

| Project | Auto-Expiry | One-Time-Use Keys | Auto-Rotation of Provider Keys | Webhook-Based Validation |
|---|---|---|---|---|
| LiteLLM | ❌ | ❌ | ❌ | ❌ |
| Bifrost | ❌ | ❌ | ❌ | ❌ |
| TrueFoundry | ❌ revocation | ❌ | ✅ VAT/PAT revocation without downtime | ❌ |
| TensorZero | ❌ | ❌ | ❌ | ❌ |
| LLM-API-Key-Proxy | N/A | ❌ | ✅ automatic multi-provider key rotation with cooldowns | ❌ |
| LM-Proxy (Nayjest) | ❌ | ❌ | ❌ | ✅ api key check can reference external HTTP service (pypi.org/project/lm-proxy/) |

Assessment: No surveyed project supports auto-expiry or one-time-use keys. Key rotation of upstream provider credentials is only handled by LLM-API-Key-Proxy (automatic multi-provider rotation). Webhook-based key validation, useful for integrating with custom IAM systems, is only available in LM-Proxy (Nayjest) via configurable api key check functions or external HTTP services.

5.4 Multi-Tenancy Isolation Strength

| Project | Isolation Model | Org/Team Hierarchy | Physical Isolation Option |
|---|---|---|---|
| Bifrost | Logical (tenant id in DB) | ✅ Full CRUD for org/team/business unit | ❌ |
| TrueFoundry | Logical (VAT/PAT) | ✅ Virtual Accounts, per-provider RBAC | ✅ VPC/on-prem/air-gapped deployment |
| LiteLLM | Logical (team id in PostgreSQL) | ✅ Teams with shared budgets | ❌ |
| Portkey | Logical (workspace) | ✅ Workspaces | ❌ |
| Braintrust | Logical (org/project) | ✅ Organization-scoped and project-scoped keys | ❌ (enterprise plan for self-host) |
| TensorZero | Logical (tags/namespaces) | ❌ (no org hierarchy) | ❌ |
| OpenZiti LLM Gateway | Flat config | ❌ | ❌ |

Assessment: TrueFoundry is the only project offering both logical multi-tenancy and physical isolation options (VPC, on-prem, air-gapped). Bifrost offers the most granular org/team hierarchy with full CRUD. LiteLLM’s team-based model is widely used but provides only logical separation within a shared PostgreSQL database.

7. Security Considerations

7.1 The Supply-Chain Risk

LiteLLM’s March 2026 breach demonstrated that even popular open-source projects are vulnerable to supply-chain attacks through CI/CD pipeline compromise (poisoned GitHub Action → stolen PyPI credentials). Projects with smaller dependency trees and compiled binaries (Go, Rust) present a lower attack surface:

- Bifrost (Go, Apache 2.0): Single binary, minimal dependencies
- VoidLLM (Go, BSL 1.1): Compiled binary, OpenSSF Scorecard
- WSO2 AI Gateway (Go, Envoy): Enterprise-grade supply chain practices
- TensorZero (Rust, Apache 2.0): Compiled binary; team includes Rust compiler maintainer
- OpenZiti LLM Gateway (Go, Apache 2.0): Single binary, no DB, minimal attack surface
- RelayPlane (Node.js, MIT): npm-native, local-first; no server-side attack surface for remote adversaries

7.2 Prompt/Data Privacy

Most proxies log request/response metadata for observability. However:

- VoidLLM is architecturally zero-knowledge; it never stores or logs prompt/response content
- LiteLLM, Portkey, and Helicone store full request/response data by default (configurable)
- LLM Security Gateway includes PII scanning with configurable redact/block/log actions
- Braintrust Gateway: Uses AES-GCM encryption tied to each user’s API key; cached results scoped to individual user; Braintrust cannot see your data and does not store or log API keys
- nexos.ai: Founded by Nord Security creators, positions data governance as core value proposition

7.3 Content Security

Only the LLM Security Gateway provides built-in injection detection (20 patterns across 4 categories) and response scanning. Other projects leave this to external guardrails (Portkey has 50+ guardrails; LiteLLM has guardrails in its enterprise tier). TrueFoundry adds policy-as-code guardrails (Cedar/OPA) at the MCP-tool boundary.

8. Emerging Trends

8.1 MCP Gateway as a New Governance Frontier

As AI agents proliferate, they need governed access to external tools and APIs. Bifrost, Portkey, VoidLLM, WSO2, Kong, RelayPlane, and TrueFoundry all now offer MCP gateway capabilities:

- Bifrost: MCP client + server with OAuth 2.0 auth, tool filtering per virtual key
- Portkey: Centralized control plane for MCP servers with access control and observability
- VoidLLM: External MCP server proxying with scoped access control; Code Mode (WASM-sandboxed JS)
- WSO2: Converts REST APIs to MCP-compatible servers
- Kong: AI MCP Proxy Plugin with OAuth2 authentication
- TrueFoundry: MCP Gateway with OAuth 2.0 secured access; policy-as-code guardrails (Cedar/OPA) at MCP-tool boundary
- RelayPlane: MCP server support shipped in v1.0.0

8.2 Token-Aware Rate Limiting

Traditional API gateways rate-limit on request count. LLM costs scale with tokens, not requests. Projects like WSO2 and Instawork llm-proxy implement token-based rate limiting calibrated to how LLMs actually charge.

8.3 Semantic Caching

Bifrost, Portkey, and Helicone offer semantic caching (embedding-based similarity search) in addition to exact-match caching, reducing costs for repeated or near-identical queries.

8.4 Model Downgrade on Budget Pressure

llm-budget-proxy introduces automatic model downgrade (e.g., GPT-4 → GPT-4o-mini) when approaching budget thresholds, a cost optimization pattern that could spread to other gateways.

9. Operational Complexity and Deployment Comparison

The user’s use case (granular access control via API keys) implies a production deployment scenario. The following tables compare operational complexity across the most relevant projects.

9.1 Infrastructure Dependencies

| Project | Database Required | Cache Required | Message Queue | Cloud Provider Lock-in |
|---|---|---|---|---|
| LiteLLM | PostgreSQL (required for team features) | Redis (recommended) | ❌ | ❌ |
| Bifrost | PostgreSQL (for governance features) | ❌ | ❌ | ❌ |
| TensorZero | Postgres (optional), ClickHouse (optional), Valkey/Redis (optional) | Optional (Valkey/Redis) | ❌ | ❌ |
| TrueFoundry | Managed by platform | Managed | ❌ | Partial (SaaS on AWS) |
| Helicone | Cloudflare Workers (managed) or self-hosted DB | Cloudflare cache | ❌ | Yes (Cloudflare for managed) |
| OpenZiti LLM Gateway | None (single binary) | ❌ | ❌ | ❌ |
| RelayPlane | None (npm package) | ❌ | ❌ | ❌ |
| llm-budget-proxy | SQLite only | ❌ | ❌ | ❌ |
| Portkey (self-hosted) | Depends on managed features used | Redis (recommended) | ❌ | ❌ |
| WSO2 AI Gateway | PostgreSQL/MySQL | Redis | Kafka (optional, for async logging) | ❌ |

9.2 Kubernetes Readiness

| Project | Official K8s Support | Helm Chart | Operator | Notes |
|---|---|---|---|---|
| LiteLLM | ✅ | ✅ official | ❌ | Well-documented; large community |
| Bifrost | ✅ | ✅ | ❌ | Docker + Helm charts available |
| TensorZero | ✅ | ✅ examples/ | ❌ | K8s/Helm/Argo examples on GitHub |
| TrueFoundry | ✅ | N/A (platform-managed) | ✅ | Kubernetes-native; GPU orchestration built-in |
| Helicone | ✅ | ✅ | ❌ | Self-hosting noted as “not recommended” for manual deployment |
| WSO2 AI Gateway | ✅ | ✅ | ✅ | Built on Envoy Proxy; established K8s patterns |
| Kong AI Gateway | ✅ | ✅ | ✅ | Mature Kubernetes operators |
| Portkey (self-hosted) | ✅ | ❌ | ❌ | Docker + Cloudflare Workers deployment |
| OpenZiti LLM Gateway | ❌ | ❌ | ❌ | Not officially documented for K8s |
| RelayPlane | ❌ | ❌ | ❌ | Local npm package, not a service |
| llm-budget-proxy | ❌ | ❌ | ❌ | Single Docker container |

9.3 Onboarding Time Estimates

| Project | Setup Time | Complexity Level | Notes |
|---|---|---|---|
| RelayPlane | ~30 seconds | Trivial | npm install -g @relayplane/proxy; local proxy, no infra |
| OpenZiti LLM Gateway | 2–3 minutes | Low | One YAML config file; single binary |
| llm-budget-proxy | ~5 minutes | Low | Single Docker container with SQLite |
| TensorZero | ~5 minutes (quickstart) | Medium | GitOps-friendly config; optional DBs |
| Bifrost | <1 minute (Docker) | Low | Web UI configuration; single command deploy |
| LiteLLM | 20–30 minutes | Medium-High | Requires PostgreSQL + Redis; YAML config complexity |
| Portkey (self-hosted) | 10–15 minutes | Medium | Docker/Node.js; moderate learning curve |
| TrueFoundry (SaaS) | Account creation | Low | Enterprise: sales discussion for VPC/on-prem deployment |
| WSO2 AI Gateway | 30–60 minutes | High | Envoy-based; full platform integration |

10. Recommendations by Use Case

For Enterprise Teams Needing Full Governance

TrueFoundry or Bifrost

- TrueFoundry: Most complete governance feature set, with virtual accounts, RBAC per provider account, policy-as-code (Cedar/OPA), compliance-grade audit logs with SIEM export, and data residency routing. SOC 2 Type 2, HIPAA, ITAR certified. VPC/on-prem/air-gapped deployment options. Recognized in Gartner Market Guide for AI Gateways 2026.
- Bifrost: Virtual keys with hierarchical budgets (Customer → Team → Virtual Key → Provider Config), RBAC, Google/GitHub SSO (open-source, not paywalled), HashiCorp Vault integration, OIDC provisioning. 11µs overhead at 5,000 RPS. MCP gateway with tool filtering per virtual key.
- WSO2 AI Gateway: Alternative for teams already on WSO2 platform; Envoy-based with SOC 2 Type 2 and ISO 27001 compliance; token-based rate limiting calibrated to LLM pricing.
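Token-based rate limiting, which this report attributes to WSO2 and Instawork llm-proxy, differs from request-count limiting only in what the bucket measures. The sketch below is a generic token bucket denominated in LLM tokens per minute; the `TokenBudgetLimiter` class and its parameters are hypothetical and are not taken from either project.

```python
import time


class TokenBudgetLimiter:
    """Token bucket that meters LLM tokens (prompt + completion), not requests."""

    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.capacity = float(tokens_per_minute)
        self.refill_per_second = tokens_per_minute / 60.0
        self.available = self.capacity      # start with a full bucket
        self.clock = clock                  # injectable so tests can fake time
        self.last_refill = clock()

    def allow(self, llm_tokens: int) -> bool:
        """Deduct llm_tokens if the bucket can cover them; otherwise reject."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.available = min(
            self.capacity, self.available + elapsed * self.refill_per_second
        )
        self.last_refill = now
        if llm_tokens > self.available:
            return False
        self.available -= llm_tokens
        return True
```

With a 6,000 tokens-per-minute limit, one 5,000-token request is admitted and a following 2,000-token request is rejected until enough time has passed to refill the bucket. A request-count limiter would treat both calls as equal, which is the mismatch this pattern addresses.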
For Teams Already on Kong

Kong AI Gateway

- Natural adoption path; extend existing API management to LLM traffic
- Token-based rate limiting (enterprise tier), RBAC, audit logs, MCP proxy plugin

For Quick Setup / Single Provider

llm-budget-proxy or RelayPlane

- llm-budget-proxy: 5-minute Docker deploy, SQLite, perfect for dev/staging with per-key token budgets and Slack/Discord alert webhooks
- RelayPlane: 30-second npm install, local-first (zero network hop), MCP support; ideal for coding agent workflows

For Privacy-Conscious Deployments

VoidLLM or OpenZiti LLM Gateway

- VoidLLM: Zero-knowledge architecture by design; never stores or logs prompt/response content; sub-2ms proxy overhead; RBAC with org/team/user scoping
- OpenZiti LLM Gateway: Zero-trust networking via zrok/OpenZiti overlay; single binary, no DB; model-level key restrictions via glob patterns

For Observability-First Teams

Helicone or Portkey

- Helicone: YC-backed, Rust-based observability-first with light gateway features; ~3,000 RPS per instance
- Portkey: Strong guardrails (50+ pre-built checks), full LLMOps platform; now fully open-source (March 2026)

For Maximum Provider Coverage

Portkey (1,600+ models) or LiteLLM (100+ providers)

For ML-Optimized Routing

TensorZero

- <1ms P99 latency at 10,000+ QPS (Rust); tag-based granular rate limits; Autopilot automated optimization; GitOps-friendly orchestration

For Minimalist / Single-Concern Proxies

- Security: LLM Security Gateway (injection detection, PII scanning)
- Budget enforcement: llm-budget-proxy (per-key daily/monthly USD budgets)
- Key rotation: LLM-API-Key-Proxy (automatic multi-provider key rotation)
- Circuit breaking: Instawork llm-proxy (per-key circuit breaker with provider rollup)
- Zero-trust networking: OpenZiti LLM Gateway (zrok overlay, no firewall rules)
- Local-first proxy: RelayPlane (npm-native, in-process with your app)

For Webhook-Based Key Validation

LM-Proxy (Nayjest)

- Extensible api key check can reference a Python function or external HTTP service for custom authentication logic; unique among surveyed projects

For Per-Key IP Allowlisting

Portkey or lazy-llm-proxy

- Portkey: Inbound rules for IPs and geographies on deployments
- lazy-llm-proxy: Per-key IP whitelisting (rare feature)

11. Decision Framework by Requirement

The following matrix maps specific requirements to top candidates, with the key trade-off for each scenario.

| Requirement | Top Pick | Runner-Up | Key Trade-off |
|---|---|---|---|
| Per-model scoping | LiteLLM / Bifrost | TrueFoundry (RBAC per provider account) | LiteLLM: widest provider coverage; Bifrost: lower latency |
| Per-endpoint scoping | None available | - | No project supports restricting keys to specific API endpoints (e.g., /chat/completions only) |
| IP allowlisting | Portkey | lazy-llm-proxy | Portkey requires managed platform; lazy-llm-proxy is small/new |
| Auto key expiry | None available | - | Gap in the market; all projects use manual management |
| One-time-use keys | None available | - | Gap in the market |
| Webhook-based validation | LM-Proxy (Nayjest) | - | Only project with configurable external HTTP validation |
| Physical isolation (air-gap) | TrueFoundry | WSO2 AI Gateway | TrueFoundry: VPC/on-prem/air-gapped; WSO2: Envoy-based K8s native |
| Lowest overhead | TensorZero (<1ms P99) | Bifrost (11µs @ 5k RPS) | TensorZero: steeper learning curve; Bifrost: smaller provider coverage (23+) |
| Fastest setup | RelayPlane (30s) | OpenZiti (2-3 min) | RelayPlane: local-only, not a service; OpenZiti: no K8s support |
| Maximum provider coverage | Portkey (1,600+) | LiteLLM (100+ providers) | Portkey: managed platform pricing; LiteLLM: Python GIL limits |
| MCP governance | TrueFoundry / Bifrost | Portkey / VoidLLM | TrueFoundry: Cedar/OPA policy-as-code; Bifrost: open-source, not paywalled |
| Multi-tenant org hierarchy | Bifrost (full CRUD) | TrueFoundry (VAT/PAT) | Bifrost: logical only; TrueFoundry: physical isolation options |

13. Methodology Note

This research was conducted on June 9, 2026 through systematic web searches using the mcp search web search tool, with 40+ distinct search queries across these categories:

- Technical terms: “LLM proxy”, “AI gateway”, “LLM router”, “LLM middleware”
- Lay phrasings: “proxy LLM APIs”, “unified LLM API”
- Named-entity queries: Project names discovered through comparison articles (Bifrost, TensorZero, TrueFoundry, RelayPlane, OpenZiti, etc.)
- Feature-specific queries: “virtual keys budgets RBAC”, “per-model per-endpoint API key scoping IP allowlist expiration rotation”, “LLM proxy webhook key validation”
- Benchmark queries: “Bifrost LiteLLM benchmark independent verification”, “TensorZero benchmarks latency”
- Missing competitor discovery: “Vellum AI platform LLM routing proxy API keys open source”, “LangFuse self-hosted API key scoping multi-tenant access control”, “Unify AI unified inference layer API key management access control”, “TrueFoundry AI gateway API key management virtual keys RBAC”, “Braintrust gateway API key management multi-tenant access control”, “Kilo Gateway universal LLM inference API key management”, “nexos.ai unified AI API gateway access control”

Primary sources fetched and read in full via mcp search web fetch:

- TECHSY, “Stop Juggling LLM APIs: 8 Gateways Ranked for 2026” (Jun 6, 2026)
- getmaxim.ai, “Best LLM Gateways in 2026” (Apr 14, 2026)
- Braintrust.dev, “6 best LLM gateways for developers in 2026” (May 16, 2026)
- TrueFoundry, “Top 6 LLM Gateways in 2026” (Sep 23, 2025)
- RelayPlane, “LLM Gateway Comparison” (Mar 2026)
- DEV Community, “How We Benchmarked Bifrost against LiteLLM” by Pranay Batta (Jan 16, 2026)
- TensorZero docs, “Gateway Overview”
- TrueFoundry, “AI Governance and Audit for Enterprise LLMs” (Jun 8, 2026)
- Braintrust, “Use the Braintrust gateway” docs
- Kilo Gateway landing page (kilo.ai/gateway)
- OpenZiti LLM Gateway README (github.com/openziti/llm-gateway)
- Vellum platform overview (deepchecks.com)

Benchmark verification: All performance claims are attributed to their source. Bifrost’s “54x faster” claim comes from self-reported benchmarks with an open-source benchmarking suite (github.com/maximhq/bifrost-benchmarking). TensorZero’s “<1ms P99” claims come from official docs. TECHSY’s performance overhead table provides a cross-project comparison.
Independent third-party benchmarks across a common test harness do not yet exist for this market — all numbers should be treated as self-reported until independently verified. Limitations : The research focused on open-source and self-hostable projects primarily, with managed-only platforms tracked separately. Some smaller or newer projects may have been missed. Provider counts and star counts are approximate. 14. References - Sonatype, “Compromised LiteLLM PyPI Package Delivers Multi-Stage Credential Stealer,” March 2026. https://www.sonatype.com/blog/compromised-litellm-pypi-package-delivers-multi-stage-credential-stealer https://www.sonatype.com/blog/compromised-litellm-pypi-package-delivers-multi-stage-credential-stealer - TechCrunch, “Delve Accused of Misleading Customers with Fake Compliance,” March 2026. https://techcrunch.com/2026/03/22/delve-accused-of-misleading-customers-with-fake-compliance/ https://techcrunch.com/2026/03/22/delve-accused-of-misleading-customers-with-fake-compliance/ - LiteLLM Documentation, “Budgets, Rate Limits.” https://docs.litellm.ai/docs/proxy/users https://docs.litellm.ai/docs/proxy/users - LiteLLM Documentation, “Budgets, Rate Limits.” also referenced in 3 — budget management features for virtual keys. - Bifrost Documentation, “Governance — Virtual Keys, Budgets & Enterprise RBAC.” https://www.getmaxim.ai/bifrost/resources/governance https://www.getmaxim.ai/bifrost/resources/governance - Bifrost Documentation, “Enterprise — User Provisioning OIDC .” https://docs.getbifrost.ai/enterprise/user-provisioning https://docs.getbifrost.ai/enterprise/user-provisioning - Portkey Documentation, “Role-based access control.” https://portkey.ai/docs/product/ai-gateway/rbac https://portkey.ai/docs/product/ai-gateway/rbac - Portkey Blog, “Manage LLM API keys with secret references,” April 2026. https://portkey.ai/blog/secret-references-ai-api-key-management/ https://portkey.ai/blog/secret-references-ai-api-key-management/ - VoidLLM README. 
https://github.com/voidmind-io/voidllm https://github.com/voidmind-io/voidllm - OmniRoute GitHub, v1.4.0 release notes — “Dedicated API Key Manager.” https://newreleases.io/project/github/diegosouzapw/OmniRoute/release/v1.4.0 https://newreleases.io/project/github/diegosouzapw/OmniRoute/release/v1.4.0 - LiteLLM GitHub. https://github.com/BerriAI/litellm https://github.com/BerriAI/litellm - Bifrost GitHub. https://github.com/maximhq/bifrost https://github.com/maximhq/bifrost - Portkey AI Gateway GitHub. https://github.com/Portkey-AI/gateway https://github.com/Portkey-AI/gateway - Helicone GitHub. https://github.com/Helicone/helicone https://github.com/Helicone/helicone - VoidLLM GitHub. https://github.com/voidmind-io/voidllm https://github.com/voidmind-io/voidllm - OmniRoute GitHub. https://github.com/diegosouzapw/OmniRoute https://github.com/diegosouzapw/OmniRoute - LLM-API-Key-Proxy GitHub. https://github.com/Mirrowel/LLM-API-Key-Proxy https://github.com/Mirrowel/LLM-API-Key-Proxy - Instawork llm-proxy GitHub. https://github.com/Instawork/llm-proxy https://github.com/Instawork/llm-proxy - LLM Gateway theopenco GitHub. https://github.com/theopenco/llmgateway https://github.com/theopenco/llmgateway - llm-budget-proxy GitHub. https://github.com/InkByteStudio/llm-budget-proxy https://github.com/InkByteStudio/llm-budget-proxy - LM-Proxy Nayjest GitHub. https://github.com/Nayjest/lm-proxy https://github.com/Nayjest/lm-proxy - LLM Security Gateway GitHub. https://github.com/TerminalsandCoffee/llm-security-gateway https://github.com/TerminalsandCoffee/llm-security-gateway - Paperclip GitHub. https://github.com/paperclipai/paperclip https://github.com/paperclipai/paperclip - WSO2 AI Gateway GitHub. https://github.com/wso2/wso2-envoy-ai-gateway https://github.com/wso2/wso2-envoy-ai-gateway - WSO2 Blog, “Best LiteLLM Alternatives in 2026: Secure AI Gateways,” April 2026. 
https://wso2.com/library/blogs/litellm-alternatives/ https://wso2.com/library/blogs/litellm-alternatives/ - getmaxim.ai, “Top LiteLLM Alternatives in 2026,” April 2026. https://www.getmaxim.ai/articles/top-litellm-alternatives-in-2026/ https://www.getmaxim.ai/articles/top-litellm-alternatives-in-2026/ - DEV Community, “Top LiteLLM Alternatives in 2026” by Kuldeep Paul, March 2026. https://dev.to/kuldeep paul/top-litellm-alternatives-in-2026-1gi1 https://dev.to/kuldeep paul/top-litellm-alternatives-in-2026-1gi1 - DEV Community, “This Open-Source LLM Gateway is 54x Faster Than LiteLLM” by Deb McKinney, January 2026. https://dev.to/debmckinney/this-open-source-llm-gateway-is-54x-faster-than-litellm-heres-why-1h https://dev.to/debmckinney/this-open-source-llm-gateway-is-54x-faster-than-litellm-heres-why-1h - Pomerium Blog, “LiteLLM Alternatives: Best Open-Source and Secure LLM Gateways in 2025.” https://www.pomerium.com/blog/litellm-alternatives https://www.pomerium.com/blog/litellm-alternatives - TECHSY, “Stop Juggling LLM APIs: 8 Gateways Ranked 2026,” June 2026. https://techsy.io/en/blog/best-llm-gateway-tools https://techsy.io/en/blog/best-llm-gateway-tools - TrueFoundry, “Top 5 LiteLLM Alternatives for Enterprises in 2026,” January 2026. https://www.truefoundry.com/blog/litellm-alternatives https://www.truefoundry.com/blog/litellm-alternatives - OpenRouter Documentation, “Organization Management.” https://openrouter.ai/docs/cookbook/administration/organization-management https://openrouter.ai/docs/cookbook/administration/organization-management - OpenRouter Release Notes, June 2026 — “workspace guardrails for budget limits, zero data retention.” https://releasebot.io/updates/openrouter https://releasebot.io/updates/openrouter - Datawiza Blog, “LLM API Key Management and Identity-Aware Rate Limiting,” May 2026. 
https://www.datawiza.com/blog/industry/llm-api-key-management-and-identity-aware-rate-limiting/ https://www.datawiza.com/blog/industry/llm-api-key-management-and-identity-aware-rate-limiting/ - Requesty. https://www.requesty.ai https://www.requesty.ai - LLM Security Gateway Medium article by Terminals & Coffee, February 2026. https://medium.com/@terminalsandcoffee/i-built-a-security-proxy-for-llm-apis-8c44f7c26730 https://medium.com/@terminalsandcoffee/i-built-a-security-proxy-for-llm-apis-8c44f7c26730 - InkByteStudio, “LLM API Rate Limiting & Cost Control: Token Budgets & Throttling,” March 2026. https://igotasite4that.com/blog/llm-api-rate-limiting-cost-control/ https://igotasite4that.com/blog/llm-api-rate-limiting-cost-control/ - Paperclip.ing. https://paperclip.ing https://paperclip.ing - API7 API7.ai Learning Center. https://api7.ai/learning-center/api-gateway-guide/api-gateway-proxy-llm-requests https://api7.ai/learning-center/api-gateway-guide/api-gateway-proxy-llm-requests - OpenClaw Gateway Authentication. https://docs.openclaw.ai/gateway/authentication https://docs.openclaw.ai/gateway/authentication - LiteRouter. https://literouter.com https://literouter.com - n1n.ai. https://n1n.ai https://n1n.ai - AIMLAPI. https://aimlapi.com https://aimlapi.com - FreeLLMAPI GitHub. https://github.com/tashfeenahmed/freellmapi https://github.com/tashfeenahmed/freellmapi - TensorZero, “Gateway Overview.” https://www.tensorzero.com/docs/gateway https://www.tensorzero.com/docs/gateway - TensorZero, “Set up auth for TensorZero” operations docs . https://www.tensorzero.com/docs/operations/set-up-auth-for-tensorzero https://www.tensorzero.com/docs/operations/set-up-auth-for-tensorzero - TensorZero GitHub. https://github.com/tensorzero/tensorzero https://github.com/tensorzero/tensorzero - TrueFoundry, “AI Governance and Audit for Enterprise LLMs: Virtual Keys, RBAC, and Compliance-Grade Logs,” June 8, 2026. 
https://www.truefoundry.com/blog/ai-governance-audit-enterprise-llm-gateway https://www.truefoundry.com/blog/ai-governance-audit-enterprise-llm-gateway - TrueFoundry AI Gateway overview. https://www.truefoundry.com/ai-gateway https://www.truefoundry.com/ai-gateway - RelayPlane, “LLM Gateway Comparison.” https://relayplane.com/compare/llm-gateways https://relayplane.com/compare/llm-gateways - RelayPlane GitHub. https://github.com/RelayPlane/proxy https://github.com/RelayPlane/proxy - OpenZiti LLM Gateway README. https://github.com/openziti/llm-gateway https://github.com/openziti/llm-gateway - Kilo Gateway landing page. https://kilo.ai/gateway https://kilo.ai/gateway - nexos.ai API SDK features. https://nexos.ai/features/api-sdk/ https://nexos.ai/features/api-sdk/ - Braintrust, “Use the Braintrust gateway” docs. https://www.braintrust.dev/docs/deploy/gateway https://www.braintrust.dev/docs/deploy/gateway - Braintrust, “Organizations and Authentication.” https://deepwiki.com/braintrustdata/braintrust-go/4.5-organizations-and-authentication https://deepwiki.com/braintrustdata/braintrust-go/4.5-organizations-and-authentication - DEV Community, “How We Benchmarked Bifrost against LiteLLM” by Pranay Batta, January 16, 2026. https://dev.to/pranay batta/how-we-benchmarked-bifrost-against-litellmand-what-we-learned-about-performance-c1o https://dev.to/pranay batta/how-we-benchmarked-bifrost-against-litellmand-what-we-learned-about-performance-c1o - getmaxim.ai, “Best LLM Gateways in 2026,” April 14, 2026. https://www.getmaxim.ai/articles/best-llm-gateways-in-2026/ https://www.getmaxim.ai/articles/best-llm-gateways-in-2026/ - Braintrust.dev, “6 best LLM gateways for developers in 2026,” May 16, 2026. 
https://www.braintrust.dev/articles/best-llm-gateways-2026
- TrueFoundry, “Top 6 LLM Gateways in 2026.” https://www.truefoundry.com/blog/best-llm-gateways
- deepchecks.com, “What is Vellum AI? Features & Getting Started,” January 2025. https://deepchecks.com/llm-tools/vellum-ai/
- lazy-llm-proxy GitHub. https://github.com/Xu-pixel/lazy-llm-proxy
- Forbes, “Nord Security Founders Launch Nexos.ai For Governed Enterprise AI,” November 2025. https://www.forbes.com/sites/ronschmelzer/2025/11/25/nord-security-founders-launch-nexosai-for-governed-enterprise-ai/
- Vellum Documentation. https://docs.vellum.ai/
- Braintrust.dev, “How we chose the best LLM gateways.” https://www.braintrust.dev/articles/best-llm-gateways-2026#how-we-chose-the-best-llm-gateways