Measuring AI Gateway Failover: 30 Days of Production Data

A 30-day production test at Nexus Labs compared three AI gateways—Bifrost, LiteLLM, and Portkey—measuring failover latency and overhead. Bifrost added 11ms p99 overhead with automatic provider fallback, while LiteLLM and Portkey showed different strengths in cost tracking and managed features. The study concluded that routing reliability, not model capability, is the primary challenge in production AI systems.

TL;DR: We measured failover latency across three AI gateways (Bifrost, LiteLLM, Portkey) during 30 days of production traffic at Nexus Labs. Bifrost added 11ms p99 overhead with automatic provider fallback.

The model is the easy part. Routing it reliably is not.

Our agent platform at Nexus Labs handles around 2.4M LLM requests per day. Half of those hit OpenAI, the rest spread across Anthropic, Bedrock, and Vertex. When OpenAI had its 4-hour incident on April 23, we lost 38 minutes of traffic before our homegrown retry logic gave up and rerouted. That hurt. So we replaced the retry layer.

## The actual problem

Most gateway benchmarks measure throughput on a cold path with no failures. That tells you very little about production. What I care about: how long does it take for a request to recover when a provider returns 429 or 503? How much p99 latency does the gateway add when nothing is wrong?

Our team of 9 engineers spent two weeks instrumenting three options. Same hardware (c6i.4xlarge, 2 nodes behind an NLB). Same upstream credentials. Same request distribution sampled from our actual logs.

## Setup

Each gateway sat between our agent service and four providers. We configured identical fallback chains: OpenAI primary, Anthropic secondary, Bedrock tertiary. Cache disabled. Rate limits set to mirror our prod allocation.

Here's the Bifrost config we used:

```yaml
providers:
  openai:
    keys:
      - value: env.OPENAI_API_KEY
        weight: 1.0
  anthropic:
    keys:
      - value: env.ANTHROPIC_API_KEY
        weight: 1.0
  bedrock:
    keys:
      - value: env.AWS_BEDROCK_KEY
        weight: 1.0

fallbacks:
  - provider: openai
    model: gpt-4o
    fallback_to:
      - provider: anthropic
        model: claude-sonnet-4
      - provider: bedrock
        model: anthropic.claude-sonnet-4
```

Documented behavior is at https://docs.getbifrost.ai/features/retries-and-fallbacks.

LiteLLM and Portkey have equivalent configs. Different YAML shape, same semantics.

## Results

We ran 720 hours of mirrored traffic.
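
For readers who want to reproduce overhead figures like the ones in this section, here's a minimal sketch of one way to derive them from request logs. It is illustrative only, not the pipeline behind this post: the field names `total_ms` and `upstream_ms` are hypothetical, and the nearest-rank percentile method is an assumption (other percentile definitions give slightly different values on small samples).

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def gateway_overhead_ms(records):
    """Overhead per request: time the caller waited minus time spent waiting on the provider."""
    return [r["total_ms"] - r["upstream_ms"] for r in records]


# Hypothetical log records; the field names are assumptions, not a real gateway log schema.
sample = [
    {"total_ms": 812.0, "upstream_ms": 809.0},
    {"total_ms": 455.0, "upstream_ms": 451.0},
    {"total_ms": 1290.0, "upstream_ms": 1279.0},
]

overheads = gateway_overhead_ms(sample)
print("p50 overhead (ms):", percentile(overheads, 50))
print("p99 overhead (ms):", percentile(overheads, 99))
```

Subtracting upstream time per request, rather than comparing two aggregate latencies, keeps slow provider responses from being counted as gateway overhead.
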
Numbers below are from the actual logs, not synthetic load.

| Gateway | p50 overhead | p99 overhead | Failover time (provider down) | Memory at 1k RPS |
|---|---|---|---|---|
| Bifrost | 3ms | 11ms | 180ms (one retry + switch) | 412 MB |
| LiteLLM | 8ms | 41ms | 620ms | 890 MB |
| Portkey (self-hosted) | 6ms | 29ms | 340ms | 650 MB |

Bifrost is written in Go. LiteLLM is Python with FastAPI. That accounts for most of the gap on the hot path. Not all of it. Bifrost's fallback chain evaluates synchronously without re-queuing the request, which matters when you're already on retry attempt two.

Portkey was solid but the self-hosted version lagged their managed offering in feature parity. LiteLLM's killer feature for our team was richer support for custom cost-tracking callbacks. We still use those for finance reporting.

## What we used Bifrost for

Three things, specifically.

Fallback routing. When OpenAI returns 429, the request goes to Anthropic with the equivalent model. Our agent code never knows. Docs at https://docs.getbifrost.ai/features/retries-and-fallbacks.

Semantic caching. For our evaluation harness specifically. We replay 18,000 prompts against new model versions nightly. Cache hit rate is 73% because the evaluation suite asks the same questions repeatedly. That's around 13k requests we don't pay for each night. Reference: https://docs.getbifrost.ai/features/semantic-caching.

Prometheus metrics. Native export. We already had a Prom stack. Five-minute integration. The default dashboards aren't great but the metrics themselves are useful. Reference: https://docs.getbifrost.ai/features/observability/default.

## What we did not use

MCP gateway, governance, SSO. Our auth sits in front of the gateway, not inside it. The custom plugins interface looked interesting but we haven't needed one yet.

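
To make the fallback routing described earlier concrete, here's a minimal sketch of the "one retry, then switch" pattern over the same OpenAI, Anthropic, Bedrock chain. This is not Bifrost's implementation or API: `ProviderError`, `call_with_fallback`, and `fake_send` are hypothetical names, and treating only 429 and 503 as retryable is an assumption made for the example.

```python
RETRYABLE_STATUSES = {429, 503}


class ProviderError(Exception):
    """Raised by the (hypothetical) transport when a provider returns an error status."""

    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def call_with_fallback(chain, send, prompt, retries_per_provider=1):
    """Try each (provider, model) in order; retry on 429/503, then move to the next entry."""
    if not chain:
        raise ValueError("fallback chain is empty")
    last_error = None
    for provider, model in chain:
        for _attempt in range(1 + retries_per_provider):
            try:
                return provider, send(provider, model, prompt)
            except ProviderError as err:
                if err.status not in RETRYABLE_STATUSES:
                    raise  # not treated as retryable in this sketch
                last_error = err
    raise last_error  # every provider in the chain failed


CHAIN = [
    ("openai", "gpt-4o"),
    ("anthropic", "claude-sonnet-4"),
    ("bedrock", "anthropic.claude-sonnet-4"),
]

attempts = []


def fake_send(provider, model, prompt):
    """Stand-in transport: the primary is rate limited, everything else succeeds."""
    attempts.append(provider)
    if provider == "openai":
        raise ProviderError(429)
    return f"{model} answered: {prompt}"


served_by, reply = call_with_fallback(CHAIN, fake_send, "ping")
print(served_by, attempts)
```

The caller only sees the final reply, which is the property that matters here: the agent code does not need to know which provider served the request.
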
## Trade-offs and Limitations

Bifrost is younger than LiteLLM. The provider list is wide (23+) but if you need a niche provider, check the docs first. The plugin interface is straightforward so you can add one yourself, but that's still work.

The web UI is decent for initial setup, not where you want to be doing complex governance. Configure things in YAML and version them in git like anything else.

If you're already deep in LiteLLM and using its callback ecosystem, migration cost is real. LiteLLM has more community integrations because it's been around longer. Portkey is also a fine choice if you want a managed control plane and don't want to operate a gateway yourself. Pick based on what your team will actually maintain.

Last caveat. The numbers above are from our workload. Your traffic shape will differ. Run the test yourself before deciding.

## Further Reading

- Bifrost retries and fallbacks: https://docs.getbifrost.ai/features/retries-and-fallbacks
- Bifrost semantic caching: https://docs.getbifrost.ai/features/semantic-caching
- Bifrost observability: https://docs.getbifrost.ai/features/observability/default
- Bifrost provider configuration: https://docs.getbifrost.ai/quickstart/gateway/provider-configuration
- Bifrost source: https://github.com/maximhq/bifrost

The model is the easy part. Routing it under failure is the hard part. Spend the time on the boring infrastructure problem.