cd /news/large-language-models/fault-injecting-our-llm-provider-to-… · home topics large-language-models article
[ARTICLE · art-33997] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=↑ positive

Fault-injecting our LLM provider to trust Bifrost fallbacks

Buildkite ran a game day that fault-injected OpenAI with 429s and 500s to test whether Bifrost's fallback config would reroute requests for an LLM-backed build-failure summariser. After fixing a retry ceiling and adding a request timeout, the gateway successfully rerouted to Anthropic's Claude Haiku, preventing any user-visible failures. The exercise demonstrated that slow responses, not just errors, must be treated as failures to trigger fallbacks.

read5 min views2 publishedJun 19, 2026

TL;DR: We run an LLM-backed build-failure summariser at Buildkite. To stop a provider wobble from breaking it mid-deploy, I ran a game day that fault-injected OpenAI with 429s and 500s and watched whether Bifrost's fallback config actually rerouted. It did, but only after I fixed two things I'd set up wrong.

We've got a small service that reads failed CI jobs and writes a one-paragraph summary into the build annotation, so engineers don't have to scroll 4,000 lines of test log to find the one assertion that broke. It calls an LLM. Handy when it works. Embarrassing when it doesn't, because a broken annotation makes people distrust every annotation.

The problem is the thing it depends on isn't ours. OpenAI rate-limits, has the occasional 5xx spell, and we don't get a heads-up. "Never had an outage" usually means you never tested the failure path. So I tested it.

I didn't want fallback logic smeared across our service code. Retry-with-jitter, secondary provider, key rotation, all of that wants to live in one place with metrics attached. We put Bifrost in front, an OpenAI-compatible gateway, so our service keeps talking the same /v1/chat/completions

it always did and the routing decisions move to config.

The pitch is plain. One endpoint, 23+ providers behind it, automatic fallbacks between them. Our code points at localhost:8080

instead of api.openai.com

and stops caring which model actually answers.

Here's the fallback config I started the game day with:

{
  "providers": {
    "openai": { "keys": ["env.OPENAI_KEY_A", "env.OPENAI_KEY_B"] },
    "anthropic": { "keys": ["env.ANTHROPIC_KEY"] }
  },
  "fallbacks": [
    "openai/gpt-4o-mini",
    "anthropic/claude-haiku-4-5"
  ]
}

Two OpenAI keys for load balancing, then Anthropic as the lifeboat if OpenAI as a whole goes sideways. That was the theory.

A game day is just a planned outage you cause on purpose, with people watching. I scheduled 45 minutes, told the team, and put a toxiproxy in front of OpenAI so I could inject faults without waiting for the real thing to break.

Three scenarios:

Scenario one went fine. Bifrost saw the 429s, rotated between key A and key B, then gave up on OpenAI and the requests landed on Haiku. Annotations kept writing. Reckoned I was done.

Scenario two found my first mistake. I'd not set a sane retry ceiling, so on a 503 the gateway retried hard against the same struggling provider before failing over, and our p95 on annotation writes jumped to about 18 seconds. Fixed it by capping retries and letting the fallback fire sooner. The README's retries and fallbacks page covers the knobs; I'd skimmed it the first time.

Scenario three is the one everyone gets wrong. Slow isn't down. A 30-second response isn't an error, so naive fallback never triggers, the request just sits there. We added a request timeout so a tar-pitted provider counts as a failure and trips the lifeboat. That single change is the actual reason this exercise was worth running.

Bifrost ships native Prometheus metrics, so I didn't have to bolt on my own. I watched fallback rate and per-provider latency the whole time on a Grafana board.

Scenario Without fallback With Bifrost (tuned)
429 storm annotations stall reroute to Haiku, ~2.1s p95
Hard 503s 50% writes fail 0 user-visible failures
30s latency every write hangs timeout trips fallback in 4s

The numbers that mattered: zero broken annotations across all three once tuned, and the fallback decisions were visible in metrics instead of buried in logs nobody reads.

I'd used LiteLLM before. Worth being honest here.

Bifrost LiteLLM Portkey
OpenAI-compatible endpoint yes yes yes
Automatic fallbacks yes yes yes
Native Prometheus metrics yes yes yes
Self-host story single Go binary Python proxy gateway is OSS, control plane hosted
Maturity / ecosystem newer large, lots of integrations polished dashboards

LiteLLM has been around longer and has a bigger pile of community integrations, which counts for something when you hit an edge case at 2am. Portkey's hosted dashboards are nicer than anything I'd build myself, and if you don't want to run infra that's a fair trade. We picked Bifrost mostly because a single Go binary is easy for an infra team to operate and the Prometheus output dropped straight into our existing board with no glue. Not a knock on the others. Different priorities.

A gateway is one more hop you have to keep alive. If Bifrost falls over, every LLM call falls with it, so we run two replicas behind a load balancer and the game day included killing one of them too.

Fallback to a different model means a different model. Haiku doesn't write the exact same summary as gpt-4o-mini, and for a build annotation that's fine, but if you depend on a strict output schema you need to test the lifeboat actually produces it. We caught one prompt that assumed OpenAI-specific formatting.

And fault injection in front of a proxy isn't the real provider misbehaving. Toxiproxy gives you 429s and delays, not the weird partial-stream failures you see in the wild. It's a model of the failure, not the failure. Better than nothing, not the whole story.

Semantic caching is on the roadmap for us, not load-bearing yet, so I'm not going to claim numbers I haven't measured.

── more in #large-language-models 4 stories · sorted by recency
── more on @buildkite 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/fault-injecting-our-…] indexed:0 read:5min 2026-06-19 ·