When Your AI API Goes Down: A Real-World Fallback Strategy

A developer built a multi-provider AI API fallback system after a single-provider outage caused an hour of downtime for their app. The system uses weighted random selection, circuit breakers, and exponential backoff to automatically handle failures across providers like OpenAI and Anthropic. The approach aims to maintain 24/7 availability while balancing cost and quality.

Two months ago, I was staring at a 503 error from an AI API provider while my users were mid-conversation with my app. The session was dead, the logs were full of red, and my phone was buzzing with angry user messages. That’s when I learned the hard way: depending on a single AI API is like building a house on one stilt.

I’ve been building AI-powered features for a while: chatbots, summarization, content generation. Like many of us, I started with OpenAI’s API. It’s reliable most of the time, and the quality is great. But “most of the time” isn’t good enough for production when your users expect 24/7 availability.

My app was using GPT-4 to generate responses in real time. Everything worked fine until the day OpenAI had a partial outage. Requests started timing out, then failing. My naive approach (try once, show an error) left users stuck. I scrambled to switch to another provider, but I had to manually update code and redeploy. That took an hour. An hour of downtime.

I needed a system that would automatically handle failures across multiple AI providers, with fallback, retries, and ideally cost balancing. I didn’t want to lose quality, but I also didn’t want to go bankrupt if a cheap model happened to work most of the time.

My first attempt was simple: try provider A, if it fails, try provider B. I hardcoded a list and used a try-except block.
```python
import openai
import anthropic

def generate_response(prompt):
    # Naive fallback: try OpenAI first, then Anthropic.
    try:
        return openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
    except Exception:
        try:
            return anthropic.complete(prompt=prompt, model="claude-v1")
        except Exception:
            raise Exception("Both providers failed")
```

This was better than nothing, but it had major flaws.

I ended up building a small Python library that does three things: weighted random selection across providers, circuit breaking for providers that keep failing, and retries with exponential backoff.

Here’s the core of the approach, stripped to essentials:

```python
import asyncio
import random
import time
from typing import Awaitable, Callable, List


class AIProvider:
    def __init__(self, name: str, weight: int, callable: Callable[[str], Awaitable[str]]):
        self.name = name
        self.weight = weight
        self.callable = callable
        self.failures = 0
        self.last_failure_time = 0
        self.circuit_open = False


class MultiProviderRouter:
    def __init__(
        self,
        providers: List[AIProvider],
        circuit_breaker_threshold: int = 3,
        circuit_breaker_timeout: int = 60,
    ):
        self.providers = providers
        self.circuit_breaker_threshold = circuit_breaker_threshold
        self.circuit_breaker_timeout = circuit_breaker_timeout

    def _select_provider(self):
        # Filter out open-circuit providers
        available = [p for p in self.providers if not p.circuit_open]
        if not available:
            raise RuntimeError("All providers are in circuit breaker mode")
        # Weighted random selection
        total_weight = sum(p.weight for p in available)
        r = random.uniform(0, total_weight)
        cumulative = 0
        for p in available:
            cumulative += p.weight
            if r <= cumulative:
                return p
        return available[-1]

    async def call(self, prompt: str, max_retries: int = 3):
        for attempt in range(max_retries):
            provider = self._select_provider()
            try:
                result = await provider.callable(prompt)
                # Success: reset failure count
                provider.failures = 0
                return result
            except Exception:
                provider.failures += 1
                provider.last_failure_time = time.time()
                if provider.failures >= self.circuit_breaker_threshold:
                    provider.circuit_open = True
                    # Schedule reset after timeout
                    asyncio.create_task(self._reset_circuit(provider))
                # Exponential backoff with jitter
                delay = (2 ** attempt) + random.random()
                await asyncio.sleep(delay)
        raise RuntimeError("All retries exhausted")

    async def _reset_circuit(self, provider):
        await asyncio.sleep(self.circuit_breaker_timeout)
        provider.circuit_open = False
        provider.failures = 0
```

To use it, you wrap your actual API calls as async functions:

```python
async def call_openai(prompt: str) -> str:
    # your real implementation
    ...

async def call_anthropic(prompt: str) -> str:
    ...

# You can also add a local model or a cheap fallback
async def call_local_small_model(prompt: str) -> str:
    ...

router = MultiProviderRouter([
    AIProvider("openai", weight=3, callable=call_openai),
    AIProvider("anthropic", weight=2, callable=call_anthropic),
    AIProvider("local", weight=1, callable=call_local_small_model),
])

# Run this inside an async function (or an asyncio-aware REPL)
result = await router.call("Explain quantum entanglement like I'm 5")
```

I also added metrics: I log every success/failure to a simple Prometheus counter and histogram. That gave me real data to adjust weights.

… X-Provider header in my responses.

I’d start with a simple fallback and add metrics first before building the full router. The circuit breaker and weights came from seeing real failure patterns. Also, I’d consider using a hosted service that does this for you. There are a few out there, like [ai.interwestinfo.com](https://ai.interwestinfo.com) (though I haven’t used it myself). The technique is the same whether you build or buy.

But for now, my router handles 10,000+ requests a day with zero manual intervention. The one outage that lasted 6 hours? Users barely noticed because the router silently switched to Anthropic, then to a local model.

Resilience isn’t about eliminating failures; it’s about surviving them gracefully. A smart fallback strategy is cheap to implement and pays for itself the first time your primary API goes down. Don’t wait until your phone buzzes with angry users.

What’s your backup plan for AI API failures? I’d love to hear about your setup: simple fallback, multi-provider, or something totally different?
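
If you want to try the metrics idea mentioned earlier before wiring up Prometheus, here is a minimal sketch of what per-provider bookkeeping could look like. It is not the code from my router: `ProviderStats`, `Metrics`, and `suggested_weights` are hypothetical names, the in-memory counters only stand in for a real Prometheus counter and histogram, and scaling each weight by its observed success rate is just one assumed heuristic for "adjusting weights".

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProviderStats:
    """In-memory stand-in for a success/failure counter and a latency histogram."""
    successes: int = 0
    failures: int = 0
    latencies: List[float] = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        # With no data yet, assume the provider is healthy.
        return self.successes / total if total else 1.0


class Metrics:
    def __init__(self) -> None:
        self.stats: Dict[str, ProviderStats] = defaultdict(ProviderStats)

    def record(self, provider: str, ok: bool, latency_s: float) -> None:
        """Record one call outcome and its latency for the named provider."""
        s = self.stats[provider]
        if ok:
            s.successes += 1
        else:
            s.failures += 1
        s.latencies.append(latency_s)

    def suggested_weights(self, base: Dict[str, int]) -> Dict[str, float]:
        """Scale each configured weight by the observed success rate (assumed heuristic)."""
        return {name: w * self.stats[name].success_rate for name, w in base.items()}


# Example data (made up for illustration): 9 successes and 1 failure for "openai".
metrics = Metrics()
for _ in range(9):
    metrics.record("openai", ok=True, latency_s=0.8)
metrics.record("openai", ok=False, latency_s=30.0)
metrics.record("anthropic", ok=True, latency_s=1.1)

weights = metrics.suggested_weights({"openai": 3, "anthropic": 2, "local": 1})
print(weights)
```

In a setup like the router above, `record` would be called from the success and failure branches of `call`, and the suggested weights would be reviewed by hand rather than applied automatically, at least until you trust the data.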