Two months ago, I was staring at a 503 error from an AI API provider while my users were mid-conversation with my app. The session was dead, the logs were full of red, and my phone was buzzing with angry user messages. That’s when I learned the hard way: depending on a single AI API is like building a house on one stilt.
I’ve been building AI-powered features for a while—chatbots, summarization, content generation. Like many of us, I started with OpenAI’s API. It’s reliable most of the time, and the quality is great. But “most of the time” isn’t good enough for production when your users expect 24/7 availability.
My app was using GPT-4 to generate responses in real time. Everything worked fine until the day OpenAI had a partial outage. Requests started timing out, then failing. My naive approach—try once, show an error—left users stuck. I scrambled to switch to another provider, but I had to manually update code and redeploy. That took an hour. An hour of downtime.
I needed a system that would automatically handle failures across multiple AI providers, with fallback, retries, and ideally cost balancing. I didn’t want to lose quality, but I also didn’t want to pay premium prices on every request when a cheaper model would work most of the time.
My first attempt was simple: try provider A, if it fails, try provider B. I hardcoded a list and used a try-except block.
import openai
import anthropic

def generate_response(prompt):
    try:
        return openai.ChatCompletion.create(model="gpt-4", messages=[{"role": "user", "content": prompt}])
    except:
        try:
            return anthropic.complete(prompt=prompt, model="claude-v1")
        except:
            raise Exception("Both providers failed")
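This was better than nothing, but it had major flaws: the provider order was hardcoded, each provider got exactly one attempt with no retry or backoff, the bare except blocks swallowed every kind of error, and a provider that kept failing was still tried first on every request.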
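I ended up building a small Python library that does three things: weighted random selection across providers, retries with exponential backoff and jitter, and a per-provider circuit breaker that temporarily skips a provider after repeated failures.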
Here’s the core of the approach, stripped to essentials:
import asyncio
import random
import time
from typing import Awaitable, Callable, List, Optional


class AIProvider:
    def __init__(self, name: str, weight: int, callable: Callable[[str], Awaitable[str]]):
        self.name = name
        self.weight = weight        # relative share of traffic
        self.callable = callable    # async function: prompt -> response text
        self.failures = 0           # consecutive failures since the last success
        self.last_failure_time = 0.0
        self.circuit_open = False   # True while this provider is being skipped


class MultiProviderRouter:
    def __init__(self, providers: List[AIProvider], circuit_breaker_threshold: int = 3, circuit_breaker_timeout: int = 60):
        self.providers = providers
        self.circuit_breaker_threshold = circuit_breaker_threshold
        self.circuit_breaker_timeout = circuit_breaker_timeout
        # Hold references so pending reset tasks are not garbage-collected.
        self._reset_tasks = set()

    def _select_provider(self) -> AIProvider:
        # Weighted random choice among providers whose circuit is closed.
        available = [p for p in self.providers if not p.circuit_open]
        if not available:
            raise RuntimeError("All providers are in circuit breaker mode")
        total_weight = sum(p.weight for p in available)
        r = random.uniform(0, total_weight)
        cumulative = 0
        for p in available:
            cumulative += p.weight
            if r <= cumulative:
                return p
        return available[-1]

    async def call(self, prompt: str, max_retries: int = 3) -> str:
        last_error: Optional[Exception] = None
        for attempt in range(max_retries):
            provider = self._select_provider()
            try:
                result = await provider.callable(prompt)
                provider.failures = 0
                return result
            except Exception as e:
                last_error = e
                provider.failures += 1
                provider.last_failure_time = time.time()
                # Open the circuit once; a concurrent call may already have done so.
                if provider.failures >= self.circuit_breaker_threshold and not provider.circuit_open:
                    provider.circuit_open = True
                    task = asyncio.create_task(self._reset_circuit(provider))
                    self._reset_tasks.add(task)
                    task.add_done_callback(self._reset_tasks.discard)
                # Exponential backoff with jitter; skip the sleep after the final attempt.
                if attempt < max_retries - 1:
                    delay = (2 ** attempt) + random.random()
                    await asyncio.sleep(delay)
        raise RuntimeError("All retries exhausted") from last_error

    async def _reset_circuit(self, provider: AIProvider) -> None:
        # Close the circuit again after the cool-down period.
        await asyncio.sleep(self.circuit_breaker_timeout)
        provider.circuit_open = False
        provider.failures = 0
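The retry delay in `call` is `(2 ** attempt) + random.random()`, which is easier to reason about with the numbers written out. Here is a minimal sketch that prints the delay range for each attempt; the helper name `backoff_delay` and the seeded generator are assumptions made for the demo, not part of the router.

```python
import random


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Delay before the next retry: 2**attempt seconds plus up to 1 s of jitter."""
    return (2 ** attempt) + rng.random()


rng = random.Random(0)  # seeded only so the demo is repeatable
delays = [backoff_delay(attempt, rng) for attempt in range(3)]

for attempt, delay in enumerate(delays):
    low, high = 2 ** attempt, 2 ** attempt + 1
    print(f"attempt {attempt}: sleep {delay:.2f}s (always between {low}s and {high}s)")
```

The jitter matters when many requests fail at the same moment: without it, they would all retry in lockstep and hit the recovering provider together.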
To use it, you wrap your actual API calls as async functions:
async def call_openai(prompt: str) -> str:
    ...

async def call_anthropic(prompt: str) -> str:
    ...

router = MultiProviderRouter([
    AIProvider("openai", weight=3, callable=call_openai),
    AIProvider("anthropic", weight=2, callable=call_anthropic),
])

async def main():
    # `await` only works inside a coroutine, so drive the router from one.
    result = await router.call("Explain quantum entanglement like I'm 5")
    print(result)

asyncio.run(main())
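With weights 3 and 2, the router should send roughly 60% of first attempts to OpenAI and 40% to Anthropic while both circuits are closed. The sketch below checks that arithmetic using `random.choices` from the standard library as a stand-in for the router's hand-rolled selection loop; the sample size and seed are arbitrary choices for the demo.

```python
import random
from collections import Counter

weights = {"openai": 3, "anthropic": 2}  # same weights as the example above

# Expected share of first-attempt traffic: weight / total weight.
total = sum(weights.values())
expected = {name: w / total for name, w in weights.items()}

# Empirical check with the standard library's weighted sampler.
rng = random.Random(42)  # seeded only so the demo is repeatable
picks = rng.choices(list(weights), weights=list(weights.values()), k=10_000)
observed = {name: count / len(picks) for name, count in Counter(picks).items()}

print("expected:", expected)  # {'openai': 0.6, 'anthropic': 0.4}
print("observed:", observed)
```

When one provider's circuit opens, its weight drops out of the total, so the remaining provider takes all traffic until the circuit resets.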
I also added metrics: I log every success/failure to a simple Prometheus counter and histogram. That gave me real data to adjust weights.
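As a rough sketch of that instrumentation, the wrapper below counts successes and failures and records latency per provider. To keep it runnable with only the standard library, it uses in-memory dictionaries as hypothetical stand-ins for the Prometheus counter and histogram; the names `instrumented`, `outcomes`, `latencies`, and `success_rate` are illustrative assumptions, as are the two fake providers.

```python
import asyncio
import time
from collections import defaultdict
from typing import Awaitable, Callable

# Hypothetical in-memory stand-ins for a Prometheus counter and histogram.
outcomes = defaultdict(int)    # (provider, "success" | "failure") -> count
latencies = defaultdict(list)  # provider -> observed durations in seconds


def instrumented(name: str, fn: Callable[[str], Awaitable[str]]) -> Callable[[str], Awaitable[str]]:
    """Wrap a provider call so every attempt is counted and timed."""
    async def wrapper(prompt: str) -> str:
        start = time.perf_counter()
        try:
            result = await fn(prompt)
        except Exception:
            outcomes[(name, "failure")] += 1
            raise  # re-raise so the caller's retry logic still sees the error
        else:
            outcomes[(name, "success")] += 1
            return result
        finally:
            latencies[name].append(time.perf_counter() - start)
    return wrapper


def success_rate(name: str) -> float:
    """Fraction of recorded attempts that succeeded (0.0 if none recorded)."""
    ok = outcomes[(name, "success")]
    total = ok + outcomes[(name, "failure")]
    return ok / total if total else 0.0


# Fake providers for the demo only.
async def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


async def flaky(prompt: str) -> str:
    raise ConnectionError("simulated outage")


async def demo() -> None:
    good = instrumented("healthy", healthy)
    bad = instrumented("flaky", flaky)
    for _ in range(3):
        await good("hi")
    try:
        await bad("hi")
    except ConnectionError:
        pass


asyncio.run(demo())
print(dict(outcomes))
```

A wrapped function keeps the same signature, so it can be passed as the `callable` of an `AIProvider` unchanged. Per-provider success rates like these are one plausible input for tuning weights by hand.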
X-Provider header in my responses.
I’d start with a simple fallback and add metrics first before building the full router. The circuit breaker and weights came from seeing real failure patterns. Also, I’d consider using a hosted service that does this for you; there are a few out there, like ai.interwestinfo.com (though I haven’t used it myself). The technique is the same whether you build or buy.
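The "simple fallback first" starting point could look something like the sketch below: try providers in a fixed order and return the name of whichever one answered, so it can be logged or surfaced in an `X-Provider` response header. How the header was actually populated isn't shown here, so the function `first_success`, the two fake providers, and the `headers` dict are all assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

ProviderFn = Callable[[str], Awaitable[str]]


async def first_success(prompt: str, providers: List[Tuple[str, ProviderFn]]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first that works."""
    errors = []
    for name, fn in providers:
        try:
            return name, await fn(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real API calls.
async def primary(prompt: str) -> str:
    raise TimeoutError("simulated 503")


async def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


served_by, answer = asyncio.run(
    first_success("ping", [("primary", primary), ("backup", backup)])
)
headers = {"X-Provider": served_by}  # e.g. attach to the outgoing HTTP response
print(headers)  # {'X-Provider': 'backup'}
```

Collecting each provider's error before moving on keeps the final exception useful for debugging, which the nested try-except version does not.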
But for now, my router handles 10,000+ requests a day with zero manual intervention. The one outage that lasted 6 hours? Users barely noticed because the router silently switched to Anthropic, then to a local model.
Resilience isn’t about eliminating failures—it’s about surviving them gracefully. A smart fallback strategy is cheap to implement and pays for itself the first time your primary API goes down. Don’t wait until your phone buzzes with angry users.
What’s your backup plan for AI API failures? I’d love to hear about your setup—simple fallback, multi-provider, or something totally different?