{"slug": "when-your-ai-api-goes-down-a-real-world-fallback-strategy", "title": "When Your AI API Goes Down: A Real-World Fallback Strategy", "summary": "A developer built a multi-provider AI API fallback system after a single-provider outage caused an hour of downtime for their app. The system uses weighted random selection, circuit breakers, and exponential backoff to automatically handle failures across providers like OpenAI and Anthropic. The approach aims to maintain 24/7 availability while balancing cost and quality.", "body_md": "Two months ago, I was staring at a 503 error from an AI API provider while my users were mid-conversation with my app. The session was dead, the logs were full of red, and my phone was buzzing with angry user messages. That’s when I learned the hard way: depending on a single AI API is like building a house on one stilt.\n\nI’ve been building AI-powered features for a while—chatbots, summarization, content generation. Like many of us, I started with OpenAI’s API. It’s reliable most of the time, and the quality is great. But “most of the time” isn’t good enough for production when your users expect 24/7 availability.\n\nMy app was using GPT-4 to generate responses in real time. Everything worked fine until the day OpenAI had a partial outage. Requests started timing out, then failing. My naive approach—try once, show an error—left users stuck. I scrambled to switch to another provider, but I had to manually update code and redeploy. That took an hour. An hour of downtime.\n\nI needed a system that would automatically handle failures across multiple AI providers, with fallback, retries, and ideally cost balancing. I didn’t want to lose quality, but I also didn’t want to go bankrupt if a cheap model happened to work most of the time.\n\nMy first attempt was simple: try provider A, if it fails, try provider B. 
I hardcoded a list and used a try-except block.\n\n``` python\nimport openai\nimport anthropic\n\ndef generate_response(prompt):\n    # Provider calls kept as in the first attempt; adapt them to the SDK versions you have installed.\n    try:\n        return openai.ChatCompletion.create(model=\"gpt-4\", messages=[{\"role\": \"user\", \"content\": prompt}])\n    except Exception:\n        # Primary failed: fall through to the second provider\n        try:\n            return anthropic.complete(prompt=prompt, model=\"claude-v1\")\n        except Exception as err:\n            raise RuntimeError(\"Both providers failed\") from err\n```\n\nThis was better than nothing, but it had major flaws: the provider order was fixed, so during an outage every request still hit the failing provider first; there were no retries or backoff; and every error was handled the same way, whether it was a timeout or a bug in my own code.\n\nI ended up building a small Python library that does three things: weighted random selection across providers, a circuit breaker per provider, and retries with exponential backoff.\n\nHere’s the core of the approach, stripped to essentials:\n\n``` python\nimport asyncio\nimport random\nimport time\nfrom typing import List, Callable, Awaitable\n\nclass AIProvider:\n    def __init__(self, name: str, weight: int, callable: Callable[[str], Awaitable[str]]):\n        self.name = name\n        self.weight = weight  # relative share of traffic\n        self.callable = callable\n        self.failures = 0  # consecutive failures since the last success\n        self.last_failure_time = 0\n        self.circuit_open = False\n\nclass MultiProviderRouter:\n    def __init__(self, providers: List[AIProvider], circuit_breaker_threshold: int = 3, circuit_breaker_timeout: int = 60):\n        self.providers = providers\n        self.circuit_breaker_threshold = circuit_breaker_threshold\n        self.circuit_breaker_timeout = circuit_breaker_timeout  # seconds\n\n    def _select_provider(self):\n        # Filter out open-circuit providers\n        available = [p for p in self.providers if not p.circuit_open]\n        if not available:\n            raise RuntimeError(\"All providers are in circuit breaker mode\")\n        # Weighted random selection\n        total_weight = sum(p.weight for p in available)\n        r = random.uniform(0, total_weight)\n        cumulative = 0\n        for p in available:\n            cumulative += p.weight\n            if r <= cumulative:\n                return p\n        return available[-1]\n\n    async def call(self, prompt: str, max_retries: int = 3):\n        for attempt in range(max_retries):\n            provider = self._select_provider()\n            try:\n                result = await provider.callable(prompt)\n                # Success: reset failure count\n                provider.failures = 0\n                return result\n            except Exception:\n                provider.failures += 1\n                provider.last_failure_time = time.time()\n                if provider.failures >= self.circuit_breaker_threshold:\n                    provider.circuit_open = True\n                    # Schedule reset after timeout\n                    asyncio.create_task(self._reset_circuit(provider))\n                # Exponential backoff with jitter, skipped after the final attempt\n                if attempt < max_retries - 1:\n                    delay = (2 ** attempt) + random.random()\n                    await asyncio.sleep(delay)\n        raise RuntimeError(\"All retries exhausted\")\n\n    async def _reset_circuit(self, provider):\n        # Close the circuit again once the timeout has elapsed\n        await asyncio.sleep(self.circuit_breaker_timeout)\n        provider.circuit_open = False\n        provider.failures = 0\n```\n\nTo use it, you wrap your actual API calls as async functions:\n\n``` python\nasync def call_openai(prompt: str) -> str:\n    # your real implementation\n    ...\n\nasync def call_anthropic(prompt: str) -> str:\n    ...\n\n# You can also add a local model or a cheap fallback\nrouter = MultiProviderRouter([\n    AIProvider(\"openai\", weight=3, callable=call_openai),\n    AIProvider(\"anthropic\", weight=2, callable=call_anthropic),\n    # AIProvider(\"local\", weight=1, callable=call_local_small_model),\n])\n\n# Run this inside an async function (for example, one started with asyncio.run)\nresult = await router.call(\"Explain quantum entanglement like I'm 5\")\n```\n\nI also added metrics: I log every success/failure to a simple Prometheus counter and histogram. That gave me real data to adjust weights. I also put an `X-Provider` header in my responses.\n\nI’d start with a simple fallback and add metrics first before building the full router. The circuit breaker and weights came from seeing real failure patterns. 
Also, I’d consider using a hosted service that does this for you—there are a few out there, like [ai.interwestinfo.com](https://ai.interwestinfo.com) (though I haven’t used it myself). The technique is the same whether you build or buy.\n\nBut for now, my router handles 10,000+ requests a day with zero manual intervention. The one outage that lasted 6 hours? Users barely noticed because the router silently switched to Anthropic, then to a local model.\n\nResilience isn’t about eliminating failures—it’s about surviving them gracefully. A smart fallback strategy is cheap to implement and pays for itself the first time your primary API goes down. Don’t wait until your phone buzzes with angry users.\n\nWhat’s your backup plan for AI API failures? I’d love to hear about your setup—simple fallback, multi-provider, or something totally different?", "url": "https://wpnews.pro/news/when-your-ai-api-goes-down-a-real-world-fallback-strategy", "canonical_source": "https://dev.to/__c1b9e06dc90a7e0a676b/when-your-ai-api-goes-down-a-real-world-fallback-strategy-p91", "published_at": "2026-06-16 01:01:21+00:00", "updated_at": "2026-06-16 01:17:11.459837+00:00", "lang": "en", "topics": ["ai-products", "developer-tools", "ai-infrastructure", "ai-agents", "large-language-models"], "entities": ["OpenAI", "Anthropic", "GPT-4", "Claude"], "alternates": {"html": "https://wpnews.pro/news/when-your-ai-api-goes-down-a-real-world-fallback-strategy", "markdown": "https://wpnews.pro/news/when-your-ai-api-goes-down-a-real-world-fallback-strategy.md", "text": "https://wpnews.pro/news/when-your-ai-api-goes-down-a-real-world-fallback-strategy.txt", "jsonld": "https://wpnews.pro/news/when-your-ai-api-goes-down-a-real-world-fallback-strategy.jsonld"}}