I've been working on a side project that needs AI for a few different tasks: answering user questions, generating image captions, and summarizing chat threads. At first, I just picked one provider (OpenAI) and called it a day. But after a month, two things became painfully clear: first, not every model is great at every task, and second, the bill was climbing fast because I was using GPT-4 for everything.
So I did what any reasonable developer would do: I started swapping API keys by hand. I'd comment out one import and uncomment another, deploy, test, get frustrated, rinse and repeat. That worked for about a week before I decided I needed a proper solution.
My project had three distinct AI needs:
- Answering user questions (Q&A)
- Generating image captions
- Summarizing chat threads
I was using one provider for all three, which meant I was either overpaying for simple tasks or getting low-quality results for complex ones.
First, I tried a simple `if`/`elif` chain in every endpoint. That turned into spaghetti within hours. Then I tried a config file with model names, but I still had to handle different SDKs, authentication, and response formats manually. It was brittle and ugly.
I also looked at some API aggregation services. They promised unified access but often introduced latency, added cost per call, or required me to trust their infrastructure with my keys. Not ideal for a small project where I wanted full control.
I built a tiny Python class that acts as a router. It takes a task name, picks a provider and model from a config file, and handles the request. The key insight: I didn't need a full proxy — just a configurable dispatcher that I could plug into my existing code with minimal changes.
Here's the core of it. First, the config file (`config/ai_router.yaml`):
```yaml
routing:
  qa:
    provider: openai
    model: gpt-4
    max_tokens: 500
    temperature: 0.2
  captions:
    provider: anthropic
    model: claude-3-haiku-20240307
    max_tokens: 200
    temperature: 0.7
  summarize:
    provider: openai
    model: gpt-3.5-turbo
    max_tokens: 1000
    temperature: 0.3
```
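The router reads these keys directly. A missing `model`, `max_tokens`, or `temperature` therefore only shows up as a `KeyError` the first time someone requests that task. A small check at startup could catch this earlier.

The sketch below is a hypothetical helper. `validate_routing` is my own name and is not part of the router. It takes the already-parsed mapping (e.g. `yaml.safe_load(f)['routing']`) and assumes the only supported providers are the two the router handles.

```python
# Hypothetical startup check (not part of the original router): verifies that each
# task entry has the keys AIRouter reads and names a provider it supports.
REQUIRED_KEYS = {"provider", "model", "max_tokens", "temperature"}
SUPPORTED_PROVIDERS = {"openai", "anthropic"}


def validate_routing(routing: dict) -> list:
    """Return a list of problems found in the parsed `routing` mapping (empty if none)."""
    problems = []
    for task, cfg in routing.items():
        if not isinstance(cfg, dict):
            problems.append(f"{task}: expected a mapping, got {type(cfg).__name__}")
            continue
        missing = REQUIRED_KEYS - set(cfg)
        if missing:
            problems.append(f"{task}: missing {', '.join(sorted(missing))}")
        provider = cfg.get("provider")
        if provider is not None and provider not in SUPPORTED_PROVIDERS:
            problems.append(f"{task}: unsupported provider {provider!r}")
        max_tokens = cfg.get("max_tokens")
        if max_tokens is not None and not (isinstance(max_tokens, int) and max_tokens > 0):
            problems.append(f"{task}: max_tokens must be a positive integer")
    return problems
```

Calling it right after loading the YAML, and refusing to start if the list is non-empty, turns a late runtime error into an immediate, readable one.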
Now the router class (`router.py`):
```python
import os

import yaml  # PyYAML


class AIRouter:
    """Maps a task name to the provider, model, and settings in ai_router.yaml."""

    def __init__(self, config_path="config/ai_router.yaml"):
        with open(config_path) as f:
            self.config = yaml.safe_load(f)['routing']
        self._init_providers()

    def _init_providers(self):
        # Import and create only the SDK clients the config actually references.
        self.providers = {}
        if any(cfg['provider'] == 'openai' for cfg in self.config.values()):
            from openai import OpenAI
            self.providers['openai'] = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
        if any(cfg['provider'] == 'anthropic' for cfg in self.config.values()):
            from anthropic import Anthropic
            self.providers['anthropic'] = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])

    def complete(self, task: str, prompt: str):
        cfg = self.config.get(task)
        if not cfg:
            raise ValueError(f"Unknown task: {task}")
        name = cfg['provider']
        # Check before indexing; otherwise an unknown provider surfaces as a bare KeyError.
        if name not in self.providers:
            raise NotImplementedError(f"Provider {name} not implemented")
        provider = self.providers[name]
        model = cfg['model']

        if name == 'openai':
            # Chat Completions: the reply text is in choices[0].message.content
            response = provider.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=cfg['max_tokens'],
                temperature=cfg['temperature'],
            )
            return response.choices[0].message.content
        elif name == 'anthropic':
            # Messages API: max_tokens is required; the reply is a list of content blocks
            response = provider.messages.create(
                model=model,
                max_tokens=cfg['max_tokens'],
                temperature=cfg['temperature'],
                messages=[{"role": "user", "content": prompt}],
            )
            return response.content[0].text
        else:
            # Defensive: a client was registered without a matching branch here.
            raise NotImplementedError(f"Provider {name} not implemented")
```
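Each new provider in `complete()` means another `elif` branch, which is the same pattern that turned into spaghetti at the endpoint level. One alternative is a dictionary that maps provider names to handler functions. This sketch assumes the SDK calls stay exactly as above. `HANDLERS` and `dispatch` are hypothetical names.

```python
from typing import Any, Callable, Dict

# Hypothetical refactor: one handler per provider instead of an if/elif chain.
# Each handler receives the SDK client, the task's config entry, and the prompt.
Handler = Callable[[Any, dict, str], str]


def _openai_handler(client: Any, cfg: dict, prompt: str) -> str:
    response = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
        max_tokens=cfg["max_tokens"],
        temperature=cfg["temperature"],
    )
    return response.choices[0].message.content


def _anthropic_handler(client: Any, cfg: dict, prompt: str) -> str:
    response = client.messages.create(
        model=cfg["model"],
        max_tokens=cfg["max_tokens"],
        temperature=cfg["temperature"],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


# Supporting another provider means writing one handler and adding one entry here.
HANDLERS: Dict[str, Handler] = {
    "openai": _openai_handler,
    "anthropic": _anthropic_handler,
}


def dispatch(providers: Dict[str, Any], cfg: dict, prompt: str) -> str:
    """Look up the handler for cfg['provider'] and call it with the matching client."""
    name = cfg["provider"]
    handler = HANDLERS.get(name)
    if handler is None or name not in providers:
        raise NotImplementedError(f"Provider {name} not implemented")
    return handler(providers[name], cfg, prompt)
```

Inside `AIRouter.complete()`, everything after the task lookup could shrink to `return dispatch(self.providers, cfg, prompt)`.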
Usage in my app is dead simple:
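```python
from router import AIRouter

router = AIRouter()
answer = router.complete('qa', "What's the capital of France?")
# Placeholder prompt: pasting base64 into the text does not send an actual image;
# real image input needs the provider's image content format.
caption = router.complete('captions', "Describe this image: [base64 data]")
```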
I'll be honest: this isn't production-grade. Error handling is minimal. If a provider is down, the whole request fails. There's no retry logic or fallback. Also, the config is static — if I want to switch models mid-request, I'd need a different approach.
But for my project, it solved the immediate pain: I can now route tasks to the most cost-effective model without touching code. I saved about 40% on API costs in the first month by sending captions to cheaper models.
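Adding the missing retry logic takes only a few lines. Below is a minimal sketch of retry with exponential backoff and jitter.

`with_retries` is a hypothetical helper that wraps any zero-argument callable, so it doesn't depend on either SDK. The `sleep` parameter exists only so the delays can be checked without actually waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` up to `attempts` times with exponential backoff plus jitter.

    Delays double after each failure (base_delay, 2 * base_delay, ...), with up to
    50% random jitter added. The last error is re-raised once attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original exception
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Usage would look like `with_retries(lambda: router.complete('qa', prompt))`. In practice, narrow `retry_on` to transient failures such as `openai.RateLimitError` or `anthropic.APIConnectionError`, so that malformed requests fail immediately instead of being retried.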
If I were building it again, I'd add a fallback mechanism. For example, if `gpt-4` fails, try `gpt-3.5-turbo` before erroring out. I'd also make the router async. Most providers support async now (the OpenAI and Anthropic Python SDKs both ship async clients), and it would fit better in a web framework like FastAPI.
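At its core, a fallback chain is a loop over candidate models. This sketch assumes a hypothetical config extension where each task lists its models in priority order, for example `models: [gpt-4, gpt-3.5-turbo]` instead of a single `model`. It also takes a `call_model(model, prompt)` function so that it stays provider-agnostic.

```python
from typing import Callable, List, Sequence, Tuple


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> str:
    """Try each model in priority order and return the first successful reply.

    `call_model(model, prompt)` is any function that sends the prompt to one
    model; if every model fails, all collected errors go into one exception.
    """
    if not models:
        raise ValueError("need at least one model")
    errors: List[Tuple[str, Exception]] = []
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # in practice, catch only provider/network errors
            errors.append((model, exc))
    details = "; ".join(f"{model}: {exc!r}" for model, exc in errors)
    raise RuntimeError(f"all models failed ({details})")
```

`call_model` could be a thin wrapper around `provider.chat.completions.create(model=model, ...)` from the router above. Collecting every error keeps the final exception useful when all candidates fail.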
Another improvement: dynamic routing based on prompt length or complexity. For instance, if a Q&A prompt is short and simple, route it to a cheaper model automatically.
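Word count is a crude stand-in for complexity, but it's enough to illustrate the idea. This sketch assumes a hypothetical extra `qa_cheap` entry in `ai_router.yaml` that points at a cheaper model. `route_task` only decides which task name to pass to `router.complete()`, and its 40-word threshold is arbitrary.

```python
from typing import Mapping

# Hypothetical: maps a task to a cheaper task entry you'd add to ai_router.yaml
# (e.g. a `qa_cheap` block that points at gpt-3.5-turbo).
CHEAP_VARIANTS: Mapping[str, str] = {"qa": "qa_cheap"}


def route_task(
    task: str,
    prompt: str,
    max_simple_words: int = 40,
    cheap_variants: Mapping[str, str] = CHEAP_VARIANTS,
) -> str:
    """Return the task name to send to the router.

    Uses word count as a crude proxy for complexity: short prompts go to the
    cheap variant when one exists; everything else keeps the original task.
    """
    is_simple = len(prompt.split()) <= max_simple_words
    if is_simple and task in cheap_variants:
        return cheap_variants[task]
    return task
```

A call site would then look like `router.complete(route_task('qa', prompt), prompt)`. Everything else about the router stays the same.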
If you don't want to build this yourself, there are services that do something similar. For instance, ai.interwestinfo.com offers a unified API with smart routing. But for my small project, rolling my own taught me a lot about each provider's quirks. It also gave me full control over the routing logic.
I'm still iterating on this. Next up: adding streaming support and a simple latency monitor.
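A latency monitor can start as little more than a timer around each call. The sketch below is a hypothetical `LatencyMonitor`. It records wall-clock time per task using `time.perf_counter` and reports the count, mean, and max for each task. The injectable `clock` is there only for testing.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean
from typing import Callable, Dict, Iterator, List


class LatencyMonitor:
    """Hypothetical helper: records wall-clock latency per task name."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock
        self.samples: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def track(self, task: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Recorded even when the call raises, so slow failures show up too.
            self.samples[task].append(self._clock() - start)

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Per-task call count, mean, and max latency in seconds."""
        return {
            task: {"count": len(s), "mean_s": mean(s), "max_s": max(s)}
            for task, s in self.samples.items()
            if s
        }
```

Wrapping a call looks like `with monitor.track('qa'): answer = router.complete('qa', prompt)`. Once streaming is in place, time-to-first-token would need to be measured separately from total time.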
What does your AI infrastructure look like? Are you using a single provider or something more flexible? I'd love to hear how others handle this.