How I built a simple AI router to avoid vendor lock-in and cut costs

A developer built a simple AI router to avoid vendor lock-in and reduce costs by routing different tasks to the most appropriate AI model. The router uses a YAML config file to map tasks like question answering, image captioning, and summarization to specific providers and models, handling different SDKs and authentication. The solution eliminates manual API key swapping and reduces costs by using cheaper models for simpler tasks.

I've been working on a side project that needs AI for a few different tasks: answering user questions, generating image captions, and summarizing chat threads. At first, I just picked one provider, OpenAI, and called it a day. But after a month, two things became painfully clear: first, not every model is great at every task, and second, the bill was climbing fast because I was using GPT-4 for everything.

So I did what any reasonable developer would do: I started swapping API keys by hand. I'd comment out one import and uncomment another, deploy, test, get frustrated, rinse and repeat. That worked for about a week before I decided I needed a proper solution.

My project had three distinct AI needs (question answering, image captioning, and thread summarization), and I was using one provider for all three, which meant I was either overpaying for simple tasks or getting low-quality results for complex ones.

First, I tried a simple if-elif chain in every endpoint. That turned into spaghetti within hours. Then I tried a config file with model names, but I still had to handle different SDKs, authentication, and response formats manually. It was brittle and ugly.

I also looked at some API aggregation services. They promised unified access but often introduced latency, added cost per call, or required me to trust their infrastructure with my keys. Not ideal for a small project where I wanted full control.

I built a tiny Python class that acts as a router. It takes a task name, picks a provider and model from a config file, and handles the request. The key insight: I didn't need a full proxy, just a configurable dispatcher that I could plug into my existing code with minimal changes. Here's the core of it.
First, the config file (`config/ai_router.yaml`):

```yaml
# config/ai_router.yaml
routing:
  qa:
    provider: openai
    model: gpt-4
    max_tokens: 500
    temperature: 0.2
  captions:
    provider: anthropic
    model: claude-3-haiku-20240307
    max_tokens: 200
    temperature: 0.7
  summarize:
    provider: openai
    model: gpt-3.5-turbo
    max_tokens: 1000
    temperature: 0.3
```

Now the router class (`router.py`):

```python
import os

import yaml


class AIRouter:
    def __init__(self, config_path="config/ai_router.yaml"):
        # Load only the 'routing' section: task name -> provider settings
        with open(config_path) as f:
            self.config = yaml.safe_load(f)['routing']
        self._init_providers()

    def _init_providers(self):
        # Lazy import to avoid loading unused SDKs
        self.providers = {}
        if any(cfg['provider'] == 'openai' for cfg in self.config.values()):
            from openai import OpenAI
            self.providers['openai'] = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
        if any(cfg['provider'] == 'anthropic' for cfg in self.config.values()):
            from anthropic import Anthropic
            self.providers['anthropic'] = Anthropic(api_key=os.environ['ANTHROPIC_API_KEY'])

    def complete(self, task: str, prompt: str):
        cfg = self.config.get(task)
        if not cfg:
            raise ValueError(f"Unknown task: {task}")
        # .get() so an unsupported provider reaches the NotImplementedError
        # below instead of failing here with a KeyError
        provider = self.providers.get(cfg['provider'])
        model = cfg['model']
        if cfg['provider'] == 'openai':
            response = provider.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=cfg['max_tokens'],
                temperature=cfg['temperature'],
            )
            return response.choices[0].message.content
        elif cfg['provider'] == 'anthropic':
            response = provider.messages.create(
                model=model,
                max_tokens=cfg['max_tokens'],
                temperature=cfg['temperature'],
                messages=[{"role": "user", "content": prompt}],
            )
            return response.content[0].text
        else:
            raise NotImplementedError(f"Provider {cfg['provider']} not implemented")
```

Usage in my app is dead simple:

```python
from router import AIRouter

router = AIRouter()

# In one endpoint:
answer = router.complete('qa', "What's the capital of France?")

# In another:
caption = router.complete('captions', "Describe this image: [base64 data]")
```

I'll be honest: this isn't production-grade. Error handling is minimal. If a provider is down, the whole request fails. There's no retry logic or fallback. Also, the config is static, so if I want to switch models mid-request, I'd need a different approach.

But for my project, it solved the immediate pain: I can now route tasks to the most cost-effective model without touching code. I saved about 40% on API costs in the first month by sending captions to cheaper models.

If I were to extend it, I'd start with a fallback mechanism. For example, if gpt-4 fails, try gpt-3.5-turbo before erroring out. Also, I'd make the router async, since most providers support async now and it would fit better in a web framework like FastAPI. Another improvement: dynamic routing based on prompt length or complexity. For instance, if a Q&A prompt is short and simple, route it to a cheaper model automatically.

If you don't want to build this yourself, there are services that do something similar. For instance, ai.interwestinfo.com offers a unified API with smart routing. But for my small project, rolling my own taught me a lot about each provider's quirks. It also gave me full control over the routing logic.

I'm still iterating on this. Next up: adding streaming support and a simple latency monitor.

What does your AI infrastructure look like? Are you using a single provider or something more flexible? I'd love to hear how others handle this.
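To make the fallback idea mentioned earlier a bit more concrete, here's a minimal sketch. It is not part of the router as described: `complete_with_fallback`, the `send` callable, and `fake_send` are hypothetical names used for illustration, and `fake_send` stands in for the real SDK calls so the snippet runs without API keys. The assumption is that each task would list several (provider, model) pairs to try in order.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a (provider, model) pair, tried in list order.
Target = Tuple[str, str]


def complete_with_fallback(
    send: Callable[[str, str, str], str],
    targets: Sequence[Target],
    prompt: str,
) -> str:
    """Try each (provider, model) in order; return the first successful reply."""
    errors = []
    for provider, model in targets:
        try:
            return send(provider, model, prompt)
        except Exception as exc:  # broad on purpose: each SDK raises its own error types
            errors.append(f"{provider}/{model}: {exc}")
    # Every target failed, so surface all collected errors at once.
    raise RuntimeError("All targets failed: " + "; ".join(errors))


# Stand-in for a real SDK call (assumption): simulates gpt-4 being down.
def fake_send(provider: str, model: str, prompt: str) -> str:
    if model == "gpt-4":
        raise ConnectionError("simulated outage")
    return f"[{model}] {prompt}"


result = complete_with_fallback(
    fake_send,
    [("openai", "gpt-4"), ("openai", "gpt-3.5-turbo")],
    "What's the capital of France?",
)
print(result)  # [gpt-3.5-turbo] What's the capital of France?
```

In a real router, `send` would wrap the provider-specific branches of `complete`, and the target list could live in the YAML config next to each task. Catching a bare `Exception` keeps the sketch short; narrowing it to each SDK's own error classes would avoid retrying on programming mistakes.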