A few months ago I was building a personal project that needed to generate structured data from natural language. I started with OpenAI's GPT-4 because, well, everyone does. The code worked, the responses were great, and I thought I was done. Then Anthropic released Claude 3, and the benchmarks looked promising. I wanted to try it—just swap one model for another to compare quality and cost.
That turned into an entire weekend of refactoring.
Different SDKs. Different authentication. Different response objects. Even the way you handle streaming (or don't) changed completely. By the end I had a messy pile of if provider == "openai": ... elif provider == "anthropic": ... blocks that made me feel like I'd written JavaScript in 2014.
I knew I couldn't be the only one dealing with this. Every week there's a new model or a new API. The idea of being locked into one provider felt both brittle and inefficient. So I set out to build a thin abstraction that would let me swap AI providers without rewriting my entire codebase.
My first instinct was to just use environment variables and conditionally import the right SDK. Something like this:
import os

provider = os.getenv("AI_PROVIDER", "openai")

if provider == "openai":
    from openai import OpenAI
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
elif provider == "anthropic":
    from anthropic import Anthropic
    client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
This worked... until I needed to call the API. The method signatures were completely different:
# OpenAI
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic (max_tokens is required here)
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
The parameter names mostly overlap: both calls take messages, and both accept max_tokens, although Anthropic requires it while OpenAI treats it as optional. The real pain is the response format: OpenAI returns response.choices[0].message.content, Anthropic returns response.content[0].text. Streaming is even more divergent.
I quickly realized that conditionally importing the client wasn't enough. I needed a unified interface.
I created a simple abstract base class that defines a standard way to send a prompt and get a response. Then I wrote one concrete implementation per provider. The rest of my code only ever talks to the abstract class.
Here's a stripped-down version (I removed error handling and streaming for clarity, but the same pattern applies):
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class AIResponse:
    content: str
    model: str
    usage: dict | None = None

class AIProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> AIResponse:
        pass
Then for OpenAI:
import openai

class OpenAIProvider(AIProvider):
    def __init__(self, api_key: str, model: str = "gpt-4"):
        self.client = openai.OpenAI(api_key=api_key)
        self.model = model

    def complete(self, prompt: str, **kwargs) -> AIResponse:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        return AIResponse(
            content=response.choices[0].message.content,
            model=response.model,
            usage=dict(response.usage) if response.usage else None
        )
And for Anthropic:
import anthropic

class AnthropicProvider(AIProvider):
    def __init__(self, api_key: str, model: str = "claude-3-haiku-20240307"):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.model = model

    def complete(self, prompt: str, **kwargs) -> AIResponse:
        # Anthropic requires max_tokens, so fall back to a default
        max_tokens = kwargs.pop("max_tokens", 1024)
        response = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        return AIResponse(
            content=response.content[0].text,
            model=response.model,
            # Anthropic reports usage as input_tokens/output_tokens,
            # not OpenAI's prompt_tokens/completion_tokens
            usage={
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
            }
        )
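Streaming, which the stripped-down classes leave out, can be normalized the same way. Here is a rough sketch of how I would approach it, not a tested production implementation: each function takes an already constructed SDK client and yields plain text pieces. The function names stream_openai and stream_anthropic are my own for illustration; the SDK calls are the stream=True flag on OpenAI's chat completions and Anthropic's messages.stream() context manager.

```python
from typing import Iterator


def stream_openai(client, model: str, prompt: str) -> Iterator[str]:
    """Yield text pieces from an OpenAI chat completion stream."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def stream_anthropic(
    client, model: str, prompt: str, max_tokens: int = 1024
) -> Iterator[str]:
    """Yield text pieces from an Anthropic message stream."""
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        # text_stream yields only the text deltas of the response.
        yield from stream.text_stream
```

Because both functions return an iterator of strings, the calling code can print or buffer the pieces without knowing which SDK produced them.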
Now I can use a factory function to pick the right provider at startup:
import os

def create_provider(provider_name: str, api_key: str, model: str | None = None) -> AIProvider:
    if provider_name == "openai":
        return OpenAIProvider(api_key, model or "gpt-4")
    elif provider_name == "anthropic":
        return AnthropicProvider(api_key, model or "claude-3-haiku-20240307")
    else:
        raise ValueError(f"Unknown provider: {provider_name}")

provider = create_provider("anthropic", os.getenv("ANTHROPIC_API_KEY"))
response = provider.complete("Tell me a joke about Python.")
print(response.content)
That's it. My application code never touches openai or anthropic directly. If I want to try a new provider tomorrow, I just write a new class and add one line to create_provider.
Let me be honest about the limitations. Not all models support the same features. OpenAI has function calling, Anthropic has tool use (similar but not identical). Streaming APIs differ wildly. Token limits vary. Some providers support system messages, others don't. If you try to abstract everything into a single interface, you either end up with a leaky abstraction or you have to support only the lowest common denominator.
My approach works fine for simple text generation tasks (chat, summarization, classification). But if you rely on advanced features like structured outputs with JSON mode or vision, you'll need to handle those separately, maybe by adding optional methods to the base class that each provider either implements or leaves raising NotImplementedError.
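To make the optional-method idea concrete, here is a small sketch. Everything in it is illustrative: complete_json is a hypothetical method name, the two provider classes are stand-ins that return canned text instead of calling an API, and the fallback of parsing plain text as JSON is just one possible policy.

```python
import json
from abc import ABC, abstractmethod


class AIProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str:
        ...

    def complete_json(self, prompt: str, **kwargs) -> dict:
        """Optional capability; providers override it if they support it."""
        raise NotImplementedError(
            f"{type(self).__name__} does not support JSON output"
        )


class PlainProvider(AIProvider):
    """Stand-in provider with text completion only."""

    def complete(self, prompt: str, **kwargs) -> str:
        return '{"answer": 42}'


class JsonProvider(PlainProvider):
    """Stand-in provider that also implements the optional method."""

    def complete_json(self, prompt: str, **kwargs) -> dict:
        return json.loads(self.complete(prompt, **kwargs))


def ask_for_json(provider: AIProvider, prompt: str) -> dict:
    try:
        return provider.complete_json(prompt)
    except NotImplementedError:
        # Fall back to plain text and parse it ourselves.
        text = provider.complete(prompt + "\nRespond with JSON only.")
        return json.loads(text)
```

The base class stays small, and callers that need the advanced feature decide for themselves what to do when a provider lacks it.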
Also, there's a cost side. Different providers charge differently, and you might want to route requests to the cheapest model for a given task. That's a whole other layer of complexity.
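I haven't built this layer, but a first pass might look something like the sketch below. The provider names, model names, and prices are placeholders I made up for illustration, not real pricing; ModelOption and cheapest_for are hypothetical names too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    provider: str
    model: str
    usd_per_1k_input: float
    usd_per_1k_output: float
    tasks: frozenset[str]

    def estimated_cost(self, input_tokens: int, output_tokens: int) -> float:
        return (
            input_tokens / 1000 * self.usd_per_1k_input
            + output_tokens / 1000 * self.usd_per_1k_output
        )


# Placeholder entries: these are NOT real models or real prices.
OPTIONS = [
    ModelOption("provider_a", "small-model", 0.001, 0.002,
                frozenset({"classification", "summarization"})),
    ModelOption("provider_b", "large-model", 0.010, 0.030,
                frozenset({"classification", "summarization", "chat"})),
]


def cheapest_for(task: str, input_tokens: int, output_tokens: int,
                 options: list[ModelOption] = OPTIONS) -> ModelOption:
    """Pick the lowest estimated cost among models configured for the task."""
    candidates = [o for o in options if task in o.tasks]
    if not candidates:
        raise ValueError(f"No model configured for task: {task}")
    return min(
        candidates,
        key=lambda o: o.estimated_cost(input_tokens, output_tokens),
    )
```

The result's provider and model fields could be passed straight to create_provider. Cost is the only criterion here; a real router would probably also weigh quality and latency.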
I'd look for existing libraries that solve this problem. There are some good ones out there, like litellm or even langchain (though langchain can be heavy). The product I found while researching, something called Interwest AI (https://ai.interwestinfo.com/), actually provides a unified API for multiple models, which would have saved me the weekend of writing provider classes. But building it myself taught me how each SDK really works, which was valuable.
If I were starting fresh today, I'd probably use a lightweight wrapper library that normalizes the API, but still keep my own abstract class around in case I need to add a custom provider that the library doesn't support.
This pattern keeps if/elif chains out of my application code and has saved me hours every time I explore a new model. My side project now has three providers configured, and I can switch between them with a single environment variable change.
What's your setup look like? Are you using a wrapper library, rolling your own, or just committing to one provider? I'd love to hear what works (or doesn't) for you.