The 2025 AI model market is in full bloom. But each provider has its own API format, authentication method, and streaming protocol. BoxAgnts' design goal: users switch models by changing just one parameter, with all internal logic remaining unchanged.
This article dissects this abstraction across four levels:
LlmProvider trait defines a "model provider"
Everything starts with the interface definition:
// boxagnts-api/src/provider.rs
#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn id(&self) -> &ProviderId; // Unique identifier
    fn name(&self) -> &str; // Human-readable name
    async fn create_message( // Non-streaming request
        &self,
        request: ProviderRequest,
    ) -> Result<ProviderResponse, ProviderError>;
    async fn create_message_stream( // Streaming request
        &self,
        request: ProviderRequest,
    ) -> Result<
        Pin<Box<dyn Stream<Item = Result<StreamEvent, ProviderError>> + Send>>,
        ProviderError,
    >;
    async fn list_models(&self) -> Result<Vec<ModelInfo>, ProviderError>; // Model list
    async fn check_connectivity(&self) -> Result<ProviderStatus, ProviderError>; // Health check
    fn capabilities(&self) -> ProviderCapabilities; // Capability declaration
}
Both input and output use provider-agnostic unified types:
pub struct ProviderRequest {
    pub model: String,
    pub messages: Vec<Message>, // Unified conversation format
    pub system_prompt: Option<SystemPrompt>,
    pub tools: Vec<ToolDefinition>, // Unified tool definitions
    pub max_tokens: u32,
    pub temperature: Option<f64>,
    pub thinking: Option<ThinkingConfig>, // Deep thinking configuration
    pub provider_options: Value, // Provider-specific parameters
}
pub struct ProviderResponse {
    pub id: String,
    pub content: Vec<ContentBlock>, // Unified content blocks
    pub stop_reason: StopReason, // Unified stop reason
    pub usage: UsageInfo, // Token usage
    pub model: String,
}
The core value of the normalization layer: whether the underlying model is Claude, GPT, or Gemini, upper-layer code only sees ProviderRequest and ProviderResponse.
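To make the benefit concrete, here is a minimal sketch of upper-layer code that depends only on a trait. It is deliberately synchronous and much smaller than the real LlmProvider; the names `Request`, `Response`, `Provider`, `EchoProvider`, and `ask` are hypothetical stand-ins, not BoxAgnts types.

```rust
// Simplified, synchronous stand-ins for ProviderRequest / ProviderResponse (hypothetical).
pub struct Request {
    pub model: String,
    pub prompt: String,
}

pub struct Response {
    pub model: String,
    pub text: String,
}

// A reduced version of the provider trait: one id, one request method.
pub trait Provider: Send + Sync {
    fn id(&self) -> &str;
    fn create_message(&self, request: &Request) -> Result<Response, String>;
}

// A mock provider that echoes the prompt, tagged with its own id.
pub struct EchoProvider {
    id: String,
}

impl EchoProvider {
    pub fn new(id: &str) -> Self {
        Self { id: id.to_string() }
    }
}

impl Provider for EchoProvider {
    fn id(&self) -> &str {
        &self.id
    }

    fn create_message(&self, request: &Request) -> Result<Response, String> {
        Ok(Response {
            model: request.model.clone(),
            text: format!("[{}] {}", self.id, request.prompt),
        })
    }
}

// Upper-layer code: it sees only the trait, never a concrete provider type.
pub fn ask(provider: &dyn Provider, model: &str, prompt: &str) -> Result<String, String> {
    let request = Request {
        model: model.to_string(),
        prompt: prompt.to_string(),
    };
    provider.create_message(&request).map(|response| response.text)
}
```

Swapping `EchoProvider::new("anthropic")` for `EchoProvider::new("openai")` changes nothing in `ask`, which is the property the normalization layer is meant to preserve.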
// boxagnts-api/src/registry.rs
pub struct ProviderRegistry {
    providers: HashMap<ProviderId, Arc<dyn LlmProvider>>,
    default_provider_id: ProviderId,
}
fn provider_from_key(provider_id: &str, key: String) -> Option<Arc<dyn LlmProvider>> {
    match provider_id {
        // Native implementations: each with its own API format
        "anthropic" => Some(Arc::new(AnthropicProvider::from_config(...))),
        "openai" => Some(Arc::new(OpenAiProvider::new(key))),
        "google" => Some(Arc::new(GoogleProvider::new(key))),
        "github-copilot" => Some(Arc::new(CopilotProvider::new(key))),
        "cohere" => Some(Arc::new(CohereProvider::new(key))),
        // OpenAI-compatible providers: share the same conversion logic, only change base_url
        "deepseek", "groq", "ollama", "mistral", "xai",
        "perplexity", "openrouter", "siliconflow", "moonshot",
        "zhipu", "stepfun", "fireworks", "llamacpp",
        "sambanova", "huggingface", "nvidia", "cerebras",
        // ... 30+ OpenAI-compatible providers in total
        _ => None,
    }
}
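The "only change base_url" idea can be sketched as follows. This is an illustration under assumptions, not the BoxAgnts registry: `OpenAiCompatProvider`, `compat_base_url`, `compat_provider_from_key`, and `Registry` are hypothetical names, the base URLs are illustrative and should be checked against each vendor's documentation, and falling back to the default provider on an unknown id is an assumed policy.

```rust
use std::collections::HashMap;

// One shared type for every OpenAI-compatible vendor; only the base URL differs.
#[derive(Debug, Clone, PartialEq)]
pub struct OpenAiCompatProvider {
    pub provider_id: String,
    pub base_url: String,
    pub api_key: String,
}

// Illustrative base URLs (assumption: verify against vendor docs before use).
fn compat_base_url(provider_id: &str) -> Option<&'static str> {
    match provider_id {
        "deepseek" => Some("https://api.deepseek.com"),
        "groq" => Some("https://api.groq.com/openai/v1"),
        "ollama" => Some("http://localhost:11434/v1"),
        _ => None,
    }
}

// Build a provider for a known compatible vendor, or None for unknown ids.
pub fn compat_provider_from_key(provider_id: &str, key: String) -> Option<OpenAiCompatProvider> {
    compat_base_url(provider_id).map(|base_url| OpenAiCompatProvider {
        provider_id: provider_id.to_string(),
        base_url: base_url.to_string(),
        api_key: key,
    })
}

pub struct Registry {
    providers: HashMap<String, OpenAiCompatProvider>,
    default_provider_id: String,
}

impl Registry {
    pub fn new(default_provider_id: &str) -> Self {
        Self {
            providers: HashMap::new(),
            default_provider_id: default_provider_id.to_string(),
        }
    }

    pub fn register(&mut self, provider: OpenAiCompatProvider) {
        self.providers.insert(provider.provider_id.clone(), provider);
    }

    // Look up by id; fall back to the default provider when the id is unknown.
    pub fn get_or_default(&self, provider_id: &str) -> Option<&OpenAiCompatProvider> {
        self.providers
            .get(provider_id)
            .or_else(|| self.providers.get(&self.default_provider_id))
    }
}
```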
Implementation strategies by provider type:
| Type | Representative | Conversion Strategy | Count |
|---|---|---|---|
| Native Anthropic | claude-sonnet-4-5 | Near-zero conversion (internal format = Anthropic format) | 1 |
| Native OpenAI | gpt-4o, o3 | ProviderRequest → Chat Completions | 1 |
| Native Google | gemini-2.5-flash | ProviderRequest → generateContent | 1 |
| OpenAI Compatible | deepseek, groq, ollama, etc. | Same logic as OpenAI, only URL changes | 30+ |
| Other Native | github-copilot, cohere | Independent format conversion | 3+ |
Anthropic, OpenAI, Google Gemini — three APIs with vast differences in message format. Understanding these differences is essential to understanding the value of the conversion layer.
| Feature | Anthropic | OpenAI | Google Gemini |
|---|---|---|---|
| Location | Top-level "system" field | messages[0], role:"system" | Top-level "systemInstruction" field |
| Type | string or ContentBlock array | string only | content parts array only |
// Anthropic — top-level standalone field
{"model": "claude-sonnet-4-5", "system": "You are helpful.", "messages": [...]}
// OpenAI — embedded in messages array
{"model": "gpt-4o", "messages": [{"role":"system","content":"You are helpful."}, ...]}
// Google — uses systemInstruction field, structure differs from messages
{
"systemInstruction": {"parts": [{"text": "You are helpful."}]},
"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]
}
| Feature | Anthropic | OpenAI | Google Gemini |
|---|---|---|---|
| Field | "tools": [{name, description, input_schema}] | "tools": [{type:"function", function:{...}}] | "tools": [{functionDeclarations: [{name, description, parameters}]}] |
| Wrapping Layers | 0 | 1 | 1, with different nesting names |
// Anthropic — native block in content array
{"content": [{"type":"tool_use", "id":"toolu_01A", "name":"read", "input": {...}}]}
// OpenAI — standalone tool_calls array, arguments is JSON string
{"tool_calls": [{"id":"call_abc", "function": {"name":"read", "arguments": "{\"path\":\"...\"}"}}]}
// Google — functionCall embedded in parts, args is JSON object
{"candidates": [{"content": {"parts": [{"functionCall": {"name":"read", "args": {...}}}]}}]}
// Anthropic — tool_result is a block in the user message content array
{"role":"user", "content": [{"type":"tool_result", "tool_use_id":"toolu_01A", "content":"..."}]}
// OpenAI — requires a separate role: "tool" message
{"role":"tool", "tool_call_id":"call_abc", "content":"..."}
// Google — functionResponse embedded in user content parts
{"role":"user", "parts": [{"functionResponse": {"name":"read", "response": {...}}}]}
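The tool-result difference has a practical consequence: a single user message in the Anthropic-style format may need to become several messages in the OpenAI format. The sketch below shows one way to do that split. `Block`, `WireMessage`, and `split_user_blocks` are hypothetical names, and emitting the tool messages before the remaining user text is an assumption made so that tool results directly follow the assistant message that requested them.

```rust
// A reduced content-block model (hypothetical): plain text or a tool result.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text(String),
    ToolResult { tool_use_id: String, content: String },
}

// A reduced OpenAI-style message: role, optional tool_call_id, text content.
#[derive(Debug, Clone, PartialEq)]
pub struct WireMessage {
    pub role: &'static str,
    pub tool_call_id: Option<String>,
    pub content: String,
}

// Split one user message into role:"tool" messages plus at most one role:"user" message.
pub fn split_user_blocks(blocks: &[Block]) -> Vec<WireMessage> {
    let mut out = Vec::new();
    let mut text_parts: Vec<&str> = Vec::new();

    for block in blocks {
        match block {
            // Each tool result becomes its own message, keyed by the call id.
            Block::ToolResult { tool_use_id, content } => out.push(WireMessage {
                role: "tool",
                tool_call_id: Some(tool_use_id.clone()),
                content: content.clone(),
            }),
            // Text is collected and emitted once, after all tool results.
            Block::Text(text) => text_parts.push(text.as_str()),
        }
    }

    if !text_parts.is_empty() {
        out.push(WireMessage {
            role: "user",
            tool_call_id: None,
            content: text_parts.join("\n"),
        });
    }
    out
}
```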
| Anthropic | OpenAI | Google Gemini |
|---|---|---|
| user | user | user |
| assistant | assistant | model |
Google uses model instead of assistant. This is the most easily overlooked but most error-prone difference.
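A small mapping function is usually enough to keep the role difference in one place. The following is a sketch; `Role`, `WireFormat`, `wire_role`, and `parse_role` are hypothetical names chosen for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WireFormat {
    Anthropic,
    OpenAi,
    Google,
}

// Internal role -> role string expected by each API.
pub fn wire_role(format: WireFormat, role: Role) -> &'static str {
    match (format, role) {
        (_, Role::User) => "user",
        // Gemini names the assistant side "model".
        (WireFormat::Google, Role::Assistant) => "model",
        (_, Role::Assistant) => "assistant",
    }
}

// Role string from an API response -> internal role (None if unrecognized).
pub fn parse_role(format: WireFormat, wire: &str) -> Option<Role> {
    match (format, wire) {
        (_, "user") => Some(Role::User),
        (WireFormat::Google, "model") => Some(Role::Assistant),
        (WireFormat::Anthropic | WireFormat::OpenAi, "assistant") => Some(Role::Assistant),
        _ => None,
    }
}
```

Rejecting "assistant" for Google (and "model" for the others) in `parse_role` is a deliberate strictness choice in this sketch: a mismatch then surfaces as `None` instead of passing through silently.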
OpenAiProvider is the most complete example of the conversion layer:
// boxagnts-api/src/providers/openai.rs
impl OpenAiProvider {
    fn to_openai_messages(
        messages: &[Message],
        system_prompt: Option<&SystemPrompt>,
    ) -> Vec<Value> {
        let mut result: Vec<Value> = Vec::new();
        // Step 1: system prompt → role: "system" message
        // (sys_text is the prompt as plain text; its derivation from `sys` is not shown in this excerpt)
        if let Some(sys) = system_prompt {
            result.push(json!({"role": "system", "content": sys_text}));
        }
        // Step 2: convert each conversation message according to its role
        for msg in messages {
            match msg.role {
                Role::User => {
                    // User messages may mix text and tool_result blocks
                    // tool_result needs to be split into separate role: "tool" messages
                    Self::append_user_messages(&mut result, &msg.content);
                }
                Role::Assistant => {
                    let (text, tool_calls) = Self::assistant_content_to_openai(&msg.content);
                    result.push(json!({
                        "role": "assistant",
                        "content": text,
                        "tool_calls": tool_calls
                    }));
                }
            }
        }
        result
    }
    fn to_openai_tools(tools: &[ToolDefinition]) -> Vec<Value> {
        tools.iter().map(|td| {
            json!({
                "type": "function",
                "function": {
                    "name": td.name,
                    "description": td.description,
                    "parameters": td.input_schema
                }
            })
        }).collect()
    }
}
The most complex part is tool_use_id sanitization: Anthropic's tool IDs (e.g., toolu_01Bx...) may contain characters that OpenAI does not accept.
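The article does not show the sanitization itself, so the following is only a sketch of what such a step could look like. It assumes the target API accepts ASCII letters, digits, underscores, and hyphens up to some maximum length; the function name `sanitize_tool_id`, the allowed character set, and the `call_0` placeholder are all assumptions.

```rust
// Replace every character outside [A-Za-z0-9_-] with '_' and cap the length.
// The mapping is deterministic, so the id on a tool call and the id on its
// matching tool result still agree after both are sanitized.
pub fn sanitize_tool_id(id: &str, max_len: usize) -> String {
    let cleaned: String = id
        .chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '_' || c == '-' {
                c
            } else {
                '_'
            }
        })
        .take(max_len)
        .collect();

    // Assumed placeholder for an empty id; a real implementation might generate one instead.
    if cleaned.is_empty() {
        "call_0".to_string()
    } else {
        cleaned
    }
}
```

One caveat with this approach: two distinct ids can collide after replacement or truncation (for example `a:b` and `a/b`), so a production version would likely need a collision check or a lookup table.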
GoogleProvider shows how to handle an API format that is different from both Anthropic and OpenAI:
// boxagnts-api/src/providers/google.rs
// URL pattern completely different from OpenAI's /v1/chat/completions
fn generate_url(&self, model: &str) -> String {
    format!(
        "{}/v1beta/models/{}:generateContent?key={}",
        self.base_url, model, self.api_key // API Key in URL query parameters!
    )
}
Key differences from OpenAI:
| Difference | Google Gemini | OpenAI |
|---|---|---|
| API Key Location | URL query parameter ?key= | HTTP Header Authorization: Bearer |
| Endpoint Format | /v1beta/models/{model}:generateContent | /v1/chat/completions |
| Streaming Endpoint | /v1beta/models/{model}:streamGenerateContent?alt=sse | /v1/chat/completions + stream:true |
| Message Roles | user / model (not assistant) | user / assistant |
| Tool Results | functionResponse in parts | Separate role: tool message |
| Image Input | inlineData base64 | image_url or content parts |
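The endpoint rows of the table can be expressed as a few URL builders. This is a sketch only: `GoogleEndpoint`, `stream_url`, and `openai_chat_url` are hypothetical names, and the key is inserted without URL-encoding on the assumption that it contains only URL-safe characters.

```rust
pub struct GoogleEndpoint {
    base_url: String,
    api_key: String,
}

impl GoogleEndpoint {
    pub fn new(base_url: &str, api_key: &str) -> Self {
        Self {
            // Drop a trailing slash so the joined URL has exactly one separator.
            base_url: base_url.trim_end_matches('/').to_string(),
            api_key: api_key.to_string(),
        }
    }

    // Non-streaming: the model name is part of the path, the key is a query parameter.
    pub fn generate_url(&self, model: &str) -> String {
        format!(
            "{}/v1beta/models/{}:generateContent?key={}",
            self.base_url, model, self.api_key
        )
    }

    // Streaming: a different method name plus alt=sse to request server-sent events.
    pub fn stream_url(&self, model: &str) -> String {
        format!(
            "{}/v1beta/models/{}:streamGenerateContent?alt=sse&key={}",
            self.base_url, model, self.api_key
        )
    }
}

// OpenAI: one fixed path; model and stream flag travel in the JSON body,
// and the key goes in the Authorization header rather than the URL.
pub fn openai_chat_url(base_url: &str) -> String {
    format!("{}/v1/chat/completions", base_url.trim_end_matches('/'))
}
```

Because the Gemini key ends up in the URL, it is worth making sure request URLs are redacted before they reach logs or error messages.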
ThinkingConfig is the normalized deep thinking configuration, but different providers handle it completely differently:
// Normalized configuration
pub struct ThinkingConfig {
    pub budget_tokens: u32, // Thinking token budget
}
// When building ProviderRequest, decides whether to pass based on provider capabilities
let provider_request = ProviderRequest {
    // ...
    thinking: if caps.thinking {
        effective_thinking_budget
            .map(|b| ThinkingConfig::enabled(b))
    } else {
        None // This provider doesn't support thinking, don't pass
    },
};
| Provider | Thinking Support | How It's Passed |
|---|---|---|
| Anthropic (Claude 3.7+) | ✓ | "thinking": {"type": "enabled", "budget_tokens": N} |
| Google (Gemini 2.5+) | ✓ | "thinkingConfig": {"thinkingBudget": N} |
| OpenAI (o1/o3 series) | Partial | Via reasoning_effort parameter |
| Other OpenAI Compatible | Mostly unsupported | Not passed |
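The table can be read as a function from the normalized config to a provider-specific request fragment. The sketch below renders that fragment as a JSON snippet string to stay dependency-free. `Family` and `thinking_fragment` are hypothetical names, and the thresholds that turn a token budget into a `reasoning_effort` level are invented for illustration; the article does not say how BoxAgnts performs that mapping.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ThinkingConfig {
    pub budget_tokens: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Family {
    Anthropic,
    Google,
    OpenAiReasoning,
    OpenAiCompat,
}

// Returns the JSON key/value fragment to merge into the request body, if any.
pub fn thinking_fragment(family: Family, cfg: Option<ThinkingConfig>) -> Option<String> {
    let cfg = cfg?; // No thinking requested: nothing to send for any provider.
    match family {
        Family::Anthropic => Some(format!(
            r#""thinking": {{"type": "enabled", "budget_tokens": {}}}"#,
            cfg.budget_tokens
        )),
        Family::Google => Some(format!(
            r#""thinkingConfig": {{"thinkingBudget": {}}}"#,
            cfg.budget_tokens
        )),
        Family::OpenAiReasoning => {
            // Hypothetical thresholds: a budget has no exact equivalent here.
            let effort = if cfg.budget_tokens < 2_000 {
                "low"
            } else if cfg.budget_tokens < 10_000 {
                "medium"
            } else {
                "high"
            };
            Some(format!(r#""reasoning_effort": "{}""#, effort))
        }
        // Most compatible providers: the parameter is simply not passed.
        Family::OpenAiCompat => None,
    }
}
```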
At request construction time, ProviderCapabilities declares each provider's capabilities:
pub struct ProviderCapabilities {
    pub thinking: bool, // Whether deep thinking is supported
    pub prompt_caching: bool, // Whether prompt caching is supported
    pub image_input: bool, // Whether image input is supported
    pub native_tool_use: bool, // Whether native tool calling exists
    pub supports_streaming: bool, // Whether streaming responses are supported
    // ...
}
OpenAI-compatible providers' APIs are roughly compatible, but all have subtle differences. ProviderQuirks handles these:
pub struct ProviderQuirks {
    /// Specific error message patterns for context overflow
    pub overflow_patterns: Vec<String>,
    /// Local services that don't require API Keys (e.g., Ollama, LM Studio)
    pub no_api_key_required: bool,
    /// Whether streaming responses include usage info
    pub include_usage_in_stream: bool,
    /// Providers like DeepSeek need the reasoning_content field
    pub reasoning_field: Option<String>,
}
For example, DeepSeek's streaming response returns reasoning content with a field name different from OpenAI's, which is adapted via reasoning_field. Ollama's context overflow error message is "exceeds the available context size", while LM Studio's is "greater than the context length"; both are adapted via overflow_patterns.
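Pattern-based overflow detection can be sketched in a few lines. The two patterns are the ones quoted above; `Quirks`, `is_context_overflow`, `ollama_quirks`, and `lm_studio_quirks` are hypothetical names, and matching case-insensitively by substring is an assumption about how the patterns are applied.

```rust
#[derive(Debug, Clone, Default)]
pub struct Quirks {
    pub overflow_patterns: Vec<String>,
    pub no_api_key_required: bool,
}

// True when the provider's error text contains any known overflow pattern.
pub fn is_context_overflow(quirks: &Quirks, error_message: &str) -> bool {
    let haystack = error_message.to_lowercase();
    quirks
        .overflow_patterns
        .iter()
        .any(|pattern| haystack.contains(&pattern.to_lowercase()))
}

pub fn ollama_quirks() -> Quirks {
    Quirks {
        overflow_patterns: vec!["exceeds the available context size".to_string()],
        no_api_key_required: true,
    }
}

pub fn lm_studio_quirks() -> Quirks {
    Quirks {
        overflow_patterns: vec!["greater than the context length".to_string()],
        no_api_key_required: true,
    }
}
```

A match here is what would let a generic HTTP 400 from a local server be reclassified as a context overflow rather than an invalid request.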
Streaming responses are also completely different across the three APIs:
| Feature | Anthropic (SSE) | OpenAI (SSE) | Google (SSE) |
|---|---|---|---|
| Event Granularity | High: 6 event types (start/delta/stop × 2) | Low: each chunk is a complete delta | Medium: pushed by chunk, but structure is flat |
| Tool call Increment | Fragmented send of input_json_delta | arguments string streamed in fragments, keyed by tool_calls[].index | Single send of complete functionCall |
| Termination Signal | message_stop event | data: [DONE] marker | Stream ends naturally |
| Need to Reassemble by index | Yes (reassemble by index for multiple tool_use) | Yes | Yes |
All three formats are normalized to the same StreamEvent enum:
pub enum StreamEvent {
    MessageStart { id, model, usage },
    ContentBlockStart { index, content_block },
    TextDelta { text },
    ThinkingDelta { thinking },
    InputJsonDelta { index, partial_json },
    ContentBlockStop { index },
    MessageDelta { stop_reason, usage },
    MessageStop,
}
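Once everything is a normalized event, reassembling tool-call arguments by index becomes provider-independent. The sketch below uses a reduced event enum with only the variants needed for that; the `tool_name` field on `ContentBlockStart`, the `ToolCall` struct, and `collect_tool_calls` are assumptions made to keep the example self-contained.

```rust
use std::collections::BTreeMap;

// A reduced event set: only what is needed to rebuild tool-call arguments.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    ContentBlockStart { index: usize, tool_name: String },
    InputJsonDelta { index: usize, partial_json: String },
    ContentBlockStop { index: usize },
    MessageStop,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub arguments_json: String,
}

// Concatenate JSON fragments per block index; emit a call when its block stops.
pub fn collect_tool_calls(events: &[StreamEvent]) -> Vec<ToolCall> {
    let mut open: BTreeMap<usize, ToolCall> = BTreeMap::new();
    let mut done = Vec::new();

    for event in events {
        match event {
            StreamEvent::ContentBlockStart { index, tool_name } => {
                open.insert(
                    *index,
                    ToolCall { name: tool_name.clone(), arguments_json: String::new() },
                );
            }
            StreamEvent::InputJsonDelta { index, partial_json } => {
                // Fragments for different tools may interleave; the index routes them.
                if let Some(call) = open.get_mut(index) {
                    call.arguments_json.push_str(partial_json);
                }
            }
            StreamEvent::ContentBlockStop { index } => {
                if let Some(call) = open.remove(index) {
                    done.push(call);
                }
            }
            StreamEvent::MessageStop => break,
        }
    }
    done
}
```

Calls are returned in the order their blocks complete, and any block still open at `MessageStop` is dropped; a real implementation might instead report it as a stream error.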
Each provider's error format is also different:
// Unified error types
pub enum ProviderError {
    Auth { ... }, // Authentication failure
    RateLimited { ... }, // Rate limiting
    ContextOverflow { ... }, // Context exceeds window (matched via ProviderQuirks)
    InvalidRequest { ... }, // Invalid request parameters
    ServerError { ... }, // Server error
    StreamError { ... }, // Stream interruption
    Other { ... }, // Unknown error
}
In the query loop, specific errors trigger specific recovery strategies:
RateLimited / Overloaded → Switch to fallback_model
ContextOverflow → Trigger auto_compact
StreamError (stall) → Retry (max 2 times, 45s timeout)
Auth → Unrecoverable, return error
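The recovery rules above amount to a small decision function. The following sketch uses a reduced error enum without payloads; `Recovery`, `plan_recovery`, and `MAX_STREAM_RETRIES` are hypothetical names, the 45-second stall timeout is left out, and treating unknown errors as unrecoverable is an assumption.

```rust
// Reduced error set (payloads omitted for brevity).
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    Auth,
    RateLimited,
    ContextOverflow,
    StreamError,
    Other(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Recovery {
    SwitchToFallbackModel,
    AutoCompact,
    Retry { attempt: u32 },
    Fail,
}

// Matches the "max 2 times" retry limit stated in the text.
pub const MAX_STREAM_RETRIES: u32 = 2;

// Decide what the query loop should do next for a given error.
pub fn plan_recovery(error: &ProviderError, retries_so_far: u32) -> Recovery {
    match error {
        ProviderError::RateLimited => Recovery::SwitchToFallbackModel,
        ProviderError::ContextOverflow => Recovery::AutoCompact,
        ProviderError::StreamError if retries_so_far < MAX_STREAM_RETRIES => {
            Recovery::Retry { attempt: retries_so_far + 1 }
        }
        // Retries exhausted, bad credentials, or unknown errors: give up.
        ProviderError::StreamError | ProviderError::Auth | ProviderError::Other(_) => Recovery::Fail,
    }
}
```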
BoxAgnts defines environment variable name mappings for each provider:
// boxagnts-workspace/src/config.rs
pub fn api_key_env_vars_for_provider(provider_id: &str) -> &'static [&'static str] {
    match provider_id {
        "anthropic" => &["ANTHROPIC_API_KEY"],
        "openai" => &["OPENAI_API_KEY"],
        "google" => &["GOOGLE_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"],
        "deepseek" => &["DEEPSEEK_API_KEY"],
        "mistral" => &["MISTRAL_API_KEY"],
        "xai" => &["XAI_API_KEY"],
        "zhipu" => &["ZHIPU_API_KEY"],
        // ... 40+ provider environment variables
        _ => &[], // Fallback arm (required for an exhaustive match on &str): no known variables
    }
}
Three-tier priority: Environment Variables > User Config JSON > No Default. This design supports different scenarios such as multi-tenancy, CI/CD, and local development.
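The priority order can be sketched as a single lookup function. To keep it testable without touching the real process environment, the environment is passed in as a closure. `env_vars_for`, `resolve_api_key`, and the shape of `user_config` (a map from provider id to key) are assumptions; the actual config JSON layout is not shown in the article.

```rust
use std::collections::HashMap;

// A shortened version of the env-var table above.
pub fn env_vars_for(provider_id: &str) -> &'static [&'static str] {
    match provider_id {
        "anthropic" => &["ANTHROPIC_API_KEY"],
        "google" => &["GOOGLE_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"],
        _ => &[],
    }
}

// Resolution order: environment variables, then user config, then nothing.
pub fn resolve_api_key(
    provider_id: &str,
    env: &dyn Fn(&str) -> Option<String>,
    user_config: &HashMap<String, String>,
) -> Option<String> {
    env_vars_for(provider_id)
        .iter()
        // First non-empty environment variable wins, in declaration order.
        .find_map(|name| env(*name).filter(|value| !value.is_empty()))
        // Otherwise fall back to the user's config; there is no built-in default.
        .or_else(|| user_config.get(provider_id).cloned())
}
```

In production code the closure would typically be `&|name| std::env::var(name).ok()`.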
BoxAgnts' model abstraction layer solves the essential problem of "one set of code adapting to all APIs":
┌──────────────────────────────────────────────────────────┐
│ boxagnts-query (Agent reasoning loop)                    │
│ Only uses ProviderRequest / ProviderResponse             │
└─────────────────────────────┬────────────────────────────┘
                              │
┌─────────────────────────────▼────────────────────────────┐
│ LlmProvider trait                                        │
│ + ProviderRegistry (40+ providers)                       │
├─────────────┬─────────────┬─────────────┬────────────────┤
│ Anthropic   │ OpenAI      │ Google      │ OpenAiCompat   │
│ Provider    │ Provider    │ Provider    │ (30+ vendors)  │
│ (Near-zero  │ (Full       │ (Independent│ (Shares OpenAI │
│ conversion) │ format      │ format      │ conversion     │
│             │ conversion) │ conversion) │ + Quirks)      │
└─────────────┴─────────────┴─────────────┴────────────────┘
Three key capabilities:
--model parameter
run_query_loop() has no idea what's underneath
This is not a simple "adapter pattern": it's a production-grade abstraction validated against 40+ real-world APIs.