{"slug": "boxagnts-tool-system-6-multi-provider-adaptation-and-the-agent-query-loop", "title": "BoxAgnts Tool System (6) — Multi-Provider Adaptation and the Agent Query Loop", "summary": "BoxAgnts has developed a multi-provider adaptation layer and agent query loop to solve API incompatibility across AI vendors. The system introduces three abstraction layers—unified data models, an LlmProvider trait, and format transformers—to normalize requests and responses from Anthropic, OpenAI, and Google Gemini. A stream parser module further unifies SSE event handling, enabling seamless tool execution orchestration.", "body_md": "BoxAgnts' tool system, from the bottom-level WASM sandbox up to the top-level Tool trait, has solved \"how tools run safely.\" But tools ultimately have to be called by AI models, and that raises two engineering problems: the API formats of different AI vendors are completely incompatible, and the conversation flow has to be interleaved with tool execution. The Provider abstraction layer solves the first problem; the Agent query loop solves the second.\n\nAI vendors' APIs differ significantly in request format, response format, and error handling.\n\nStart with the request side. Anthropic splits roles into `user` and `assistant` and carries the system prompt in a separate top-level `system` field. OpenAI sends the system prompt as a `role: \"system\"` message. Google Gemini also places the system prompt at the top level (`system_instruction`), but as a content object with its own `parts` structure, and it names the assistant role `model`. If the upper-layer Agent loop had to handle these differences directly, the code would degenerate into a giant `match provider_id { ... 
}` branch.\n\nBoxAgnts' solution introduces three layers of abstraction:\n\n**Layer 1: ProviderRequest / ProviderResponse Unified Data Model**\n\n```rust\n// provider_types.rs\npub struct ProviderRequest {\n    pub messages: Vec<ApiMessage>,\n    pub system: Option<String>,\n    pub tools: Vec<ApiToolDefinition>,\n    pub max_tokens: u32,\n    pub temperature: Option<f32>,\n}\n\npub struct ProviderResponse {\n    pub content: Vec<ContentBlock>,\n    pub usage: UsageInfo,\n    pub stop_reason: String,\n}\n```\n\nThe Agent loop deals only with these two structures and never needs to know whether the user has configured Anthropic or OpenAI.\n\n**Layer 2: LlmProvider trait**\n\n```rust\nuse std::pin::Pin;\n\nuse futures::Stream;\n\n// `async fn` methods combined with `Arc<dyn LlmProvider>` in the registry require the\n// async-trait macro: native async trait methods are not dyn-compatible.\n#[async_trait::async_trait]\npub trait LlmProvider: Send + Sync {\n    fn id(&self) -> &ProviderId;\n    async fn create_message_stream(\n        &self, request: ProviderRequest\n    ) -> Result<Pin<Box<dyn Stream<Item = Result<StreamEvent, ProviderError>> + Send>>>;\n    async fn list_models(&self) -> Result<Vec<ModelInfo>>;\n}\n```\n\n`create_message_stream` returns a `Pin<Box<dyn Stream>>`, the standard Rust idiom for type-erasing an async stream (the `futures` crate provides the `BoxStream` alias for it). Each Provider can build a different concrete stream internally yet expose the same return type, much like returning an `AsyncIterator` in Python. Each Provider implementation handles its own HTTP request construction, authentication, and SSE parsing, and exposes only the unified `StreamEvent` externally.\n\n**Layer 3: Transformer (Message Format Conversion)**\n\nTransformers handle the \"last mile\" of eliminating vendor format differences:\n\n```rust\n// transformers/anthropic.rs\npub fn to_anthropic_request(req: &ProviderRequest) -> AnthropicMessagesRequest { ... }\n\n// transformers/openai_chat.rs\npub fn to_openai_request(req: &ProviderRequest) -> OpenAIChatRequest { ... }\n```\n\nTransformers are pure functions: unified format in, vendor format out. Adding a new Provider only requires implementing a new Transformer and the corresponding `LlmProvider` implementation. 
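\n\nTo make the pure-function idea concrete, the sketch below shows two transformers that differ only in where the system prompt goes. The Anthropic shape keeps it in a top-level `system` field, while the OpenAI Chat shape turns it into a leading message with the system role. This is a simplified illustration, not BoxAgnts' code. The `Role`, `Msg`, `UnifiedRequest`, `AnthropicShape`, and `OpenAiShape` types and both functions are hypothetical stand-ins for the real `ProviderRequest`, `AnthropicMessagesRequest`, and `OpenAIChatRequest`.\n\n
```rust
// Hypothetical, heavily simplified types: real requests also carry tools, max_tokens, and more.
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub role: Role,
    pub text: String,
}

// The vendor-neutral request the Agent loop builds.
pub struct UnifiedRequest {
    pub system: Option<String>,
    pub messages: Vec<Msg>,
}

// Anthropic shape: the system prompt stays a separate top-level field.
pub struct AnthropicShape {
    pub system: Option<String>,
    pub messages: Vec<Msg>,
}

// OpenAI Chat shape: no top-level system field, so the prompt becomes a message.
pub struct OpenAiShape {
    pub messages: Vec<Msg>,
}

// Pure function: reads the unified request and returns a new vendor-shaped value.
pub fn to_anthropic_shape(req: &UnifiedRequest) -> AnthropicShape {
    AnthropicShape {
        system: req.system.clone(),
        messages: req.messages.clone(),
    }
}

// Pure function: prepends a system-role message when a system prompt is present.
pub fn to_openai_shape(req: &UnifiedRequest) -> OpenAiShape {
    let mut messages = Vec::with_capacity(req.messages.len() + 1);
    if let Some(sys) = &req.system {
        messages.push(Msg { role: Role::System, text: sys.clone() });
    }
    messages.extend(req.messages.iter().cloned());
    OpenAiShape { messages }
}
```
\n\nBoth functions only read their input and build a new value, so each vendor mapping can be unit-tested without network access or a running Provider.\n\n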
The shared `ProviderRegistry` looks up implementations by Provider ID:\n\n```rust\nuse std::collections::HashMap;\nuse std::sync::Arc;\n\npub struct ProviderRegistry {\n    providers: HashMap<ProviderId, Arc<dyn LlmProvider>>,\n    default_provider_id: ProviderId,\n}\n```\n\nAll Providers stream their responses over SSE (Server-Sent Events), but each vendor's SSE events differ in granularity and semantics:\n\n- **Anthropic**: `content_block_start` / `content_block_delta` / `content_block_stop` give every content block an explicit start, delta, and stop phase; a single ContentBlock spans multiple SSE messages from start to stop.\n- **OpenAI**: `choices[0].delta` is a flat delta with no explicit block start/stop.\n\nBoxAgnts' `stream_parser` module absorbs all these differences and exposes a unified `StreamEvent` enum:\n\n```rust\npub enum StreamEvent {\n    TextDelta { text: String },\n    ToolUseStart { id: String, name: String },\n    ToolUseDelta { id: String, json: String },\n    ToolUseEnd { id: String },\n    ThinkingDelta { text: String },\n    UsageUpdate { input_tokens: u32, output_tokens: u32 },\n    MessageStop,\n}\n```\n\nInternally, each Provider's stream parser is a finite state machine. 
Taking Anthropic as an example:\n\n```\nWait for message_start\n  │\n  ├── message_start ──► extract model, initial usage\n  │\n  ├── content_block_start\n  │     │ type = \"text\"        → create TextBlock state\n  │     │ type = \"tool_use\"    → create ToolUseBlock state, emit ToolUseStart\n  │     │ type = \"thinking\"    → create ThinkingBlock state\n  │\n  ├── content_block_delta\n  │     │ text_delta           → append to current TextBlock, emit TextDelta\n  │     │ input_json_delta     → concatenate JSON fragment to ToolUseBlock, emit ToolUseDelta\n  │     │ thinking_delta       → append to ThinkingBlock, emit ThinkingDelta\n  │\n  ├── content_block_stop\n  │     │ corresponding tool_use block → emit ToolUseEnd\n  │\n  ├── message_delta ──► record stop_reason and final output usage\n  │\n  └── message_stop ──► emit MessageStop\n```\n\n`StreamAccumulator` maintains the state of all ContentBlocks in the current message:\n\n```rust\nuse std::collections::HashMap;\n\npub struct StreamAccumulator {\n    text_blocks: Vec<TextBlock>,\n    tool_use_blocks: HashMap<String, ToolUseBlock>, // keyed by tool_use id\n    thinking_block: Option<String>,\n    usage: UsageInfo,\n}\n```\n\nWhen `MessageStop` arrives, `finish()` assembles all accumulated blocks into a complete `Message` and returns it together with the `stop_reason` and the final `UsageInfo`.\n\nOnce the stream parser has turned SSE events into a structured `Message`, `query::run_query_loop()` hands that Message to the tool system.\n\nCore flow (simplified; types and most error handling omitted):\n\n```rust\nlet mut turn = 0;\nlet mut max_tokens_count = 0;\n\nloop {\n    // The Stop button cancels this token; the loop checks it once per round\n    if cancel.is_cancelled() {\n        return QueryOutcome::Cancelled;\n    }\n\n    // 1. Send message history + system prompt + tool list to the AI model\n    let request = CreateMessageRequest::builder(model, max_tokens)\n        .messages(messages.clone())\n        .tools(all_tools_as_definitions(tools))\n        .build();\n\n    // 2. Initiate streaming request, parse SSE events\n    let mut rx = client.create_message_stream(request).await?;\n    let mut acc = StreamAccumulator::new();\n\n    while let Some(evt) = rx.recv().await {\n        acc.on_event(&evt);\n        match evt {\n            StreamEvent::ToolUseStart { .. } | StreamEvent::ToolUseDelta { .. } => {\n                // Send to frontend in real time (via WebSocket) so users can see which tools the model is using\n            }\n            StreamEvent::MessageStop => break,\n            _ => {}\n        }\n    }\n\n    // 3. Assemble the completed Message, check stop_reason\n    let (msg, usage, stop_reason) = acc.finish();\n    // The assistant turn must be in the history before the tool results that answer it\n    messages.push(msg.clone());\n\n    match stop_reason.as_str() {\n        \"end_turn\" => return QueryOutcome::EndTurn { message: msg, usage },\n        \"tool_use\" => {\n            // 4. For each tool_use ContentBlock, call the corresponding tool;\n            //    each result carries the id of the tool_use it answers\n            for block in msg.content.iter() {\n                if let ContentBlock::ToolUse { id, name, input, .. } = block {\n                    let tool = find_tool(tools, name);\n                    let result = tool.execute(input, &ctx).await;\n                    messages.push(result_to_message(id, result));\n                }\n            }\n            // Return to loop top, continue to next round\n        }\n        \"max_tokens\" => {\n            // 5. MaxTokens recovery: give up after 3 attempts, otherwise ask the model to continue\n            max_tokens_count += 1;\n            if max_tokens_count > 3 {\n                return QueryOutcome::MaxTokens { ... };\n            }\n            messages.push(UserMessage(\"Output token limit hit. Resume directly.\"));\n        }\n        _ => return QueryOutcome::Error(...),\n    }\n\n    turn += 1;\n    if turn >= config.max_turns {\n        break;\n    }\n}\n```\n\nSeveral details are worth noting:\n\n**Tool list injection strategy**. Every API call round sends the complete tool list (each tool's name, description, and input_schema) in the `tools` field. This adds a fixed token cost to every round, and that cost grows with the number of tools. Beyond about 20 tools, the overhead becomes significant (potentially several thousand tokens per round). 
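\n\nOne rough way to see where that figure comes from is to estimate tokens from the size of the tool definitions. The sketch below is illustrative only. `ToolDef`, `estimate_tool_tokens`, and `tool_overhead_for_session` are hypothetical. The four-characters-per-token ratio is a common rule of thumb for English text, not a real tokenizer, so exact numbers should come from the vendor's own token counting.\n\n
```rust
// Hypothetical, simplified tool definition: only the fields resent on every round.
pub struct ToolDef {
    pub name: String,
    pub description: String,
    pub input_schema_json: String, // JSON Schema, already serialized
}

// Rough estimate at about four characters per token (a heuristic, not a tokenizer).
pub fn estimate_tool_tokens(tools: &[ToolDef]) -> usize {
    let chars: usize = tools
        .iter()
        .map(|t| {
            t.name.chars().count()
                + t.description.chars().count()
                + t.input_schema_json.chars().count()
        })
        .sum();
    // Round up so that any non-empty definition costs at least one token.
    (chars + 3) / 4
}

// With full injection the same list is resent each round, so cost scales with round count.
pub fn tool_overhead_for_session(tools: &[ToolDef], rounds: usize) -> usize {
    estimate_tool_tokens(tools) * rounds
}
```
\n\nUnder this heuristic, 25 tools averaging 800 characters each (name, description, and schema combined) add up to 20,000 characters, or about 5,000 tokens resent on every round.\n\n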
BoxAgnts' current strategy is full injection. Future versions may add tool selection and grouping so that only the tools relevant to the current task are sent. (Anthropic's `tool_choice` parameter addresses a related but different problem: it controls whether and which tool the model must call, not how many tool definitions are sent.)\n\n**MaxTokens recovery**. A model that exhausts its output token limit mid-response hasn't truly \"failed\"; it just hasn't finished. BoxAgnts automatically injects a recovery message (\"Output token limit hit. Resume directly...\") so the model can continue. Recovery is attempted at most 3 times. If the limit is still hit after the third attempt, the task is treated as genuinely too long, and the system gives up and returns the partial result.\n\n**Cancellation mechanism**. `CancellationToken` comes from the tokio ecosystem (the `tokio-util` crate). When the user clicks the \"Stop\" button in the frontend, the WebSocket handler cancels the corresponding token, and `run_query_loop` returns `QueryOutcome::Cancelled` at its next check.\n\n**Cost tracking**. After each API call round, `CostTracker` adds that round's cost, computed from the current model's pricing (input and output tokens are priced separately, and prices differ between models). If the cumulative cost exceeds `budget_limit_usd`, `QueryOutcome::BudgetExceeded` is returned. Cost information is pushed to the frontend Dashboard in real time via WebSocket.\n\nAI API calls have several typical failure modes:\n\n| Error Type | Typical HTTP Code | Strategy |\n|---|---|---|\n| Rate Limit | 429 | Exponential backoff retry, respecting the Retry-After header |\n| Overloaded | 529 | Exponential backoff retry, optional fallback model |\n| Auth Failure | 401/403 | No retry, return error immediately |\n| Bad Request | 400 | No retry (retrying parameter errors is pointless) |\n| Server Error | 500+ | Limited retry (max 3 times) |\n| Network Timeout | (none) | Limited retry |\n\nExponential backoff uses intervals of 1s → 2s → 4s → 8s, doubling the `Duration` after each attempt. 
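\n\nThe table and the backoff schedule can be condensed into two small functions. This is a sketch under stated assumptions, not BoxAgnts' retry code. `RetryPolicy`, `classify`, and `backoff_delay` are hypothetical names. `None` stands for a network timeout with no HTTP status, timeouts reuse the 3-attempt limit of server errors, unlisted 4xx codes are treated as non-retryable, and the delay is capped at 8 seconds.\n\n
```rust
use std::time::Duration;

// Hypothetical retry decision derived from the error table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetryPolicy {
    Backoff,      // 429 rate limit, 529 overloaded: retry with exponential backoff
    Limited(u32), // 500-class errors and timeouts: retry at most this many times
    Fail,         // 400, 401, 403 and other client errors: retrying cannot help
}

// `status` is None when the request timed out before any HTTP response arrived.
pub fn classify(status: Option<u16>) -> RetryPolicy {
    match status {
        // 529 must be matched before the generic 500+ arm below.
        Some(429) | Some(529) => RetryPolicy::Backoff,
        Some(400) | Some(401) | Some(403) => RetryPolicy::Fail,
        Some(s) if s >= 500 => RetryPolicy::Limited(3),
        None => RetryPolicy::Limited(3),
        Some(_) => RetryPolicy::Fail,
    }
}

// Exponential backoff: 1s, 2s, 4s, 8s, then held at 8s (the cap is an assumption).
// A server-provided Retry-After value takes precedence when present.
pub fn backoff_delay(attempt: u32, retry_after: Option<Duration>) -> Duration {
    if let Some(wait) = retry_after {
        return wait;
    }
    let exp = attempt.min(3); // 2^3 = 8 seconds at most
    Duration::from_secs(1) * 2u32.pow(exp)
}
```
\n\nKeeping the policy separate from the actual wait (for example a `tokio::time::sleep(backoff_delay(attempt, retry_after)).await` in the caller) lets it be tested without an async runtime.\n\n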
For 529 (Overloaded), model switching is also supported. If the user has configured a fallback model (e.g., switching to `claude-haiku-4-5` when `claude-sonnet-4-5` is overloaded), subsequent calls automatically use the fallback.\n\nAdding a new Provider takes three steps:\n\n1. Under `providers/`, implement the `LlmProvider` trait.\n2. Register it in `provider_from_key()` in `registry.rs`.\n3. Add its models to `model_registry.rs`.\n\nThe `openai_compat_providers` module is a shortcut for services that speak the OpenAI API format (DeepSeek, OpenCode, various Chinese domestic models): only an API base URL and API key need to be configured, and no Provider code has to be written. These services share the same OpenAI-compatible SSE parser and Request builder; only the configuration differs.\n\nExample configuration:\n\n```json\n{\n  \"deepseek\": {\n    \"provider_id\": \"deepseek\",\n    \"api_base\": \"https://api.deepseek.com/v1\",\n    \"api_key\": \"sk-...\",\n    \"provider_type\": \"openai_compat\"\n  }\n}\n```\n\nTogether, the Provider abstraction and the Agent query loop form the \"engine\" of BoxAgnts' tool system:\n\n**Provider abstraction** solves the problem of integrating 12 AI APIs through three layers of decoupling (ProviderRequest/Response unified data model → LlmProvider trait → Transformer format conversion). Adding a new Provider requires only implementing the trait and registering it. For OpenAI-compatible services, the `openai_compat` module's shared SSE parser and Request builder reduce integration cost further.\n\n**Agent query loop** interleaves conversation and tool execution through a closed loop of SSE state-machine parsing, ToolUse detection, tool dispatch, and result feedback. 
MaxTokens automatic recovery (at most 3 attempts) and exponential backoff retries keep long tasks running through truncated outputs and transient API failures.\n\nWhat the two layers have in common is **dependency inversion**. The Agent loop doesn't depend on a specific AI vendor, and Provider implementations don't depend on the conversation orchestration logic; both sides depend only on trait interfaces.\n\nCost tracking (`CostTracker` + `AtomicF64`) and the cancellation mechanism (`CancellationToken`) provide the operational observability and user control that production environments need.", "url": "https://wpnews.pro/news/boxagnts-tool-system-6-multi-provider-adaptation-and-the-agent-query-loop", "canonical_source": "https://dev.to/guyoung/boxagnts-tool-system-6-multi-provider-adaptation-and-the-agent-query-loop-44kj", "published_at": "2026-06-13 07:39:44+00:00", "updated_at": "2026-06-13 07:47:32.309332+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-infrastructure", "artificial-intelligence", "large-language-models"], "entities": ["BoxAgnts", "Anthropic", "OpenAI", "Google Gemini", "ProviderRegistry", "LlmProvider", "StreamEvent"], "alternates": {"html": "https://wpnews.pro/news/boxagnts-tool-system-6-multi-provider-adaptation-and-the-agent-query-loop", "markdown": "https://wpnews.pro/news/boxagnts-tool-system-6-multi-provider-adaptation-and-the-agent-query-loop.md", "text": "https://wpnews.pro/news/boxagnts-tool-system-6-multi-provider-adaptation-and-the-agent-query-loop.txt", "jsonld": "https://wpnews.pro/news/boxagnts-tool-system-6-multi-provider-adaptation-and-the-agent-query-loop.jsonld"}}