BoxAgnts Introduction (2) — AI Agent Toolbox


BoxAgnts' middle layer — the Agent Toolbox — is the brain and hands of the system. It consists of six core modules responsible for three things: understanding your intent, dispatching the right tools, and feeding back execution results. This article takes a deep dive into the architectural design and key implementations of each module.

What happens when you type "Help me analyze the code structure of this Rust project" in the Dashboard and hit send?

User Message
  │
  ▼
┌─────────────────────────────────────────────────────────────┐
│  boxagnts-api            Unified API Abstraction Layer      │
│  LlmProvider trait → 20+ Providers → Message Normalization  │
├─────────────────────────────────────────────────────────────┤
│  boxagnts-query          Agent Query Loop                   │
│  run_query_loop() → Multi-turn Conversation → Tool Dispatch → Auto Recovery │
├─────────────────────────────────────────────────────────────┤
│  boxagnts-tools + tools-manager + wasm-tools                │
│  Tool trait → Built-in Tools + WASM Tools → Execution       │
├─────────────────────────────────────────────────────────────┤
│  boxagnts-gateway        Gateway & Scheduling               │
│  Cron Scheduler + Site Hosting                              │
├─────────────────────────────────────────────────────────────┤
│  boxagnts-workspace      Memory & Configuration             │
│  SQLite + JSON Config + Conversation History                │
└─────────────────────────────────────────────────────────────┘

Let's break down each one.

This is the interface layer between the middle layer and the external AI world. It solves the most painful problem in AI tool development: every model provider's API is different, but your code should not pay the price for that.

LlmProvider Trait: The Foundation of Polymorphism

The core interface that all provider adapters must implement:

#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn id(&self) -> &ProviderId;       // Unique identifier "anthropic", "openai"
    fn name(&self) -> &str;            // Human-readable name

    // Non-streaming request
    async fn create_message(&self, request: ProviderRequest)
        -> Result<ProviderResponse, ProviderError>;

    // Streaming request (returns Pin<Box<dyn Stream>>)
    async fn create_message_stream(&self, request: ProviderRequest)
        -> Result<Pin<Box<dyn Stream<Item = Result<StreamEvent, ProviderError>> + Send>>, ProviderError>;

    // List available models
    async fn list_models(&self) -> Result<Vec<ModelInfo>, ProviderError>;
}

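This trait design has several elegant aspects, including:

- It uses the async_trait macro, which is compatible with the Tokio async runtime
- Every fallible method reports failures through a single ProviderError type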

BoxAgnts supports an extremely wide range of model providers:

Category	Providers	Independent Implementation File
International Mainstream	OpenAI, Anthropic, Google, Azure, Bedrock	Individual files
Open-Source Compatible	Deepseek, Mistral, Groq, TogetherAI, Fireworks	openai_compat.rs
Enterprise Services	Copilot, CodeX, Cohere, Perplexity	Individual files
Domestic Platforms	MiniMax, Alibaba Cloud (Qwen), Zhipu, Moonshot, SiliconFlow	Individual files
Others	Venus, Nebius, Novita, OVHCloud	Individual files

Key design pattern — Provider + Transformer dual-layer architecture:

Raw User Message
    │
    ▼
┌────────────────┐
│  Transformer   │  ← Converts internal message format to provider-specific format
│  (per-provider)│
└───────┬────────┘
        ▼
┌────────────────┐
│   Provider     │  ← Handles authentication, HTTP requests, stream parsing
│  (per-provider)│
└───────┬────────┘
        ▼
    AI Response
        │
        ▼
┌────────────────┐
│  Transformer   │  ← Converts provider response back to internal unified format
└────────────────┘
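The dual-layer split above can be illustrated with a minimal, synchronous sketch. All names here (InternalMessage, Transformer, Provider, EchoTransformer, EchoProvider, round_trip) are hypothetical and chosen for illustration; the real BoxAgnts traits are async and are not shown in this article beyond LlmProvider.

```rust
// A simplified, synchronous sketch of the Provider + Transformer split.
// Every type and method name here is hypothetical, not the BoxAgnts API.

/// Internal, provider-neutral message.
#[derive(Debug, Clone, PartialEq)]
pub struct InternalMessage {
    pub role: String,
    pub text: String,
}

/// Converts between the internal format and one provider's wire format.
pub trait Transformer {
    fn to_wire(&self, msg: &InternalMessage) -> String;
    fn from_wire(&self, raw: &str) -> InternalMessage;
}

/// Handles transport only; a real provider would do auth and HTTP here.
pub trait Provider {
    fn send(&self, wire_request: &str) -> String;
}

/// Toy transformer: encodes a message as "role|text".
pub struct EchoTransformer;

impl Transformer for EchoTransformer {
    fn to_wire(&self, msg: &InternalMessage) -> String {
        format!("{}|{}", msg.role, msg.text)
    }

    fn from_wire(&self, raw: &str) -> InternalMessage {
        InternalMessage {
            role: "assistant".to_string(),
            text: raw.to_string(),
        }
    }
}

/// Stand-in provider that echoes the request back in upper case.
pub struct EchoProvider;

impl Provider for EchoProvider {
    fn send(&self, wire_request: &str) -> String {
        wire_request.to_uppercase()
    }
}

/// Full round trip: internal -> wire -> provider -> wire -> internal.
pub fn round_trip(
    t: &dyn Transformer,
    p: &dyn Provider,
    msg: &InternalMessage,
) -> InternalMessage {
    let wire = t.to_wire(msg);
    let raw = p.send(&wire);
    t.from_wire(&raw)
}
```

Because the calling code only sees InternalMessage, swapping in another provider means writing one new Transformer and one new Provider, while the rest of the pipeline stays unchanged.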

QueryConfig contains a provider_registry field that allows dynamic provider selection at runtime. This means you can:

- Set fallback_model to automatically switch to a backup model when the primary model is overloaded
- ModelRegistry

BoxAgnts predefines environment variable mappings for each provider:

pub fn api_key_env_vars_for_provider(provider_id: &str) -> &'static [&'static str] {
    match provider_id {
        "anthropic" => &["ANTHROPIC_API_KEY"],
        "openai"    => &["OPENAI_API_KEY"],
        "deepseek"  => &["DEEPSEEK_API_KEY"],
        "zhipu"     => &["ZHIPU_API_KEY"],
        "minimax"   => &["MINIMAX_API_KEY"],
        // ... 30+ providers
        _ => &[],  // Unknown provider: no environment variables defined
    }
}

This means you can inject API keys through three methods — environment variables, configuration files, or the Dashboard UI — maximizing flexibility while maintaining security boundaries.
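As a rough sketch of how such a lookup might be layered, the function below checks environment variables first and then a config map, matching the "ENV > Config > Default" priority listed in the workspace table later in this article. The resolve_api_key function, its env_lookup parameter, and the config map are assumptions made for illustration, not BoxAgnts code.

```rust
use std::collections::HashMap;

/// Same shape as the mapping shown above, reduced to two providers.
pub fn api_key_env_vars_for_provider(provider_id: &str) -> &'static [&'static str] {
    match provider_id {
        "anthropic" => &["ANTHROPIC_API_KEY"],
        "openai" => &["OPENAI_API_KEY"],
        _ => &[],
    }
}

/// Hypothetical resolver: environment first, then a config map.
/// `env_lookup` is injected so the function can be tested without
/// touching the real process environment.
pub fn resolve_api_key<F>(
    provider_id: &str,
    env_lookup: F,
    config: &HashMap<String, String>,
) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    // 1. Environment variables win when present and non-empty.
    for &var in api_key_env_vars_for_provider(provider_id) {
        if let Some(value) = env_lookup(var) {
            if !value.is_empty() {
                return Some(value);
            }
        }
    }
    // 2. Otherwise fall back to the config entry, if any.
    config.get(provider_id).cloned()
}
```

In a real process the lookup could be passed as `|name| std::env::var(name).ok()`.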

This layer is the absolute soul of BoxAgnts. The run_query_loop() function implements the complete Agent reasoning loop, about 300 lines of code, yet handles an amazing number of edge cases.

loop {
    turn += 1;

    // 0. Check cancellation signal
    if cancel_token.is_cancelled() { return Cancelled; }

    // 1. Check max turns limit
    if turn > effective_max_turns { return EndTurn; }

    // 2. Inject pending user messages (multimodal interaction)
    if let Some(queue) = pending_messages.as_deref_mut() {
        for text in queue.drain(..) { /* append as user message */ }
    }

    // 3. Auto context compaction
    compact_state.maybe_compact(messages, config);

    // 4. Build API request
    let request = build_request(messages, tools, config);

    // 5. Send to AI model (supports streaming)
    let response = client.create_message_stream(request).await;

    // 6. Parse ContentBlocks from response
    for block in response.content {
        match block {
            ContentBlock::Text { text } => { /* accumulate text response */ }
            ContentBlock::ToolUse { name, input, .. } => {
                // Match and execute tool
                let tool = find_tool(name);
                let result = tool.execute(input, tool_ctx).await;
                messages.push(tool_result);  // Inject result into conversation
            }
            ContentBlock::Thinking { thinking, .. } => {
                // Handle deep thinking content (not shown to user)
            }
        }
    }

    // 7. If model ends → return final message
    if stop_reason == "end_turn" { return EndTurn; }
}
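The control flow of the loop can be reduced to a small synchronous sketch. ModelReply, Outcome, and run_loop are hypothetical names; the model and the tool runner are passed in as closures so the sketch runs without any network access. Unlike the excerpt above, this sketch reports a separate MaxTurns outcome so the two exit paths are easy to tell apart.

```rust
/// What the (stubbed) model can answer with in one turn.
pub enum ModelReply {
    Text(String),
    ToolUse { name: String, input: String },
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    EndTurn(String),
    MaxTurns,
}

/// Minimal agent loop: call the model, run any requested tool,
/// feed the result back, and stop on plain text or the turn limit.
pub fn run_loop<M, T>(max_turns: u32, mut model: M, run_tool: T) -> Outcome
where
    M: FnMut(&[String]) -> ModelReply,
    T: Fn(&str, &str) -> String,
{
    let mut messages: Vec<String> = Vec::new();
    for _turn in 0..max_turns {
        match model(&messages) {
            // Plain text ends the loop with the final answer.
            ModelReply::Text(text) => return Outcome::EndTurn(text),
            // A tool call is executed and its result joins the history.
            ModelReply::ToolUse { name, input } => {
                let result = run_tool(&name, &input);
                messages.push(format!("tool_result[{}]: {}", name, result));
            }
        }
    }
    Outcome::MaxTurns
}
```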

When the model hits its output token limit in a single response, the query loop does not simply return a truncated result. Instead, it automatically sends a carefully designed recovery message:

"Output token limit hit. Resume directly — no apology, no recap of what
 you were doing. Pick up mid-thought if that is where the cut happened.
 Break remaining work into smaller pieces."

This message is remarkably restrained in design: "no apology, no recap, pick up from the cut, break down tasks" conveys maximum instruction with minimum tokens. The loop retries up to 3 times (MAX_TOKENS_RECOVERY_LIMIT = 3) to avoid infinite loops.
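A bounded recovery loop of this kind might look like the sketch below. Only the constant MAX_TOKENS_RECOVERY_LIMIT comes from the article; StopReason, Chunk, and collect_with_recovery are hypothetical, and the model call is replaced by a closure.

```rust
pub const MAX_TOKENS_RECOVERY_LIMIT: u32 = 3;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StopReason {
    EndTurn,
    MaxTokens,
}

/// One partial model response.
pub struct Chunk {
    pub text: String,
    pub stop_reason: StopReason,
}

/// Calls `next_chunk` until the model stops normally or the recovery
/// limit is reached. The bool argument tells the (stubbed) model call
/// whether the recovery message should be sent first.
/// Returns the concatenated text and the number of recoveries used.
pub fn collect_with_recovery<F>(mut next_chunk: F) -> (String, u32)
where
    F: FnMut(bool) -> Chunk,
{
    let mut output = String::new();
    let mut recoveries = 0;
    let mut is_recovery = false;
    loop {
        let chunk = next_chunk(is_recovery);
        output.push_str(&chunk.text);
        // Stop on a normal finish, or when the retry budget is spent.
        if chunk.stop_reason != StopReason::MaxTokens
            || recoveries >= MAX_TOKENS_RECOVERY_LIMIT
        {
            return (output, recoveries);
        }
        recoveries += 1;
        is_recovery = true;
    }
}
```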

compact.rs implements an intelligent compression strategy. When conversation history approaches the model's context window limit, it summarizes early messages, preserving key information (file paths, error messages, important decisions) while discarding redundant intermediate steps. This strategy ensures that even extremely complex multi-turn tasks (such as refactoring an entire codebase) won't cause the Agent to "lose its memory" due to context overflow.
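The general shape of such a compaction step can be sketched as follows. This is not the compact.rs implementation: the Message type, the four-characters-per-token estimate, the threshold and keep_recent parameters, and the summarize closure (which would be a model call in practice) are all assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub text: String,
}

/// Rough size estimate, assuming about 4 characters per token.
fn estimate_tokens(messages: &[Message]) -> usize {
    messages.iter().map(|m| m.text.len() / 4 + 1).sum()
}

/// If the estimated size exceeds `threshold`, replaces everything except
/// the last `keep_recent` messages with one summary message produced by
/// `summarize`. Returns true when compaction happened.
pub fn maybe_compact<F>(
    messages: &mut Vec<Message>,
    threshold: usize,
    keep_recent: usize,
    summarize: F,
) -> bool
where
    F: Fn(&[Message]) -> String,
{
    if estimate_tokens(messages.as_slice()) <= threshold || messages.len() <= keep_recent {
        return false;
    }
    let split = messages.len() - keep_recent;
    let summary = summarize(&messages[..split]);
    // Keep the most recent messages verbatim.
    let recent = messages.split_off(split);
    messages.clear();
    messages.push(Message {
        role: "user".to_string(),
        text: summary,
    });
    messages.extend(recent);
    true
}
```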

// query.rs — Auto switch to backup model on overload errors
if is_overloaded_error(&err) && fallback_model.is_some() && !used_fallback {
    effective_model = fallback_model;
    used_fallback = true;
    continue; // Retry with backup model
}

When the primary model (e.g., Claude Sonnet) returns an overload error during high-load periods, the system automatically switches to a backup model (e.g., Deepseek), ensuring tasks are not interrupted. This mechanism is completely transparent to the user.
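The one-shot fallback behavior can be sketched in isolation. CallError and call_with_fallback are hypothetical names; the closure stands in for a real model call, and the model names used with it are placeholders.

```rust
#[derive(Debug, PartialEq)]
pub enum CallError {
    Overloaded,
    Other(String),
}

/// Tries the primary model and switches to `fallback_model` at most once
/// when an overload error is returned. On success, returns the name of
/// the model that answered together with its response text.
pub fn call_with_fallback<F>(
    primary: &str,
    fallback_model: Option<&str>,
    mut call_model: F,
) -> Result<(String, String), CallError>
where
    F: FnMut(&str) -> Result<String, CallError>,
{
    let mut effective_model = primary;
    let mut used_fallback = false;
    loop {
        match call_model(effective_model) {
            Ok(text) => return Ok((effective_model.to_string(), text)),
            // Only the first overload triggers a switch.
            Err(CallError::Overloaded) if !used_fallback => match fallback_model {
                Some(model) => {
                    effective_model = model;
                    used_fallback = true;
                }
                None => return Err(CallError::Overloaded),
            },
            Err(err) => return Err(err),
        }
    }
}
```

The used_fallback flag is what keeps the retry bounded: if the backup model is overloaded too, the error is returned instead of looping.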

pub enum QueryOutcome {
    BudgetExceeded { cost_usd: f64, limit_usd: f64 },
    // ...
}

After each turn, the query loop checks whether the accumulated cost exceeds the budget cap. Every API call is tracked via CostTracker, recording model and token consumption, ensuring costs are controllable. Budget overruns return clear error messages rather than silently overspending.
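A per-turn budget check could be sketched like this. Only the BudgetExceeded variant and the CostTracker name come from the article; the methods, the per-million-token pricing parameters, and the numbers in the usage are illustrative assumptions, not real prices or the real API.

```rust
/// Subset of outcomes, mirroring the BudgetExceeded variant above.
#[derive(Debug, PartialEq)]
pub enum QueryOutcome {
    EndTurn,
    BudgetExceeded { cost_usd: f64, limit_usd: f64 },
}

/// Hypothetical tracker that accumulates cost from token counts.
pub struct CostTracker {
    total_usd: f64,
}

impl CostTracker {
    pub fn new() -> Self {
        CostTracker { total_usd: 0.0 }
    }

    /// Records one API call. Prices are in USD per million tokens.
    pub fn record(
        &mut self,
        input_tokens: u64,
        output_tokens: u64,
        input_price: f64,
        output_price: f64,
    ) {
        self.total_usd += input_tokens as f64 / 1_000_000.0 * input_price
            + output_tokens as f64 / 1_000_000.0 * output_price;
    }

    pub fn total_usd(&self) -> f64 {
        self.total_usd
    }

    /// Returns Some(BudgetExceeded) once the total passes the cap.
    /// A cap of None means the budget is unlimited.
    pub fn check_budget(&self, max_budget_usd: Option<f64>) -> Option<QueryOutcome> {
        match max_budget_usd {
            Some(limit) if self.total_usd > limit => Some(QueryOutcome::BudgetExceeded {
                cost_usd: self.total_usd,
                limit_usd: limit,
            }),
            _ => None,
        }
    }
}
```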

The ContentBlock enum defines 14 content types, covering the full spectrum of interactions from plain text to deep thinking:

pub enum ContentBlock {
    Text { text: String },                          // Plain text
    Image { source: ImageSource },                  // Image
    ToolUse { id, name, input },                    // Tool call
    ToolResult { tool_use_id, content, is_error },  // Tool result
    Thinking { thinking, signature },               // Deep thinking
    Document { source, title, context },            // Document reference
    UserLocalCommandOutput { command, output },     // Shell command output
    UserCommand { name, args },                     // User command
    UserMemoryInput { key, value },                 // User memory
    SystemAPIError { message, retry_secs },         // API error
    CollapsedReadSearch { tool_name, paths },       // Collapsed search results
    TaskAssignment { id, subject, description },    // Sub-task assignment
    // ...
}

This fine-grained content typing allows the frontend to render each type with specialized treatment: error blocks show red borders, task assignment blocks show cyan borders, and collapsed search results are displayed as single-line summaries.
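The benefit of typed blocks is that rendering becomes a single exhaustive match. The sketch below uses a reduced ContentBlock with only four variants and simplified fields, and a hypothetical render function that returns text markers instead of real UI styling; the actual frontend is not shown in this article.

```rust
/// Reduced version of the enum above, with simplified fields.
pub enum ContentBlock {
    Text { text: String },
    SystemAPIError { message: String },
    TaskAssignment { subject: String },
    CollapsedReadSearch { tool_name: String, paths: Vec<String> },
}

/// Hypothetical renderer: maps each block type to a display string.
/// The compiler rejects this match if a variant is left unhandled.
pub fn render(block: &ContentBlock) -> String {
    match block {
        ContentBlock::Text { text } => text.clone(),
        ContentBlock::SystemAPIError { message } => {
            format!("[red border] {}", message)
        }
        ContentBlock::TaskAssignment { subject } => {
            format!("[cyan border] {}", subject)
        }
        // Collapsed results shrink to a one-line summary.
        ContentBlock::CollapsedReadSearch { tool_name, paths } => {
            format!("{}: {} paths", tool_name, paths.len())
        }
    }
}
```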

This is one of the most stunning middle-layer designs in BoxAgnts. managed_orchestrator.rs implements a hierarchical Agent architecture:

                    User
                      │
                      ▼
         ┌───────────────────────┐
         │  Manager Agent        │  ← Uses strong model (e.g., Claude Opus)
         │  Analyze tasks → Break down → Assign │
         └───────┬───────────────┘
                 │
        ┌────────┼────────┐
        ▼        ▼        ▼
   ┌────────┐┌────────┐┌────────┐
   │Executor││Executor││Executor│  ← Uses economical model (e.g., Claude Sonnet/Deepseek)
   │Subtask1││Subtask2││Subtask3│
   └────┬───┘└────┬───┘└────┬───┘
        │         │         │
        └────────┼─────────┘
                 ▼
          Manager aggregates results
                 │
                 ▼
              Final Output

pub struct ManagedAgentConfig {
    pub enabled: bool,
    pub manager_model: String,           // Manager model (e.g., "claude-opus-4-6")
    pub executor_model: String,          // Executor model (e.g., "claude-sonnet-4-6")
    pub executor_max_turns: u32,         // Max turns per executor
    pub max_concurrent_executors: u32,   // Max parallel executors
    pub total_budget_usd: Option<f64>,   // Total budget cap
    pub executor_isolation: bool,        // Whether to isolate Git worktrees
}

The Manager Agent's system prompt precisely defines its role:

You are the MANAGER, the planning and reasoning layer.
You coordinate work but do NOT execute tasks using file/bash tools directly.
All implementation work is delegated to executor agents (via the Agent tool).
Each executor uses {executor_model}, with a maximum of {max_turns} turns.
You may run up to {max_concurrent} executors in parallel.

The Executor's prompt requires "complete self-containment": executors cannot see the Manager's conversation history, so the Manager must include all necessary context in each executor's prompt. This avoids context leakage and reduces token consumption.
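The fan-out and aggregate steps can be sketched with standard-library threads. This is a simplification under stated assumptions: run_executors and aggregate are hypothetical names, each executor is a plain closure rather than a full agent loop, and concurrency is bounded by running fixed-size batches, which may differ from how BoxAgnts enforces max_concurrent_executors.

```rust
use std::thread;

/// Runs each subtask through `executor`, with at most `max_concurrent`
/// threads alive at a time. Results keep the order of `subtasks`.
/// Each executor receives only its own self-contained prompt string.
pub fn run_executors<F>(subtasks: &[String], max_concurrent: usize, executor: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    let executor = &executor;
    let mut results = Vec::with_capacity(subtasks.len());
    for batch in subtasks.chunks(max_concurrent.max(1)) {
        let batch_results = thread::scope(|scope| {
            // Spawn the whole batch first, then join in order.
            let handles: Vec<_> = batch
                .iter()
                .map(|task| scope.spawn(move || executor(task.as_str())))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("executor panicked"))
                .collect::<Vec<String>>()
        });
        results.extend(batch_results);
    }
    results
}

/// Manager-side aggregation, reduced to joining the results.
pub fn aggregate(results: &[String]) -> String {
    results.join("\n")
}
```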

This is the most critical interface definition in all of BoxAgnts. Every new tool only needs to implement this trait:

#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &'static str;
    fn description(&self) -> &'static str;
    fn input_schema(&self) -> Value;    // JSON Schema defining parameters
    async fn execute(&self, input: Value, ctx: &ToolContext) -> ToolResult;
}

pub struct ToolContext {
    pub cost_tracker: Arc<CostTracker>,         // Cost tracker
    pub session_id: Option<String>,             // Session ID
    pub current_turn: Arc<AtomicUsize>,         // Current turn
    pub non_interactive: bool,                  // Non-interactive mode
    pub config: Config,                         // Global configuration
    pub managed_agent_config: Option<ManagedAgentConfig>,
    pub allowed_outbound_hosts: Vec<String>,    // Outbound network whitelist
    pub block_url: Option<String>,              // Blocked URLs
}

ToolContext is the tool's "passport", carrying various contextual information such as permissions, sessions, costs, and networking. Every tool can access the required system state through it during execution.

// tools-manager/src/lib.rs
pub fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        // Rust native tools
        Box::new(AskUserQuestionTool),
        Box::new(BriefTool),
        Box::new(EnterPlanModeTool),
        Box::new(ExitPlanModeTool),
        Box::new(SleepTool),
        Box::new(SkillTool),
        Box::new(ToolSearchTool),

        // WASM sandbox tools — same interface, different implementation
        Box::new(WasmTool::new("read",  "file-read-component.wasm",  ...)),
        Box::new(WasmTool::new("write", "file-write-component.wasm", ...)),
        Box::new(WasmTool::new("edit",  "file-edit-component.wasm",  ...)),
        Box::new(WasmTool::new("glob",  "file-glob-component.wasm",  ...)),
        Box::new(WasmTool::new("bash",  "bash-component.wasm",       ...)),
        Box::new(WasmTool::new("web_fetch", "web-fetch-component.wasm", ...)),
        Box::new(WasmTool::new("js_exec", "boxedjs-execute-component.wasm", ...)),
    ]
}

Notice that Rust native tools and WASM tools are placed in the same Vec<Box<dyn Tool>>: to the AI model, they are completely equivalent. This is the power of interface-oriented programming.
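To make the "same interface, different implementation" point concrete, here is a simplified sketch. The trait below is a synchronous stand-in with string input and output, not the async, JSON-based Tool trait shown earlier; EchoTool, SandboxedTool, and dispatch are hypothetical, and SandboxedTool only pretends to call a WASM component.

```rust
/// Simplified, synchronous stand-in for the article's async Tool trait.
pub trait Tool: Send + Sync {
    fn name(&self) -> &'static str;
    fn description(&self) -> &'static str;
    fn execute(&self, input: &str) -> Result<String, String>;
}

/// A native tool implemented directly in Rust.
pub struct EchoTool;

impl Tool for EchoTool {
    fn name(&self) -> &'static str {
        "echo"
    }
    fn description(&self) -> &'static str {
        "Returns its input unchanged"
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        Ok(input.to_string())
    }
}

/// Stand-in for a sandboxed tool: same trait, different backing code.
pub struct SandboxedTool {
    name: &'static str,
    component: &'static str,
}

impl Tool for SandboxedTool {
    fn name(&self) -> &'static str {
        self.name
    }
    fn description(&self) -> &'static str {
        "Runs inside a sandboxed component"
    }
    fn execute(&self, input: &str) -> Result<String, String> {
        // A real implementation would invoke the component here.
        Ok(format!("{} handled '{}'", self.component, input))
    }
}

/// Both kinds of tool live in one registry.
pub fn all_tools() -> Vec<Box<dyn Tool>> {
    vec![
        Box::new(EchoTool),
        Box::new(SandboxedTool {
            name: "read",
            component: "file-read-component.wasm",
        }),
    ]
}

/// Looks a tool up by name and runs it; the caller never learns
/// which kind of implementation answered.
pub fn dispatch(tools: &[Box<dyn Tool>], name: &str, input: &str) -> Result<String, String> {
    match tools.iter().find(|t| t.name() == name) {
        Some(tool) => tool.execute(input),
        None => Err(format!("unknown tool: {}", name)),
    }
}
```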

cron/scheduler.rs builds a complete scheduled task system based on tokio_cron_scheduler:

// Core scheduling logic
let cron_job = Job::new_async(cron_expr, move |_uuid, _lock| {
    Box::pin(async move {
        // Build the job future, then run it under a timeout
        let fut = job::execute(prompt, model);
        let result = timeout(Duration::from_secs(timeout_secs), fut).await;
        // Log the outcome; `success` and `message` are derived from `result`
        append_execution_log(job_id, job_name, success, message).await;
    })
});

Key features:

- Execution timeout via tokio::time::timeout
- Cancellation via CancellationToken
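The timeout idea can be illustrated without an async runtime. The sketch below swaps tokio::time::timeout for a standard-library thread plus a channel with recv_timeout, so it is an analogy rather than the scheduler's actual mechanism; run_with_timeout is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `job` on a worker thread and waits at most `limit` for its
/// result. Returns Err if no result arrives in time (or if the job
/// panicked before sending one).
pub fn run_with_timeout<F>(job: F, limit: Duration) -> Result<String, String>
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(job());
    });
    rx.recv_timeout(limit)
        .map_err(|_| format!("job did not finish within {:?}", limit))
}
```

One difference worth noting: a timed-out thread here keeps running in the background, whereas dropping a timed-out async future stops it at its next await point.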

Site data managed by site/store.rs is persisted via SQLite, supporting CRUD operations. Combined with the frontend SitesPage, users can:

- Access hosted sites under the /sites/{name}/ path

The workspace module handles all persistence and configuration management:

Function	Storage	Key Implementation
Conversation History	SQLite (rusqlite)	Organized by session, supports CRUD
User Authentication	Password hash storage	Verified for remote access
Global Configuration	JSON file	Loaded via `Settings::load()`
API Keys	Environment variables / JSON	Three-tier priority: ENV > Config > Default
AGENTS.md	Filesystem	Injected into system prompt each conversation
Cron Tasks	SQLite	Persisted storage + loaded at startup
Site Config	SQLite	Persisted storage + loaded at startup

Design highlight: configuration and state are separated. Configuration is JSON files (human-readable and editable), state is SQLite (efficient queries and transactions). This distinction avoids the common pitfall of "configuration file bloat."

QueryConfig is a massive configuration struct with 20 fields, covering every dimension of an Agent query:

pub struct QueryConfig {
    pub model: String,                           // Model name
    pub max_tokens: u32,                         // Max output tokens
    pub max_turns: u32,                          // Max reasoning turns
    pub system_prompt: Option<String>,           // System prompt
    pub thinking_budget: Option<u32>,            // Thinking budget (deep reasoning)
    pub temperature: Option<f32>,                // Temperature parameter
    pub tool_result_budget: usize,               // Total char cap for tool results (50000)
    pub effort_level: Option<EffortLevel>,       // Effort level (affects thinking_budget)
    pub max_budget_usd: Option<f64>,             // USD budget cap
    pub fallback_model: Option<String>,          // Backup model
    pub agent_definition: Option<AgentDefinition>, // Agent definition
    pub managed_agents: Option<ManagedAgentConfig>, // Managed mode
    pub output_style: OutputStyle,               // Output style
    // ... and more
}

This struct demonstrates a core design philosophy of BoxAgnts: give control to the user, but provide reasonable defaults. Every field can be overridden, but none are required — defaults cover 90% of use cases.
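The "override anything, require nothing" pattern maps naturally onto Rust's Default trait and struct update syntax. The sketch below uses a reduced QueryConfig; apart from the 50000-character tool_result_budget quoted above, every default value is an illustrative placeholder rather than the real BoxAgnts default.

```rust
/// Reduced version of QueryConfig with a handful of fields.
#[derive(Debug, Clone, PartialEq)]
pub struct QueryConfig {
    pub model: String,
    pub max_tokens: u32,
    pub max_turns: u32,
    pub temperature: Option<f32>,
    pub tool_result_budget: usize,
    pub max_budget_usd: Option<f64>,
    pub fallback_model: Option<String>,
}

impl Default for QueryConfig {
    fn default() -> Self {
        QueryConfig {
            model: "example-model".to_string(), // placeholder name
            max_tokens: 4096,                   // illustrative value
            max_turns: 10,                      // illustrative value
            temperature: None,
            tool_result_budget: 50_000, // value quoted in the article
            max_budget_usd: None,
            fallback_model: None,
        }
    }
}
```

A caller then overrides only what it cares about, for example `QueryConfig { max_turns: 3, ..QueryConfig::default() }`.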

The middle-layer Agent Toolbox is the capability core of BoxAgnts:

Module	Responsibility	Key Highlight
boxagnts-api	Multi-model unified access	LlmProvider trait, 20+ Providers, Transformer conversion
boxagnts-query	Agent reasoning loop	Token recovery, context compaction, Fallback switching, budget control
managed_orchestrator	Managed Agent architecture	Manager-Executor layering, parallel execution, budget management
boxagnts-tools	Unified tool abstraction	Tool trait, ToolContext
tools-manager	Central tool registry	Rust native + WASM unified as Vec<Box<dyn Tool>>
boxagnts-gateway	Time and space extension	Cron scheduler, Site hosting
boxagnts-workspace	Memory system	SQLite + JSON dual-layer storage

Source: dev.to (original article)
