Modern AI agents are rapidly gaining operational authority—executing shell commands, modifying repositories, accessing local files, operating cloud infrastructure, managing developer environments.
The problem is that most AI infrastructure still relies on a security model designed for trusted human operators. That assumption no longer holds.
LLMs are not trustworthy execution authorities. They are probabilistic systems exposed to prompt injection, adversarial context, untrusted documents, manipulated tool outputs, and reasoning instability. Yet many AI agents still run with privileges equivalent to root.
This isn't a tooling problem—it's a security architecture problem.
BoxAgnts' query loop shows how an LLM becomes a runtime controller: the model decides which tool to call, what arguments to pass, and what resources to access. In boxagnts/query/src/query.rs:
// Each turn, the model's generated content is parsed.
// If it contains tool_use blocks, the system executes the corresponding tools.
for tool_use_block in tool_uses {
    let tool_name = &tool_use_block.name;
    let tool = find_tool(&tools, tool_name);
    let result = tool.execute(tool_input, tool_ctx).await;
    // Result is fed back to the model as a ToolResult message
}
The key issue is that runtimes typically grant the model overly broad implicit authority—unrestricted filesystem, unrestricted network, unrestricted shell. An LLM doesn't understand operational risk, privilege escalation, production safety, or organizational boundaries—it only predicts plausible continuations.
Malicious instructions can be embedded in webpages, Markdown files, source code, emails, PDFs, and API responses—the model cannot reliably distinguish "trusted instruction" from "malicious instruction" through prompts alone.
So the core question isn't whether the model can behave safely some of the time. The problem is that unrestricted permissions amplify every reasoning failure. The goal isn't to make the model trustworthy; it's to make unsafe behavior containable. This requires capability boundaries.
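As a rough illustration, a capability boundary at the dispatch point might look like the sketch below. The names CapabilitySet, ToolCall, and dispatch are hypothetical and not taken from BoxAgnts; the only point is that the check lives in the runtime and runs before any tool does, whatever the model asked for.

```rust
use std::collections::HashSet;

/// Hypothetical: the set of tool names an agent was explicitly granted.
pub struct CapabilitySet {
    allowed_tools: HashSet<String>,
}

impl CapabilitySet {
    pub fn new<I, S>(tools: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            allowed_tools: tools.into_iter().map(Into::into).collect(),
        }
    }

    pub fn permits(&self, tool_name: &str) -> bool {
        self.allowed_tools.contains(tool_name)
    }
}

/// Hypothetical stand-in for a parsed tool_use block.
pub struct ToolCall {
    pub name: String,
    pub input: String,
}

/// Runs the call only if the capability set allows it.
/// The `run` closure stands in for real tool execution.
pub fn dispatch<F>(caps: &CapabilitySet, call: &ToolCall, run: F) -> Result<String, String>
where
    F: FnOnce(&ToolCall) -> String,
{
    if !caps.permits(&call.name) {
        // The denial can be fed back to the model like any other tool result.
        return Err(format!(
            "tool '{}' is not in this agent's capability set",
            call.name
        ));
    }
    Ok(run(call))
}
```

With a set containing only read_file and grep, a request for a bash tool fails at dispatch and the closure is never invoked, regardless of how the request was phrased.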
BoxAgnts' Agent tool design (boxagnts/tools/src/agent/mod.rs) embodies this principle. An agent can be configured with a tools restriction (only a specific tool set), a max_turns hard cap, and isolation: "worktree" to run in an isolated Git worktree. These are all instances of capability constraints:
#[derive(Debug, Deserialize)]
struct AgentInput {
    description: String,
    prompt: String,
    tools: Option<Vec<String>>,   // Limit sub-agent's available tools
    max_turns: Option<u32>,       // Hard turn cap
    isolation: Option<String>,    // Isolation mode (worktree)
    model: Option<String>,        // Model restriction
    run_in_background: bool,      // Async isolation
}
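How these fields are enforced is not shown here, so the following is only a sketch of one plausible reading. The helpers restrict_tools and effective_max_turns are hypothetical, and the semantics are assumptions: an absent tools list inherits the parent's set, and a requested turn budget is clamped to a runtime-wide ceiling.

```rust
/// Hypothetical helper: narrow a parent's tool list to what the sub-agent requested.
/// Assumption: `None` means "inherit the parent's tools". Either way, a sub-agent
/// can never end up with a tool the parent does not have.
pub fn restrict_tools(parent_tools: &[String], requested: Option<&[String]>) -> Vec<String> {
    match requested {
        None => parent_tools.to_vec(),
        Some(names) => parent_tools
            .iter()
            .filter(|t| names.contains(*t))
            .cloned()
            .collect(),
    }
}

/// Hypothetical helper: clamp a requested turn budget to a runtime-wide ceiling.
/// Assumption: an absent `max_turns` falls back to the ceiling itself.
pub fn effective_max_turns(requested: Option<u32>, ceiling: u32) -> u32 {
    requested.map_or(ceiling, |n| n.min(ceiling))
}
```

Filtering the parent's list, rather than trusting the requested names, means a request for a tool the parent lacks is silently dropped instead of granted.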
Identity-based security models such as RBAC, ACLs, and IAM assume stable identities, predictable workflows, and human operators. AI agents violate all three: they generate workflows dynamically, invoke tools probabilistically, and coordinate across multiple agents.
BoxAgnts' PermissionMode configuration offers a more flexible approach:
pub enum PermissionMode {
    BypassPermissions, // Skip permission checks (not recommended for production)
    Default,           // Standard permission checks
    AcceptEdits,       // Auto-accept edit operations
    Plan,              // Planning mode (read-only)
}
But even this model isn't granular enough. What's really needed is a precise description like "Agent can read /workspace/project, write /workspace/tmp, cannot access ~/.ssh, cannot access production secrets."
The core idea of capability security is simple: don't give the agent the root password—give it a precise permission list.
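To make the "precise permission list" concrete, here is a minimal sketch of path-scoped grants with default-deny semantics. FsCapabilities, PathGrant, and Access are hypothetical names, not BoxAgnts types. The check normalizes paths lexically only, so a real implementation would also need to deal with symlinks.

```rust
use std::path::{Component, Path, PathBuf};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

/// Hypothetical: one granted directory and the access level on it.
pub struct PathGrant {
    root: PathBuf,
    access: Access,
}

/// Hypothetical: the full filesystem permission list for one agent.
pub struct FsCapabilities {
    grants: Vec<PathGrant>,
}

/// Resolve `.` and `..` lexically, without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

impl FsCapabilities {
    pub fn new() -> Self {
        Self { grants: Vec::new() }
    }

    pub fn grant(mut self, root: &str, access: Access) -> Self {
        self.grants.push(PathGrant {
            root: normalize(Path::new(root)),
            access,
        });
        self
    }

    /// Default-deny: a path is allowed only if it falls under some grant.
    /// In this sketch a Write grant also implies Read.
    pub fn allows(&self, path: &str, wanted: Access) -> bool {
        let path = normalize(Path::new(path));
        self.grants.iter().any(|g| {
            path.starts_with(&g.root) && (g.access == wanted || g.access == Access::Write)
        })
    }
}
```

Because nothing is allowed unless granted, ~/.ssh and production secrets need no explicit deny rule: they are simply never on the list. Path::starts_with compares whole components, so a grant on /workspace/project does not leak to /workspace/projectx.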
BoxAgnts' WASM execution model is the engineering implementation of this idea. In RunOption, every capability is explicitly declared:
work_dir → Filesystem capability: only expose specified directories
allowed_outbound_hosts → Network capability: allowlist-style outbound connections
env_vars → Environment capability: selectively pass environment variables
wasm_timeout → Time capability: time-limited execution
wasm_max_memory_size → Memory capability: hard memory ceiling
wasm_fuel → Compute capability: instruction count limit
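The list above can be read as a struct whose default value grants nothing. The sketch below mirrors the field names from the list, but every type and default value is an assumption made for illustration, and RunOptionSketch and select_env are hypothetical names rather than the actual BoxAgnts definitions.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Hypothetical mirror of the fields listed above. Field names follow the
/// article; every type and default here is an assumption for illustration.
#[derive(Debug, Clone)]
pub struct RunOptionSketch {
    pub work_dir: Option<PathBuf>,
    pub allowed_outbound_hosts: Vec<String>,
    pub env_vars: Vec<(String, String)>,
    pub wasm_timeout: Duration,
    pub wasm_max_memory_size: usize,
    pub wasm_fuel: u64,
}

impl Default for RunOptionSketch {
    /// Default-deny: no directory, no network, no environment, finite budgets.
    fn default() -> Self {
        Self {
            work_dir: None,
            allowed_outbound_hosts: Vec::new(),
            env_vars: Vec::new(),
            wasm_timeout: Duration::from_secs(30),
            wasm_max_memory_size: 64 * 1024 * 1024,
            wasm_fuel: 1_000_000_000,
        }
    }
}

/// Hypothetical helper: copy only the named variables from a source environment,
/// so secrets in the host environment are not passed through by accident.
pub fn select_env(source: &[(String, String)], names: &[&str]) -> Vec<(String, String)> {
    source
        .iter()
        .filter(|(k, _)| names.contains(&k.as_str()))
        .cloned()
        .collect()
}
```

The design choice worth noting is the direction of the default: a caller has to add each capability on purpose, so forgetting a field fails closed rather than open.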
The network-level capability control is especially fine-grained. Look at boxagnts/wasm-sandbox/src/extension/net.rs:
// Outbound connection check
pub async fn socket_addr_check(
    addr: SocketAddr,
    addr_use: SocketAddrUse,
    allowed_outbound_hosts: OutboundAllowedHosts,
    blocked_networks: BlockedNetworks,
) -> bool {
    // TCP bind? Denied
    // UDP bind? Denied
    // Outbound connection? Check allowlist and blocklist
}
The model cannot override these constraints. No matter how the LLM "reasons" in its prompts, the sandbox's check for a TCP bind always returns false. This is the core advantage of capability security: safety doesn't depend on model intent; it depends on runtime enforcement.
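The decision logic described in those comments can be sketched in a synchronous, simplified form. AddrUse and NetPolicy below are hypothetical stand-ins for SocketAddrUse, OutboundAllowedHosts, and BlockedNetworks, whose real definitions are not shown here. Matching is by exact address, which is an assumption; the real allowlist presumably works on host names and network ranges.

```rust
use std::net::{IpAddr, SocketAddr};

/// Simplified stand-in for the `SocketAddrUse` parameter; variant names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AddrUse {
    TcpBind,
    UdpBind,
    TcpConnect,
    UdpConnect,
}

/// Hypothetical policy: an exact-match allowlist plus a blocklist of IPs.
pub struct NetPolicy {
    pub allowed: Vec<SocketAddr>,
    pub blocked_ips: Vec<IpAddr>,
}

/// Returns true only for outbound connections that are allowlisted and not blocked.
pub fn addr_check(addr: SocketAddr, addr_use: AddrUse, policy: &NetPolicy) -> bool {
    match addr_use {
        // Binding (listening) is never permitted, whatever the policy says.
        AddrUse::TcpBind | AddrUse::UdpBind => false,
        // The blocklist wins over the allowlist; anything unlisted is denied.
        AddrUse::TcpConnect | AddrUse::UdpConnect => {
            !policy.blocked_ips.contains(&addr.ip()) && policy.allowed.contains(&addr)
        }
    }
}
```

Note that the function takes no input derived from the model's text: the only inputs are the address, the kind of use, and the policy, which is what makes the outcome independent of prompt content.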
Traditional operating systems evolved around trusted human users—humans have contextual understanding, organizational awareness, long-term reasoning, and accountability. LLMs have none of these. They cannot consistently evaluate whether a file is sensitive, whether a command is dangerous, or whether an API call violates policy.
That's why capability security fits AI better than RBAC: it cuts dependency on model judgment. It's not about expecting the agent to make correct decisions—it's about ensuring the runtime constrains possible decisions. Security should not depend on model alignment; it should depend on runtime guarantees.
BoxAgnts' ToolContext reflects this design:
pub struct ToolContext {
    pub permission_mode: PermissionMode,
    pub session_id: Option<String>,
    pub current_turn: Arc<AtomicUsize>,
    pub non_interactive: bool,
    pub mcp_manager: Option<Arc<boxagnts_mcp::McpManager>>,
    pub config: Config,
    pub allowed_outbound_hosts: Vec<String>,
    pub block_url: Option<String>,
}
Every tool execution carries this context. Note that allowed_outbound_hosts and block_url aren't suggestions: they are hard constraints passed to the WASM runtime.
In BoxAgnts' Managed Agent mode, the Manager distributes tasks to multiple Executors. Each Executor can have different capability sets, different models, different tool access.
In boxagnts/query/src/managed_orchestrator.rs, the system prompt explicitly defines this layering:
You are the MANAGER, responsible for the planning and reasoning layer.
You cannot directly use file/bash tools—you must delegate to executor agents.
Each executor uses model {executor_model}, with at most {max_turns} turns.
At most {max_concurrent} executors run in parallel.
This layering itself is capability security—the Manager's capability is "delegation"; the Executor's capability is "execution." The Manager won't accidentally execute dangerous shell commands because it simply doesn't have shell tools.
As AI system complexity grows, capabilities themselves may become orchestratable resources. Future runtimes may manage capability delegation, temporary permissions, capability revocation, execution tracing, capability inheritance, and resource accounting.
BoxAgnts' current architecture already leaves room for this extension: ToolContext, as the context carrier for every tool execution, can naturally expand into a "capability context" that carries not just the current agent's permission set, but also inheritance chains, delegation relationships, and audit logs.
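One way such a capability context could look is sketched below. This is speculative: CapabilityContext and its fields are hypothetical and not part of BoxAgnts' current ToolContext. The sketch shows the one rule that matters for delegation, which is that a child can only receive a subset of what its parent holds.

```rust
use std::collections::BTreeSet;

/// Hypothetical capability context; not part of BoxAgnts' current ToolContext.
#[derive(Debug, Clone)]
pub struct CapabilityContext {
    pub agent: String,
    pub capabilities: BTreeSet<String>,
    /// Chain of agent names from the root down to this agent's parent.
    pub delegated_from: Vec<String>,
    /// Record of delegations this agent has performed.
    pub audit_log: Vec<String>,
}

impl CapabilityContext {
    pub fn root(agent: &str, caps: &[&str]) -> Self {
        Self {
            agent: agent.to_string(),
            capabilities: caps.iter().map(|c| c.to_string()).collect(),
            delegated_from: Vec::new(),
            audit_log: Vec::new(),
        }
    }

    /// Delegation can only attenuate: the child receives the intersection of
    /// what it requested and what the parent actually holds.
    pub fn delegate(&mut self, child: &str, requested: &[&str]) -> CapabilityContext {
        let granted: BTreeSet<String> = requested
            .iter()
            .map(|c| c.to_string())
            .filter(|c| self.capabilities.contains(c))
            .collect();
        self.audit_log
            .push(format!("{} -> {}: granted {:?}", self.agent, child, granted));
        let mut chain = self.delegated_from.clone();
        chain.push(self.agent.clone());
        CapabilityContext {
            agent: child.to_string(),
            capabilities: granted,
            delegated_from: chain,
            audit_log: Vec::new(),
        }
    }
}
```

Revocation, temporary permissions, and resource accounting are left out; they would need shared state between parent and child rather than the plain copies used here.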
AI agents are evolving from conversational systems into execution systems. This shift fundamentally changes security requirements. LLMs are inherently exposed to adversarial instructions, untrusted context, and probabilistic execution paths—as long as they run with broad implicit permissions, they remain structurally unsafe.
The solution isn't better prompts—it's runtime-enforced capability isolation. BoxAgnts' practice demonstrates that capability-driven runtimes provide constrained execution, explicit permissions, deterministic boundaries, and governable infrastructure.
AI agents should receive capabilities, not root access.