Building a Self-Healing AI Agent: How to Run Untrusted Code Safely Without Blowing Up Your Server

The Hermes Agent framework (v0.13) implements a multi-layered defense system for safely executing untrusted code generated by autonomous AI agents, guarding against catastrophic failures such as accidental file deletion. The architecture replaces static toolboxes with a hierarchical, policy-driven structure made up of tool definition, execution dispatch, and sandboxing layers, which act as "control rods" that contain the AI's actions. This lets agents run shell commands and Python scripts inside a self-healing, sandboxed environment designed to prevent infinite loops and system-wide damage.

Imagine you are building an autonomous AI agent. You give it a terminal tool, a file-writing tool, and the ability to execute Python scripts. You ask it to "clean up the temporary files in the project directory." The LLM processes the request, formulates a plan, and generates a terminal command. But because of a subtle parsing error or a hallucinated variable, it executes:

rm -rf / temp

That single stray space splits one path into two arguments, and the first of them is the filesystem root. Run with sufficient privileges and no safeguards, the command can wipe out your host system in seconds.

This is the nightmare scenario for every developer working with agentic AI. As we move from passive chatbots to active, autonomous agents that orchestrate tools, write code, and modify environments, we are handing over the keys to our digital kingdoms. How do we give AI agents the power to execute code, run shell commands, and manage databases without risking catastrophic system failures or infinite, wallet-draining loops?

The answer lies in moving away from static toolboxes and embracing a dynamic, self-healing, and sandboxed architecture. In this deep dive, we will explore how the Hermes Agent framework (v0.13) tackles this challenge using a multi-layered defense system, state-machine orchestration, and policy-based sandboxing.
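To see why the stray space in rm -rf / temp is so dangerous, and what a first line of defense can look like, here is a minimal Python sketch. It tokenizes the command the way a POSIX shell would, using the standard-library shlex module, and flags recursive deletes aimed at protected targets. The function name flag_destructive and the PROTECTED_TARGETS set are hypothetical illustrations, not the patterns Hermes actually ships.

```python
import shlex

# Hypothetical targets that a recursive delete should never touch.
PROTECTED_TARGETS = {"/", "/*", "~", "~/"}


def flag_destructive(command: str) -> list:
    """Return the protected targets a recursive rm would hit (illustrative only)."""
    tokens = shlex.split(command)  # POSIX-style split: "/ temp" -> "/", "temp"
    if not tokens or tokens[0] != "rm":
        return []
    flags = [t for t in tokens[1:] if t.startswith("-")]
    targets = [t for t in tokens[1:] if not t.startswith("-")]
    # Recursive if a long "--recursive" flag or a short flag cluster contains r/R.
    recursive = any(
        f == "--recursive" or (not f.startswith("--") and ("r" in f or "R" in f))
        for f in flags
    )
    if not recursive:
        return []
    return [t for t in targets if t in PROTECTED_TARGETS]


print(shlex.split("rm -rf / temp"))       # ['rm', '-rf', '/', 'temp']
print(flag_destructive("rm -rf / temp"))  # ['/']
print(flag_destructive("rm -rf ./temp"))  # []
```

Because the check runs on the literal command string, it sees /* before the shell expands it. It does not handle sudo, chained commands (;, &&), or variable expansion, which is why pattern screening is only one of several layers in Hermes' design.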
The concepts and code demonstrated here are drawn from my ebook Hermes Agent, The Self-Evolving AI Workforce (https://tiny.cc/HermesAgent).

In traditional software development, a tool is a static library: a collection of documented, versioned functions invoked by a human developer. The developer is the sole orchestrator, the source of intent, and the error handler.

In an autonomous agent architecture like Hermes, this model breaks down. The AI agent is the orchestrator. The tools are not just functions; they are the agent's hands and eyes in the physical and digital world. Every tool call is a deliberate mutation of state: a file written, a command executed, a database queried. Therefore, we must treat tools as interfaces to an external state machine.

The agent's core engine runs a continuous loop of perception (receiving user input and tool results), cognition (the LLM call), and action (executing tool calls). To keep this loop from spinning out of control, we need the architectural equivalent of a nuclear reactor's control rods. The core reaction, the LLM generating tool calls, is powerful and inherently unpredictable. The toolsets and sandboxing layers act as control rods, absorbing excess reactivity so that the reaction stays self-sustaining but never explosive.

To secure this state machine, Hermes abandons the flat "list of functions" approach used by simpler agent frameworks. Instead, it implements a hierarchical, versioned, and policy-driven architecture structured into three distinct layers:

```
┌────────────────────────────────────────────────────────┐
│ 1. Tool Definition Layer (model_tools.py)              │
│    - Schemas, descriptions, and JSON validation        │
└───────────────────────────┬────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│ 2. Tool Execution Layer (handle_function_call)         │
│    - Dispatcher, sequential/concurrent execution       │
└───────────────────────────┬────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│ 3. Sandboxing Layer (containment vessel)               │
│    - Guardrails, Checkpoints, Docker, Approvals        │
└────────────────────────────────────────────────────────┘
```

1. Tool Definition Layer (model_tools.py)
This is the agent's "catalog." It contains the schema for every tool, defining its name, its description, and the strict JSON schema for its arguments. The catalog is filtered according to the enabled and disabled toolsets and sent to the LLM so that it knows what it can do.

2. Tool Execution Layer (handle_function_call)
This is the "dispatch center." When the LLM returns a tool_calls payload, the agent's loop parses the arguments and dispatches each call to the correct handler. This layer handles validation, type coercion, and initial error catching.

3. Sandboxing Layer
This is the "containment vessel." It is not a single function but a set of architectural patterns embedded in the execution of dangerous tools such as terminal and execute_code. It ensures that even if the agent's intent is flawed or malicious, the impact on the host system stays tightly controlled.

The run_conversation Loop as a State Machine

At the heart of the agent is the run_conversation method. It is a classic state machine designed to realize a closed learning loop. The agent does not just call a tool and forget about it; it appends the tool's result to the conversation history as a {"role": "tool"} message, so the result of each action becomes the context for its next thought.

Here is a simplified look at how this loop operates within the execution engine:

```python
def run_conversation(self, user_message, ...):
    # ... setup and memory loading ...

    while api_call_count < self.max_iterations:
        api_call_count += 1  # every LLM call counts against the cap

        # 1. API CALL state: send the history to the LLM
        response = self._interruptible_api_call(api_kwargs)
        normalized = self._get_transport().normalize_response(response)
        assistant_message = normalized

        # 2. TOOL EXECUTION state: process tool calls if present
        if assistant_message.tool_calls:
            # Build the assistant message dict and append it to the history
            assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
            messages.append(assistant_msg)

            # Execute the tools (sequentially or concurrently)
            self._execute_tool_calls(assistant_message, messages, effective_task_id)

            # Continue the loop, feeding the tool results back to the LLM
            continue
        else:
            # 3. FINAL RESPONSE state: no more tools needed
            final_response = assistant_message.content
            break
```

This feedback mechanism makes the agent highly capable, but it also introduces a vulnerability: the agent's own mistakes can lead it into an infinite loop or a destructive cascade. This is where policy-based permission control comes in.

Traditional operating system security relies on identity-based control (e.g., "Is this user root?"). Hermes instead uses policy-based permission control. The agent has no static user identity; every action is evaluated dynamically against a suite of safety policies before it executes.

Before a destructive tool call (such as writing to a file or executing a risky terminal command) runs, the agent can trigger a filesystem checkpoint. If the tool execution fails or corrupts the environment, the system can roll back to the last known good checkpoint. This provides a temporal sandbox that protects against permanent data loss.

The ToolCallGuardrailController acts as a stateful observer, monitoring the pattern of tool calls across turns. If it detects that the agent keeps calling the same tool with exactly the same arguments and receiving the same error, the guardrail halts execution. This acts as "emotional regulation" for the AI, forcing it to stop banging its head against a wall and change strategy.

The terminal and execute_code tools are the most powerful capabilities an agent can have. They are also the most dangerous.
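The repeated-failure guardrail described above can be sketched in a few lines of Python. This is a minimal illustration, not the ToolCallGuardrailController implementation: the class name RepeatGuardrail, the max_repeats threshold, the shape of the terminal tool's arguments, and the choice to key on (tool name, canonical JSON arguments, error text) are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RepeatGuardrail:
    """Halt when the same tool call fails the same way several times in a row (sketch)."""

    max_repeats: int = 3  # hypothetical threshold, not a Hermes default
    _last_key: Optional[tuple] = field(default=None, init=False)
    _count: int = field(default=0, init=False)

    def observe(self, tool_name: str, arguments: dict, error: Optional[str]) -> bool:
        """Record one tool result and return True if execution should halt."""
        if error is None:
            # A successful call breaks the streak.
            self._last_key, self._count = None, 0
            return False
        # Canonical JSON, so argument order does not affect the comparison.
        key = (tool_name, json.dumps(arguments, sort_keys=True), error)
        if key == self._last_key:
            self._count += 1
        else:
            self._last_key, self._count = key, 1
        return self._count >= self.max_repeats


guard = RepeatGuardrail(max_repeats=3)
failing_call = ("terminal", {"command": "cat missing.txt"}, "No such file or directory")
print([guard.observe(*failing_call) for _ in range(3)])  # [False, False, True]
```

Keying on the error text as well as the arguments means a call that fails differently each time (and so may be making progress) is not halted; only an identical call producing an identical error trips the guardrail. Guardrails like this regulate the loop as a whole; the remaining risk sits inside the terminal and execute_code tools themselves.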
Here is how Hermes tames them:

Before passing a command to the shell, the terminal tool checks the command string against a set of regular expressions (DESTRUCTIVE_PATTERNS and REDIRECT_OVERWRITE). If it detects a pattern such as rm -rf or a raw block-device write with dd, the agent is forced to create a filesystem checkpoint or pause for human approval.

The agent can also be configured to execute commands inside isolated, persistent virtual environments or Docker containers, so that the commands it runs are isolated from the host operating system rather than executed directly on it.

The execute_code tool is designed for quick, programmatic tasks (such as running a short Python script to calculate a statistical distribution). Because these are cheap, RPC-style calls, Hermes introduces a neat optimization: the iteration budget refund. If execute_code is the only tool the agent called in an iteration, that iteration is refunded:

```python
# Refund the iteration if the ONLY tool called was execute_code.
# These are cheap RPC-style calls that shouldn't eat the budget.
tc_names = {tc.function.name for tc in assistant_message.tool_calls}
if tc_names == {"execute_code"}:
    self.iteration_budget.refund()
```

This encourages the agent to use lightweight, programmatic execution for calculations and data transformations rather than spawning expensive, long-running terminal processes.

Let's look at how to implement a persistent agent using the architectural patterns of the Hermes framework. This implementation combines AIAgent with a persistent SessionDB to track conversation state and memory across sessions and to enforce an execution budget on every turn.

```python
"""
Basic Library Implementation: Persistent AI Agent with Tool Calling

This example demonstrates how to set up a self-improving AI agent using
the Hermes Agent framework. It shows:
- Session database initialization
- Agent creation with tool support
- Conversation loop with tool execution
- Memory and skills integration
- Session persistence and retrieval
"""

import asyncio
import json
import logging
import os
import sys
import time
from pathlib import Path
from typing import Any, Dict, List, Optional

# Import the core Hermes Agent classes
from hermes_state import SessionDB
from run_agent import AIAgent, IterationBudget

# Import tool definitions and helpers
from model_tools import (
    get_tool_definitions,
    get_toolset_for_tool,
    handle_function_call,
    check_toolset_requirements,
)

# Import memory and skills support
from tools.memory_tool import MemoryStore
from tools.todo_tool import TodoStore

# Import configuration helpers
from hermes_cli.config import load_config, cfg_get
from hermes_constants import get_hermes_home

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


class PersistentAgent:
    """
    A self-improving AI agent with persistent memory and session tracking.

    This class wraps the Hermes AIAgent with session database integration,
    providing durable storage for conversations, token usage tracking, and
    support for the closed learning loop pattern.
    """

    def __init__(
        self,
        model: str = "anthropic/claude-sonnet-4-20250514",
        base_url: Optional[str] = None,
        api_key: Optional[str] = None,
        provider: Optional[str] = None,
        max_iterations: int = 50,
        enabled_toolsets: Optional[List[str]] = None,
        disabled_toolsets: Optional[List[str]] = None,
        session_db_path: Optional[Path] = None,
        load_soul_identity: bool = True,
        skip_context_files: bool = False,
        verbose_logging: bool = False,
        quiet_mode: bool = True,
    ):
        """Initialize the persistent agent with its database and AIAgent."""
        # Step 1: Initialize the session database for durable state tracking
        self.db_path = session_db_path or get_hermes_home() / "state.db"
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        self.session_db = SessionDB(db_path=self.db_path)

        # Step 2: Create the AIAgent instance with all configuration
        self.agent = AIAgent(
            model=model,
            base_url=base_url or "",
            api_key=api_key,
            provider=provider,
            max_iterations=max_iterations,
            enabled_toolsets=enabled_toolsets or ["web", "terminal", "memory"],
            disabled_toolsets=disabled_toolsets,
            save_trajectories=False,  # We use SQLite instead for persistence
            verbose_logging=verbose_logging,
            quiet_mode=quiet_mode,
            load_soul_identity=load_soul_identity,
            skip_context_files=skip_context_files,
            session_db=self.session_db,
        )

        # Step 3: Initialize the in-memory todo store
        self.todo_store = TodoStore()

        # Step 4: Set up the memory store if memory tools are enabled
        self.memory_store = None
        if "memory" in self.agent.valid_tool_names:
            try:
                config = load_config()
                mem_config = config.get("memory", {})
                self.memory_store = MemoryStore(
                    memory_char_limit=mem_config.get("memory_char_limit", 2200),
                    user_char_limit=mem_config.get("user_char_limit", 1375),
                )
                self.memory_store.load_from_disk()
                self.agent._memory_store = self.memory_store
                logger.info("Memory store successfully initialized from disk.")
            except Exception as e:
                logger.warning(f"Failed to initialize memory store: {e}")

        # Step 5: Log an initialization summary
        logger.info(
            "PersistentAgent initialized: model=%s, tools=%d, db=%s",
            self.agent.model,
            len(self.agent.tools or []),
            self.db_path,
        )

    async def execute_turn(self, user_message: str, session_id: str) -> Any:
        """
        Execute a single conversation turn, running tools as needed, while
        maintaining state persistence in the SQLite database.
        """
        logger.info(f"Starting turn for session {session_id} with message: {user_message}")

        # Create an execution budget for this turn
        budget = IterationBudget(max_iterations=self.agent.max_iterations)

        # run_conversation is a regular (blocking) method, so run it in a worker
        # thread rather than awaiting it directly. It drives the LLM calls, tool
        # execution, and guardrails. Keyword arguments follow the original example.
        response = await asyncio.to_thread(
            self.agent.run_conversation,
            user_message=user_message,
            iteration_budget=budget,
            session_id=session_id,
        )

        # Persist the updated memory state to disk if applicable
        if self.memory_store:
            self.memory_store.save_to_disk()

        return response


# Example usage
async def main():
    # Ensure API keys are set in your environment before running
    if not os.environ.get("ANTHROPIC_API_KEY") and not os.environ.get("OPENAI_API_KEY"):
        print("Please set your ANTHROPIC_API_KEY or OPENAI_API_KEY environment variable.")
        sys.exit(1)

    # Initialize our persistent agent
    agent_wrapper = PersistentAgent(
        model="anthropic/claude-3-5-sonnet-latest",
        enabled_toolsets=["memory", "terminal"],
    )

    session_id = "demo-session-101"
    user_prompt = "Find all files ending in .log in the current directory and summarize their count."

    # Run the turn
    result = await agent_wrapper.execute_turn(user_prompt, session_id=session_id)
    print("\n--- Agent Response ---")
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```

As the AI landscape matures, we are moving away from simple text generation toward autonomous systems that can act on our behalf. But with great power comes great architectural responsibility. By shifting our design philosophy from "trust but verify" to "never trust; always isolate, checkpoint, and regulate," we can build agents that are both highly capable and far safer to run. The three-tiered defense architecture, state-machine execution loop, temporal checkpointing, and stateful guardrails implemented in frameworks like Hermes provide a blueprint for the next generation of enterprise-grade AI software.
We can finally give our agents the keys to the terminal with a safety net in place: when they make a mistake, they can heal themselves without bringing down the house.

Leave a comment below with your thoughts and experiences building autonomous agents!

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook Hermes Agent, The Self-Evolving AI Workforce (details: https://tiny.cc/HermesAgent). You can also find my programming ebooks with AI here: Programming & AI eBooks (http://tiny.cc/ProgrammingBooks).