{"slug": "building-a-self-healing-ai-agent-how-to-run-untrusted-code-safely-without-up", "title": "Building a Self-Healing AI Agent: How to Run Untrusted Code Safely Without Blowing Up Your Server", "summary": "The Hermes Agent framework (v0.13) implements a multi-layered defense system to safely execute untrusted code generated by autonomous AI agents, preventing catastrophic system failures like accidental file deletion. The architecture replaces static toolboxes with a hierarchical, policy-driven structure featuring tool definition, execution dispatch, and sandboxing layers that act as \"control rods\" to contain the AI's actions. This approach enables agents to run shell commands and Python scripts within a self-healing, sandboxed environment that prevents infinite loops and system-wide damage.", "body_md": "Imagine you are building an autonomous AI agent. You give it a terminal tool, a file-writing tool, and the ability to execute Python scripts. You ask it to \"clean up the temporary files in the project directory.\"\n\nThe LLM processes the request, formulates a plan, and generates a terminal command. But due to a subtle parsing error or a hallucinated variable, it executes:\n\n```\nrm -rf / temp\n```\n\nIn a fraction of a second, your host system is wiped out.\n\nThis is the nightmare scenario for every developer working with agentic AI. As we transition from passive chatbots to active, autonomous agents that orchestrate tools, write code, and modify environments, we are handing over the keys to our digital kingdoms.\n\nHow do we grant AI agents the power to execute code, run shell commands, and manage databases without risking catastrophic system failures or infinite, wallet-draining loops?\n\nThe answer lies in moving away from static toolboxes and embracing a dynamic, self-healing, and sandboxed architecture. 
In this deep dive, we will explore how the Hermes Agent framework (v0.13) solves this challenge using a multi-layered defense system, state-machine orchestration, and policy-based sandboxing.\n\n(The concepts and code demonstrated here are drawn from my ebook [Hermes Agent, The Self-Evolving AI Workforce](https://tiny.cc/HermesAgent))\n\nIn traditional software development, a tool is a static library. It is a collection of documented, versioned functions invoked by a human developer. The developer is the sole orchestrator, the source of intent, and the error handler.\n\nIn an autonomous agent architecture like Hermes, this model breaks down. The AI agent is the orchestrator. The tools are not just functions; they are the agent’s hands and eyes in the physical and digital world.\n\nEvery tool call is a deliberate mutation of state—a file written, a command executed, a database queried. Therefore, we must treat tools as **interfaces to an external state machine**.\n\nThe agent's core engine operates on a continuous loop of **perception** (receiving user input and tool results), **cognition** (the LLM call), and **action** (executing tool calls).\n\nTo prevent this loop from spinning out of control, we need the architectural equivalent of a nuclear reactor's control rods. The core reaction—the LLM generating tool calls—is incredibly powerful and inherently unpredictable. The toolsets and sandboxing layers act as control rods, absorbing excess reactivity to ensure the reaction remains self-sustaining but never explosive.\n\nTo secure this state machine, Hermes abandons the flat \"list of functions\" approach used by simpler agent frameworks. Instead, it implements a hierarchical, versioned, and policy-driven architecture structured into three distinct layers:\n\n```\n┌────────────────────────────────────────────────────────┐\n│ 1. 
Tool Definition Layer (model_tools.py)              │\n│    - Schemas, descriptions, and JSON validation        │\n└───────────────────────────┬────────────────────────────┘\n                            │\n                            ▼\n┌────────────────────────────────────────────────────────┐\n│ 2. Tool Execution Layer (handle_function_call)         │\n│    - Dispatcher, sequential/concurrent execution       │\n└───────────────────────────┬────────────────────────────┘\n                            │\n                            ▼\n┌────────────────────────────────────────────────────────┐\n│ 3. Sandboxing Layer (containment vessel)               │\n│    - Guardrails, Checkpoints, Docker, Approvals        │\n└────────────────────────────────────────────────────────┘\n```\n\nThe Tool Definition Layer (`model_tools.py`) serves as the agent's \"catalog.\" It contains the schemas for every tool, defining its name, description, and the strict JSON schema for its arguments. This catalog is filtered based on enabled/disabled toolsets and sent to the LLM to inform it of its capabilities.\n\nThe Tool Execution Layer (`handle_function_call`) is the \"dispatch center.\" When the LLM returns a `tool_calls` payload, the agent’s loop parses the arguments and dispatches the call to the correct handler. This layer handles validation, type coercion, and initial error catching.\n\nThe Sandboxing Layer is the \"containment vessel.\" It is not a single function, but a set of architectural patterns embedded in the execution of dangerous tools (like `terminal` and `execute_code`). It ensures that even if the agent’s intent is flawed or malicious, the impact on the host system is strictly controlled.\n\n`run_conversation` Loop as a State Machine\n\nAt the heart of the agent is the `run_conversation` method. It is a classic state machine designed to realize a **closed learning loop**. 
The agent does not just call a tool and forget about it; it appends the tool's result back into the conversation history as a `role: \"tool\"` message. The result of its action becomes the context for its next thought.\n\nHere is a simplified look at how this loop operates within the execution engine:\n\n```python\ndef run_conversation(self, user_message, ...):\n    # ... setup and memory loading ...\n    while (api_call_count < self.max_iterations):\n        # 1. API_CALL State: Send history to LLM\n        response = self._interruptible_api_call(api_kwargs)\n        # Each LLM round trip counts against max_iterations\n        api_call_count += 1\n        normalized = self._get_transport().normalize_response(response)\n        assistant_message = normalized\n\n        # 2. TOOL_EXECUTION State: Process tool calls if present\n        if assistant_message.tool_calls:\n            # Build the assistant message dict and append to history\n            assistant_msg = self._build_assistant_message(assistant_message, finish_reason)\n            messages.append(assistant_msg)\n\n            # Execute the tools (sequential or concurrent)\n            self._execute_tool_calls(assistant_message, messages, effective_task_id)\n\n            # Continue the loop, feeding the tool results back to the LLM\n            continue\n        else:\n            # 3. FINAL_RESPONSE State: No more tools needed\n            final_response = assistant_message.content\n            break\n```\n\nThis feedback mechanism makes the agent incredibly capable, but it also introduces a vulnerability: **the agent can be led into an infinite loop or a destructive cascade by its own mistakes.** This is where policy-based permission control comes in.\n\nTraditional operating system security relies on identity-based control (e.g., \"Is this user root?\"). Hermes, however, uses **policy-based permission control**. 
The agent does not have a static user identity; instead, every action is evaluated dynamically against a suite of safety policies before execution.\n\nBefore any destructive tool call (such as writing to a file or executing a risky terminal command) occurs, the agent can trigger a filesystem checkpoint. If the tool execution fails or corrupts the environment, the system can roll back to the last known good checkpoint. This provides a temporal sandbox that protects against permanent data loss.\n\nThe `ToolCallGuardrailController` acts as a stateful observer. It monitors the pattern of tool calls across turns. If it detects that the agent is calling the same tool with the exact same arguments and receiving the same error repeatedly, the guardrail halts the execution. This acts as \"emotional regulation\" for the AI, forcing it to stop banging its head against a wall and alter its strategy.\n\nThe `terminal` and `execute_code` tools are the most powerful capabilities an agent can possess. They are also the most dangerous. Here is how Hermes tames them:\n\nBefore passing a command to the shell, the terminal tool checks the command string against a set of regular expressions (`_DESTRUCTIVE_PATTERNS` and `_REDIRECT_OVERWRITE`). If a pattern like `rm -rf` or a raw block-device write (`dd`) is detected, the agent is forced to create a filesystem checkpoint or halt for human approval.\n\nThe agent can be configured to execute commands within isolated, persistent virtual environments or Docker containers. This keeps any command run by the agent isolated from the host operating system.\n\nThe `execute_code` tool is designed for quick, programmatic tasks (like running a short Python script to calculate a statistical distribution). 
Because these are cheap, RPC-style calls, Hermes introduces an optimization: **the iteration budget refund**.\n\nIf the only tool the agent calls during a turn is `execute_code`, the iteration spent on that turn is refunded:\n\n```python\n# Refund the iteration if the ONLY tool called was execute_code.\n# These are cheap RPC-style calls that shouldn't eat the budget.\n_tc_names = {tc.function.name for tc in assistant_message.tool_calls}\nif _tc_names == {\"execute_code\"}:\n    self.iteration_budget.refund()\n```\n\nThis encourages the agent to use safe, programmatic execution for calculations and data transformations rather than spawning expensive, long-running terminal processes.\n\nLet’s look at how to implement a persistent, sandboxed agent using the real architectural patterns of the Hermes framework.\n\nThis implementation combines the `AIAgent` with a persistent `SessionDB` to track conversation state, maintain memory, and enforce execution budgets across sessions.\n\n```python\n\"\"\"\nBasic Library Implementation: Persistent AI Agent with Tool Calling\n\nThis example demonstrates how to set up a self-improving AI agent using\nthe Hermes Agent framework. 
It shows:\n- Session database initialization\n- Agent creation with tool support\n- Conversation loop with tool execution\n- Memory and skills integration\n- Session persistence and retrieval\n\"\"\"\n\nimport asyncio\nimport json\nimport logging\nimport os\nimport sys\nimport time\nfrom pathlib import Path\nfrom typing import Dict, List, Optional, Any\n\n# Import the core Hermes Agent classes\nfrom hermes_state import SessionDB\nfrom run_agent import AIAgent, IterationBudget\n\n# Import tool definitions and helpers\nfrom model_tools import (\n    get_tool_definitions,\n    get_toolset_for_tool,\n    handle_function_call,\n    check_toolset_requirements,\n)\n\n# Import memory and skills support\nfrom tools.memory_tool import MemoryStore\nfrom tools.todo_tool import TodoStore\n\n# Import configuration helpers\nfrom hermes_cli.config import load_config, cfg_get\nfrom hermes_constants import get_hermes_home\n\n# Configure logging\nlogging.basicConfig(\n    level=logging.INFO,\n    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'\n)\nlogger = logging.getLogger(__name__)\n\nclass PersistentAgent:\n    \"\"\"\n    A self-improving AI agent with persistent memory and session tracking.\n\n    This class wraps the Hermes AIAgent with session database integration,\n    providing durable storage for conversations, token usage tracking,\n    and support for the closed learning loop pattern.\n    \"\"\"\n\n    def __init__(\n        self,\n        model: str = \"anthropic/claude-sonnet-4-20250514\",\n        base_url: Optional[str] = None,\n        api_key: Optional[str] = None,\n        provider: Optional[str] = None,\n        max_iterations: int = 50,\n        enabled_toolsets: Optional[List[str]] = None,\n        disabled_toolsets: Optional[List[str]] = None,\n        session_db_path: Optional[Path] = None,\n        load_soul_identity: bool = True,\n        skip_context_files: bool = False,\n        verbose_logging: bool = False,\n        quiet_mode: bool = 
True,\n    ):\n        \"\"\"\n        Initialize the persistent agent with database and AIAgent.\n        \"\"\"\n        # Step 1: Initialize the session database for durable state tracking\n        self.db_path = session_db_path or (get_hermes_home() / \"state.db\")\n        self.db_path.parent.mkdir(parents=True, exist_ok=True)\n        self.session_db = SessionDB(db_path=self.db_path)\n\n        # Step 2: Create the AIAgent instance with all configuration\n        self.agent = AIAgent(\n            model=model,\n            base_url=base_url or \"\",\n            api_key=api_key,\n            provider=provider,\n            max_iterations=max_iterations,\n            enabled_toolsets=enabled_toolsets or [\"web\", \"terminal\", \"memory\"],\n            disabled_toolsets=disabled_toolsets,\n            save_trajectories=False,  # We use SQLite instead for persistence\n            verbose_logging=verbose_logging,\n            quiet_mode=quiet_mode,\n            load_soul_identity=load_soul_identity,\n            skip_context_files=skip_context_files,\n            session_db=self.session_db,\n        )\n\n        # Step 3: Initialize the in-memory todo store\n        self.todo_store = TodoStore()\n\n        # Step 4: Set up memory store if memory tools are enabled\n        self.memory_store = None\n        if \"memory\" in self.agent.valid_tool_names:\n            try:\n                config = load_config()\n                mem_config = config.get(\"memory\", {})\n                self.memory_store = MemoryStore(\n                    memory_char_limit=mem_config.get(\"memory_char_limit\", 2200),\n                    user_char_limit=mem_config.get(\"user_char_limit\", 1375),\n                )\n                self.memory_store.load_from_disk()\n                self.agent._memory_store = self.memory_store\n                logger.info(\"Memory store successfully initialized from disk.\")\n            except Exception as e:\n                logger.warning(f\"Failed 
to initialize memory store: {e}\")\n\n        # Step 5: Log initialization summary\n        logger.info(\n            \"PersistentAgent initialized: model=%s, tools=%d, db=%s\",\n            self.agent.model,\n            len(self.agent.tools or []),\n            self.db_path\n        )\n\n    async def execute_turn(self, user_message: str, session_id: str) -> str:\n        \"\"\"\n        Executes a single conversation turn, running tools as needed,\n        while maintaining state persistence in the SQLite database.\n        \"\"\"\n        logger.info(f\"Starting turn for session {session_id} with message: {user_message}\")\n\n        # Create an execution budget for this turn\n        budget = IterationBudget(max_iterations=self.agent.max_iterations)\n\n        # Execute the conversation loop (which handles LLM calls, tool execution, and guardrails)\n        response = await self.agent.run_conversation(\n            user_message=user_message,\n            iteration_budget=budget,\n            session_id=session_id\n        )\n\n        # Persist the updated memory state to disk if applicable\n        if self.memory_store:\n            self.memory_store.save_to_disk()\n\n        return response\n\n# Example Usage\nasync def main():\n    # Ensure API keys are set up in your environment before running\n    if not os.environ.get(\"ANTHROPIC_API_KEY\") and not os.environ.get(\"OPENAI_API_KEY\"):\n        print(\"Please set your ANTHROPIC_API_KEY or OPENAI_API_KEY environment variables.\")\n        sys.exit(1)\n\n    # Initialize our persistent agent\n    agent_wrapper = PersistentAgent(\n        model=\"anthropic/claude-3-5-sonnet-latest\",\n        enabled_toolsets=[\"memory\", \"terminal\"]\n    )\n\n    session_id = \"demo-session-101\"\n    user_prompt = \"Find all files ending in .log in the current directory and summarize their count.\"\n\n    # Run the turn\n    result = await agent_wrapper.execute_turn(user_prompt, session_id=session_id)\n    print(\"\\n--- 
Agent Response ---\")\n    print(result)\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\nAs the AI landscape matures, we are moving away from simple text generation and toward autonomous systems that can act on our behalf. But with great power comes great architectural responsibility.\n\nBy shifting our design philosophy from \"trust but verify\" to **\"never trust, always isolate, checkpoint, and regulate,\"** we can build agents that are both highly capable and much safer to operate.\n\nThe three-tiered defense architecture, state-machine execution loop, temporal checkpointing, and stateful guardrails implemented in frameworks like Hermes provide a blueprint for the next generation of enterprise-grade AI software. We can give our agents the keys to the terminal, knowing that if they make a mistake, they can recover without bringing down the house.\n\n*Leave a comment below with your thoughts and experiences building autonomous agents!*\n\nThe concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook **Hermes Agent, The Self-Evolving AI Workforce**: [details link](https://tiny.cc/HermesAgent). You can also find my programming and AI ebooks here: [Programming & AI eBooks](http://tiny.cc/ProgrammingBooks).", "url": "https://wpnews.pro/news/building-a-self-healing-ai-agent-how-to-run-untrusted-code-safely-without-up", "canonical_source": "https://dev.to/programmingcentral/building-a-self-healing-ai-agent-how-to-run-untrusted-code-safely-without-blowing-up-your-server-4859", "published_at": "2026-05-29 20:00:00+00:00", "updated_at": "2026-05-29 20:11:06.121155+00:00", "lang": "en", "topics": ["ai-agents", "ai-safety", "ai-tools", "ai-infrastructure", "large-language-models"], "entities": ["Hermes Agent"], "alternates": {"html": "https://wpnews.pro/news/building-a-self-healing-ai-agent-how-to-run-untrusted-code-safely-without-up", "markdown": 
"https://wpnews.pro/news/building-a-self-healing-ai-agent-how-to-run-untrusted-code-safely-without-up.md", "text": "https://wpnews.pro/news/building-a-self-healing-ai-agent-how-to-run-untrusted-code-safely-without-up.txt", "jsonld": "https://wpnews.pro/news/building-a-self-healing-ai-agent-how-to-run-untrusted-code-safely-without-up.jsonld"}}