Building a Self-Healing AI Agent: How to Run Untrusted Code Safely Without Blowing Up Your Server

wpnews.pro

Imagine you are building an autonomous AI agent. You give it a terminal tool, a file-writing tool, and the ability to execute Python scripts. You ask it to "clean up the temporary files in the project directory."

The LLM processes the request, formulates a plan, and generates a terminal command. But due to a subtle parsing error or a hallucinated variable, it executes:

rm -rf / temp

In a fraction of a second, your host system is wiped out.

This is the nightmare scenario for every developer working with agentic AI. As we transition from passive chatbots to active, autonomous agents that orchestrate tools, write code, and modify environments, we are handing over the keys to our digital kingdoms.

How do we grant AI agents the power to execute code, run shell commands, and manage databases without risking catastrophic system failures or infinite, wallet-draining loops?

The answer lies in moving away from static toolboxes and embracing a dynamic, self-healing, and sandboxed architecture. In this deep dive, we will explore how the Hermes Agent framework (v0.13) solves this challenge using a multi-layered defense system, state-machine orchestration, and policy-based sandboxing.

(The concepts and code demonstrated here are drawn from my ebook Hermes Agent, The Self-Evolving AI Workforce)

In traditional software development, a tool is a static library. It is a collection of documented, versioned functions invoked by a human developer. The developer is the sole orchestrator, the source of intent, and the error handler.

In an autonomous agent architecture like Hermes, this model breaks down. The AI agent is the orchestrator. The tools are not just functions; they are the agent’s hands and eyes in the physical and digital world.

Every tool call is a deliberate mutation of state—a file written, a command executed, a database queried. Therefore, we must treat tools as interfaces to an external state machine.

The agent's core engine operates on a continuous loop of perception (receiving user input and tool results), cognition (the LLM call), and action (executing tool calls).

To prevent this loop from spinning out of control, we need the architectural equivalent of a nuclear reactor's control rods. The core reaction—the LLM generating tool calls—is incredibly powerful and inherently unpredictable. The toolsets and sandboxing layers act as control rods, absorbing excess reactivity to ensure the reaction remains self-sustaining but never explosive.

To secure this state machine, Hermes abandons the flat "list of functions" approach used by simpler agent frameworks. Instead, it implements a hierarchical, versioned, and policy-driven architecture structured into three distinct layers:

┌────────────────────────────────────────────────────────┐
│ 1. Tool Definition Layer (model_tools.py)              │
│    - Schemas, descriptions, and JSON validation        │
└───────────────────────────┬────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│ 2. Tool Execution Layer (handle_function_call)         │
│    - Dispatcher, sequential/concurrent execution       │
└───────────────────────────┬────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│ 3. Sandboxing Layer (containment vessel)               │
│    - Guardrails, Checkpoints, Docker, Approvals        │
└────────────────────────────────────────────────────────┘

model_tools.py

) This serves as the agent's "catalog." It contains the schemas for every tool, defining its name, description, and the strict JSON schema for its arguments. This catalog is filtered based on enabled/disabled toolsets and sent to the LLM to inform it of its capabilities.

handle_function_call

) This is the "dispatch center." When the LLM returns a tool_calls

payload, the agent’s loop parses the arguments and dispatches the call to the correct handler. This layer handles validation, type coercion, and initial error catching.

This is the "containment vessel." It is not a single function, but a set of architectural patterns embedded in the execution of dangerous tools (like terminal

and execute_code

). It ensures that even if the agent’s intent is flawed or malicious, the impact on the host system is strictly controlled.

run_conversation

Loop as a State Machine At the heart of the agent is the run_conversation

method. It is a classic state machine designed to realize a closed learning loop. The agent does not just call a tool and forget about it; it appends the tool's result back into the conversation history as a role: "tool"

message. The result of its action becomes the context for its next thought.

Here is a simplified look at how this loop operates within the execution engine:

def run_conversation(self, user_message, ...):
    while (api_call_count < self.max_iterations):
        response = self._interruptible_api_call(api_kwargs)
        normalized = self._get_transport().normalize_response(response)
        assistant_message = normalized

        if assistant_message.tool_calls:
            assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
            messages.append(assistant_msg)

            self._execute_tool_calls(assistant_message, messages, effective_task_id)

            continue
        else:
            final_response = assistant_message.content
            break

This feedback mechanism makes the agent incredibly capable, but it also introduces a vulnerability: the agent can be led into an infinite loop or a destructive cascade by its own mistakes. This is where policy-based permission control comes in.

Traditional operating system security relies on identity-based control (e.g., "Is this user root?"). Hermes, however, uses policy-based permission control. The agent does not have a static user identity; instead, every action is evaluated dynamically against a suite of safety policies before execution.

Before any destructive tool call (such as writing to a file or executing a risky terminal command) occurs, the agent can trigger a filesystem checkpoint. If the tool execution fails or corrupts the environment, the system can roll back time to the last known good checkpoint. This provides a temporal sandbox that protects against permanent data loss.

The ToolCallGuardrailController

acts as a stateful observer. It monitors the pattern of tool calls across turns. If it detects that the agent is calling the same tool with the exact same arguments and receiving the same error repeatedly, the guardrail halts the execution. This acts as "emotional regulation" for the AI, forcing it to stop banging its head against a wall and alter its strategy.

The terminal

and execute_code

tools are the most powerful capabilities an agent can possess. They are also the most dangerous. Here is how Hermes tames them:

Before passing a command to the shell, the terminal tool parses the command string against a set of regular expressions (_DESTRUCTIVE_PATTERNS

and _REDIRECT_OVERWRITE

). If a pattern like rm -rf

or raw block-device writes (dd

) is detected, the agent is forced to create a filesystem checkpoint or halt for human approval.

The agent can be configured to execute commands within isolated, persistent virtual environments or Docker containers. This ensures that any command run by the agent is physically isolated from the host operating system.

The execute_code

tool is designed for quick, programmatic tasks (like running a quick Python script to calculate a statistical distribution). Because these are cheap, RPC-style calls, Hermes introduces a brilliant optimization: the iteration budget refund.

If the agent only executes programmatic code during a turn, the iteration budget is refunded:

_tc_names = {tc.function.name for tc in assistant_message.tool_calls}
if _tc_names == {"execute_code"}:
    self.iteration_budget.refund()

This encourages the agent to use safe, programmatic execution for calculations and data transformations rather than spawning expensive, long-running terminal processes.

Let’s look at how to implement a persistent, sandboxed agent using the real architectural patterns of the Hermes framework.

This implementation combines the AIAgent

with a persistent SessionDB

to track conversation state, maintain memory, and enforce execution budgets across sessions.

"""
Basic Library Implementation: Persistent AI Agent with Tool Calling

This example demonstrates how to set up a self-improving AI agent using
the Hermes Agent framework. It shows:
- Session database initialization
- Agent creation with tool support
- Conversation loop with tool execution
- Memory and skills integration
- Session persistence and retrieval
"""

import asyncio
import json
import logging
import os
import sys
import time
from pathlib import Path
from typing import Dict, List, Optional, Any

from hermes_state import SessionDB
from run_agent import AIAgent, IterationBudget

from model_tools import (
    get_tool_definitions,
    get_toolset_for_tool,
    handle_function_call,
    check_toolset_requirements,
)

from tools.memory_tool import MemoryStore
from tools.todo_tool import TodoStore

from hermes_cli.config import load_config, cfg_get
from hermes_constants import get_hermes_home

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class PersistentAgent:
    """
    A self-improving AI agent with persistent memory and session tracking.

    This class wraps the Hermes AIAgent with session database integration,
    providing durable storage for conversations, token usage tracking,
    and support for the closed learning loop pattern.
    """

    def __init__(
        self,
        model: str = "anthropic/claude-sonnet-4-20250514",
        base_url: Optional[str] = None,
        api_key: Optional[str] = None,
        provider: Optional[str] = None,
        max_iterations: int = 50,
        enabled_toolsets: Optional[List[str]] = None,
        disabled_toolsets: Optional[List[str]] = None,
        session_db_path: Optional[Path] = None,
        load_soul_identity: bool = True,
        skip_context_files: bool = False,
        verbose_logging: bool = False,
        quiet_mode: bool = True,
    ):
        """
        Initialize the persistent agent with database and AIAgent.
        """
        self.db_path = session_db_path or (get_hermes_home() / "state.db")
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        self.session_db = SessionDB(db_path=self.db_path)

        self.agent = AIAgent(
            model=model,
            base_url=base_url or "",
            api_key=api_key,
            provider=provider,
            max_iterations=max_iterations,
            enabled_toolsets=enabled_toolsets or ["web", "terminal", "memory"],
            disabled_toolsets=disabled_toolsets,
            save_trajectories=False,  # We use SQLite instead for persistence
            verbose_logging=verbose_logging,
            quiet_mode=quiet_mode,
            load_soul_identity=load_soul_identity,
            skip_context_files=skip_context_files,
            session_db=self.session_db,
        )

        self.todo_store = TodoStore()

        self.memory_store = None
        if "memory" in self.agent.valid_tool_names:
            try:
                config = load_config()
                mem_config = config.get("memory", {})
                self.memory_store = MemoryStore(
                    memory_char_limit=mem_config.get("memory_char_limit", 2200),
                    user_char_limit=mem_config.get("user_char_limit", 1375),
                )
                self.memory_store.load_from_disk()
                self.agent._memory_store = self.memory_store
                logger.info("Memory store successfully initialized from disk.")
            except Exception as e:
                logger.warning(f"Failed to initialize memory store: {e}")

        logger.info(
            "PersistentAgent initialized: model=%s, tools=%d, db=%s",
            self.agent.model,
            len(self.agent.tools or []),
            self.db_path
        )

    async def execute_turn(self, user_message: str, session_id: str) -> str:
        """
        Executes a single conversation turn, running tools as needed,
        while maintaining state persistence in the SQLite database.
        """
        logger.info(f"Starting turn for session {session_id} with message: {user_message}")

        budget = IterationBudget(max_iterations=self.agent.max_iterations)

        response = await self.agent.run_conversation(
            user_message=user_message,
            iteration_budget=budget,
            session_id=session_id
        )

        if self.memory_store:
            self.memory_store.save_to_disk()

        return response

async def main():
    if not os.environ.get("ANTHROPIC_API_KEY") and not os.environ.get("OPENAI_API_KEY"):
        print("Please set your ANTHROPIC_API_KEY or OPENAI_API_KEY environment variables.")
        sys.exit(1)

    agent_wrapper = PersistentAgent(
        model="anthropic/claude-3-5-sonnet-latest",
        enabled_toolsets=["memory", "terminal"]
    )

    session_id = "demo-session-101"
    user_prompt = "Find all files ending in .log in the current directory and summarize their count."

    result = await agent_wrapper.execute_turn(user_prompt, session_id=session_id)
    print("\n--- Agent Response ---")
    print(result)

if __name__ == "__main__":
    asyncio.run(main())

As the AI landscape matures, we are moving away from simple text generation and toward autonomous systems that can act on our behalf. But with great power comes great architectural responsibility.

By shifting our design philosophy from "trust but verify" to "never trust, always isolate, checkpoint, and regulate," we can build agents that are both incredibly capable and completely safe.

The three-tiered defense architecture, state-machine execution loop, temporal checkpointing, and stateful guardrails implemented in frameworks like Hermes provide the blueprint for the next generation of enterprise-grade AI software. We can finally give our agents the keys to the terminal—knowing that if they make a mistake, they can heal themselves without bringing down the house.

Leave a comment below with your thoughts and experiences building autonomous agents!

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook Hermes Agent, The Self-Evolving AI Workforce: details link, you can find also my programming ebooks with AI here: Programming & AI eBooks.

source & further reading

dev.to — original article I Wish I Ran the Numbers on Open Source AI APIs Sooner My MCP Server Kept Crashing. Here's the Error Recovery Pattern That Saved It. Building an AI-Powered Lead Qualification API with Next.js 15 and Gemini 3.5 Flash

Building a Self-Healing AI Agent: How to Run Untrusted Code Safely Without Blowing Up Your Server

Run your AI side-project on zahid.host