Beyond the Prompt: How to Build Stateful AI Agents with Persistent Memory and Self-Learning Loops

This article examines the limitations of stateless AI systems, which require feeding the entire conversation history into each new request, and introduces stateful AI agents that maintain persistent memory and self-learning capabilities. It details the Hermes Agent architecture, which organizes statefulness into three components: Soul (core identity), Memory (episodic facts and preferences), and Skills (procedural knowledge). Together these enable continuous adaptation and evolution. The piece also provides a Python implementation guide for building such self-improving agents.

Imagine hiring a brilliant software engineer who suffers from complete amnesia every time they blink. Every time you ask them a question, you have to hand them their entire employment history, the codebase documentation, your style guide, and a summary of every conversation you’ve ever had with them. They process the information, give you a great answer, and then, blink, it’s all gone.

This is the exhausting reality of stateless AI applications.

Most developers building with Large Language Models (LLMs) today are stuck in this stateless paradigm. They write clever prompts, wrap them in an API call, and rely on the application layer to aggressively feed the entire chat history back into the context window with every new turn. It’s expensive, it’s inefficient, and it places a hard ceiling on how smart an agent can actually become.

To build truly autonomous, adaptive, and personalized AI systems, we must cross the chasm from stateless interactions to stateful agents.

In this deep dive, we will explore the architecture of the Hermes Agent, a stateful AI system that possesses persistent memory, a continuous learning loop, and the ability to evolve alongside its user. We will break down the engineering patterns behind statefulness and walk through a complete Python implementation to build your own self-improving agent from scratch.

The concepts and code demonstrated here are drawn from my ebook Hermes Agent, The Self-Evolving AI Workforce: https://tiny.cc/HermesAgent

The Stateless Ceiling: Why Vending Machines Make Poor Assistants

To understand the power of statefulness, we must first look at why statelessness cripples AI agents.

Think of a stateless system like a vending machine. You insert a dollar, press a button, and get a soda. The vending machine doesn't care who you are, what your health goals are, or that you bought the exact same drink yesterday. Every transaction is an isolated, self-contained event.
It has no memory of its past, no context for the present, and no capacity to learn for the future.

Early LLM applications operate exactly like this. You send a prompt, and the model returns a response. The model itself does not change.

```python
# A classic stateless utility call
import datetime

def parse_date(date_string: str) -> datetime.datetime:
    return datetime.datetime.strptime(date_string, "%Y-%m-%d")
```

This simple Python function is a stateless transaction. It takes an input, returns an output, and immediately forgets the operation ever happened. It doesn't learn that you frequently parse dates from European formats, nor does it optimize its parsing logic over time.

When developers try to build "agents" on top of this stateless foundation, they usually resort to an illusion of continuity. They stitch together a chat history array and send the entire history back to the API on every single turn.

This approach has three massive flaws:

- Context Bloat: Every request resends the full history, so the tokens sent per turn grow linearly with conversation length and the cumulative token usage grows roughly quadratically.
- Memory Horizon Limits: Once the conversation exceeds the model's context window, the agent "forgets" the earliest parts of the interaction.
- Zero Knowledge Accumulation: The agent cannot carry lessons learned in Session A over to Session B. If it figures out a complex bash command to fix a Docker bug today, it will have to re-discover that solution from scratch next week.

A stateful agent breaks this paradigm entirely. It is not just a wrapper around an LLM; it is an evolving entity. It mirrors the workflow of a skilled artisan, such as a master carpenter. The carpenter remembers the tools they used yesterday, the specific quirks of the wood they are carving, the preferences of their client, and the hard-won lessons from a project they completed last month. They do not start their education from scratch every morning.
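
To put rough numbers on the Context Bloat flaw listed above, here is a minimal sketch of how the cost of resending history compounds. It assumes every turn adds a fixed number of tokens and ignores the system prompt and output tokens; the function names and the 200-token figure are illustrative assumptions, not part of Hermes.

```python
def cumulative_tokens_resend(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent when every request resends the full history."""
    # Turn k carries all k turns so far, i.e. k * tokens_per_turn tokens.
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


def cumulative_tokens_incremental(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens if only the new turn had to be sent each time."""
    return turns * tokens_per_turn


if __name__ == "__main__":
    # Assumed, illustrative turn size of 200 tokens.
    for turns in (10, 50, 100):
        resend = cumulative_tokens_resend(turns, 200)
        incremental = cumulative_tokens_incremental(turns, 200)
        print(f"{turns:>3} turns: resend={resend:>9,}  incremental={incremental:>6,}")
```

Under these assumptions, a 100-turn conversation sends 1,010,000 input tokens in total when the history is resent, against 20,000 if each turn only had to be sent once. The per-request size grows linearly, but the running total grows with the square of the number of turns.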

The Triad of Persistent State: Soul, Memory, and Skills

In the Hermes Agent architecture, statefulness is not treated as a single monolithic database. Instead, it is partitioned into a carefully structured triad that mirrors how human professionals organize their own knowledge.

```
┌────────────────────────────────────────┐
│                  SOUL                  │
│    Core Identity, Style, Principles    │
└───────────────────┬────────────────────┘
                    │
┌───────────────────┴────────────────────┐
│                 MEMORY                 │
│    Episodic Facts, User Preferences    │
└───────────────────┬────────────────────┘
                    │
┌───────────────────┴────────────────────┐
│                 SKILLS                 │
│     Procedural Knowledge, Toolkits     │
└────────────────────────────────────────┘
```

Let’s break down each component of this stateful triad.

1. The Soul (`SOUL.md`)

This is the agent's core identity and "constitution." It defines who the agent is, its communication style, its behavioral boundaries, and its operational principles. It is not a dynamic log of facts, but a foundational document.

In the codebase, a helper function reads this markdown file and injects it directly into the system prompt. It ensures that whether the agent is writing code or debugging a server, its fundamental persona and safety guardrails remain consistent.

2. Memory (`MEMORY.md` and `USER.md`)

This is the agent's episodic and semantic memory store. Instead of keeping a raw, unorganized transcript of every chat, the agent maintains a curated, structured knowledge base of facts about the user and past interactions.

- `USER.md` tracks durable information about the user (e.g., name, programming language preferences, operating system, working hours).
- `MEMORY.md` tracks dynamic, episodic facts learned during tasks (e.g., "The local staging database is hosted on port 5433, not 5432").

This layer is managed by a semantic `MemoryStore` class. The agent can read from this store to build context and write to it dynamically using custom tools.

3. Skills (`~/.hermes/skills/`)

If memory is "knowing what," skills are "knowing how." This is the agent's procedural memory. A skill in Hermes is a reusable, packaged directory containing:

- `SKILL.md`: A markdown file describing what the skill does, when to use it, and its input parameters.
- `scripts/`: Executable scripts (Python, Bash, etc.) that perform the task.
- `templates/`: Reusable code or text templates.

Instead of writing complex code on the fly every time, the agent can write a script once, save it to its skills directory, and call it as a custom tool in future sessions. It builds its own personalized toolbox.

The Closed Learning Loop: How the Agent Self-Improves

A stateful agent must be able to learn without constant human intervention. The Hermes Agent achieves this through a Closed Learning Loop executed entirely in the background. This loop consists of two primary engines: Background Review and the Skill Curator.

```
┌───────────────────────────────────────────────────────┐
│                   User Interaction                    │
└──────────────────────────┬────────────────────────────┘
                           │ Turn Completes
                           ▼
┌───────────────────────────────────────────────────────┐
│               Background Review Thread                │
│  Spawns quiet, forked agent to analyze conversation   │
└──────────────┬─────────────────────────┬──────────────┘
               │                         │
               ▼ Extract Facts           ▼ Extract Procedures
┌──────────────────────────┐    ┌───────────────────────┐
│       Memory Store       │    │     Skills Engine     │
│    Updates MEMORY.md     │    │   Creates SKILL.md    │
└──────────────────────────┘    └────────┬──────────────┘
                                         │
                                         ▼ Runs asynchronously
                                ┌───────────────────────┐
                                │     Skill Curator     │
                                │ Archives stale files  │
                                └───────────────────────┘
```

The Background Review (Self-Reflection)

When a conversation turn completes successfully, the agent doesn't just sit idle waiting for your next message. It increments internal counters: `turns_since_memory` and `iters_since_skill`.
Once these counters hit a configured threshold (e.g., every 5 to 10 iterations), the agent initiates a self-reflection phase:

- Forking the Agent: The system spawns a background thread that instantiates a forked copy of the current agent. This copy is set to `quiet_mode=True`, meaning it operates in complete silence without cluttering the user's console.
- The Reflection Prompt: The forked agent is fed a specialized prompt (e.g., `COMBINED_REVIEW_PROMPT`) along with the recent conversation history. It is asked to analyze the transcript and answer two questions:
  - Did the user share any new preferences or facts that should be saved to long-term memory?
  - Did we execute a complex, successful multi-step procedure that should be codified into a reusable skill?
- Autonomous Tool Execution: The silent background agent runs its own mini-reasoning loop. If it identifies new facts, it calls the `memory` tool to update `USER.md` or `MEMORY.md`. If it identifies a new procedure, it calls the `skill_manage` tool to write a new `SKILL.md` to disk.
- Reporting Back: Once the background thread finishes, the parent agent prints a clean, non-intrusive summary of what it learned (e.g., `System Info: Memory updated - User prefers PyTest over Unittest`).

The Skill Curator

An agent that constantly learns skills will eventually suffer from "tool bloat." If its toolbox has 500 highly specific scripts, the system prompt will become overwhelmed, and the LLM will experience severe context distraction.

To prevent this, a background daemon called the Skill Curator (`agent/curator.py`) runs periodically.

- It tracks skill usage via a metadata file (`.usage.json`).
- If a skill hasn't been used for a configurable number of days, the Curator automatically moves it to an `.archive/` directory.
- Archived skills are removed from the active system prompt but can be restored instantly if the agent needs them again.
- Users can "pin" critical skills to exempt them from archiving.
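
The Curator's archiving rule can be sketched in a few lines. This is not the actual `agent/curator.py`: the `.usage.json` schema, the function name `archive_stale_skills`, and the choice to leave skills with no usage record untouched are all assumptions made for illustration.

```python
import json
import shutil
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def archive_stale_skills(skills_dir: Path, max_idle_days: int, now: datetime) -> list[str]:
    """Move unpinned skills idle for longer than max_idle_days into .archive/.

    Assumed (hypothetical) .usage.json schema:
        {"<skill name>": {"last_used": "<ISO 8601 timestamp>", "pinned": false}}
    """
    usage_file = skills_dir / ".usage.json"
    usage = json.loads(usage_file.read_text()) if usage_file.exists() else {}
    archive_dir = skills_dir / ".archive"
    cutoff = now - timedelta(days=max_idle_days)
    archived = []

    # sorted() materializes the listing, so creating .archive/ mid-loop is safe.
    for skill_path in sorted(skills_dir.iterdir()):
        # Skip plain files and hidden entries such as .archive/ itself.
        if not skill_path.is_dir() or skill_path.name.startswith("."):
            continue
        record = usage.get(skill_path.name, {})
        if record.get("pinned", False):
            continue  # Pinned skills are exempt from archiving.
        last_used = record.get("last_used")
        if last_used is None:
            continue  # No usage data yet: keep the skill active (an assumed policy).
        if datetime.fromisoformat(last_used) < cutoff:
            archive_dir.mkdir(exist_ok=True)
            shutil.move(str(skill_path), str(archive_dir / skill_path.name))
            archived.append(skill_path.name)
    return archived


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skills = Path(tmp)
        for name in ("docker_fix", "fasta_parser", "deploy_checklist"):
            (skills / name).mkdir()
        usage = {
            "docker_fix": {"last_used": "2024-11-01T09:00:00", "pinned": False},
            "fasta_parser": {"last_used": "2025-01-30T09:00:00", "pinned": False},
            "deploy_checklist": {"last_used": "2024-06-01T09:00:00", "pinned": True},
        }
        (skills / ".usage.json").write_text(json.dumps(usage))
        # Only docker_fix is both unpinned and idle for more than 30 days.
        print(archive_stale_skills(skills, max_idle_days=30, now=datetime(2025, 1, 31)))
```

Restoring a skill would be the reverse move out of `.archive/`. Passing `now` in as a parameter, rather than calling `datetime.now()` inside the function, keeps the rule easy to test.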

Building a Stateful Agent from Scratch

Let's put these architectural patterns into practice. Below is a complete Python script demonstrating how to initialize and run a stateful AI agent using SQLite-backed session storage and markdown-based long-term memory. It imports from the Hermes codebase (`run_agent`, `hermes_state`, `tools.memory_tool`, `hermes_constants`), so it must run where those modules are importable.

Prerequisites

To run this code, make sure you have the necessary environment variables set up for your LLM provider (we'll use OpenRouter pointing to Claude 3.5 Sonnet in this example):

```bash
export OPENROUTER_API_KEY="your-api-key-here"
```

The Implementation

```python
#!/usr/bin/env python3
"""
stateful_agent_demo.py

A complete, runnable example of a stateful AI agent.
This script demonstrates persistent memory, cross-session database logging,
and semantic recall across separate agent executions.
"""

import os
import uuid
import logging
from datetime import datetime
from pathlib import Path

# --- Core Stateful Agent Architecture Imports ---
# AIAgent: The central orchestrator managing the reasoning loop and tool execution.
from run_agent import AIAgent
# SessionDB: SQLite-backed persistent store for conversation history with FTS5.
from hermes_state import SessionDB
# MemoryStore: Semantic memory engine managing local markdown databases.
from tools.memory_tool import MemoryStore
# Constants: Helper to get standard home directories.
from hermes_constants import get_hermes_home

# Configure clean logging to observe the agent's internal state transitions
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger("StatefulDemo")

# =====================================================================
# Step 1: Establish the State Directories
# =====================================================================
HERMES_HOME = get_hermes_home()
HERMES_HOME.mkdir(parents=True, exist_ok=True)

# Define paths for our SQLite session database and memory files
SESSION_DB_PATH = HERMES_HOME / "sessions" / "stateful_demo.db"
SESSION_DB_PATH.parent.mkdir(parents=True, exist_ok=True)

logger.info(f"Initializing stateful storage at: {HERMES_HOME}")

# =====================================================================
# Step 2: Initialize the SQLite Session Database
# =====================================================================
# SessionDB automatically provisions tables for sessions, messages, and
# full-text search indexes (FTS5) to enable rapid cross-session recall.
session_db = SessionDB(db_path=str(SESSION_DB_PATH))

# =====================================================================
# Step 3: Initialize the Long-Term Memory Store
# =====================================================================
# MemoryStore reads and writes structured facts to MEMORY.md and USER.md.
# We set strict character limits to prevent context bloat.
memory_store = MemoryStore(memory_char_limit=2000, user_char_limit=1000)

# Load any existing facts from prior runs
memory_store.load_from_disk()

# =====================================================================
# Step 4: Configure and Run Session 1 (Learning the User)
# =====================================================================
# We generate a unique session ID for our first conversation.
session_id_1 = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:4]}"
logger.info(f"Starting Session 1 (ID: {session_id_1})")

# Instantiate the stateful agent
agent_1 = AIAgent(
    base_url=os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
    api_key=os.getenv("OPENROUTER_API_KEY"),
    provider="openrouter",
    model="anthropic/claude-3.5-sonnet",
    max_iterations=30,
    session_id=session_id_1,
    session_db=session_db,
    skip_memory=False,
    platform="cli",
)

# Inject our persistent memory store into the agent instance
agent_1._memory_store = memory_store
agent_1._memory_enabled = True
agent_1._user_profile_enabled = True
agent_1._memory_nudge_interval = 1  # Force memory review immediately for this demo

print("\n" + "=" * 70)
print("SESSION 1: TEACHING THE AGENT PREFERENCES")
print("=" * 70)

user_msg_1 = (
    "Hello! My name is Dr. Aris Thorne. I am a bioinformatician, "
    "and I prefer code snippets written strictly in Rust."
)
print(f"\nUser: {user_msg_1}")

# Execute the conversation loop
result_1 = agent_1.run_conversation(user_message=user_msg_1, task_id="task_001")

print(f"\nAgent: {result_1['final_response']}")
print(f"\nSystem: API calls executed: {result_1['api_calls']}")

# Flush the in-memory changes to disk (persisting USER.md and MEMORY.md)
if agent_1._memory_store:
    agent_1._memory_store.save_to_disk()

# Explicitly release client connections
agent_1.release_clients()

# =====================================================================
# Step 5: Configure and Run Session 2 (Testing Memory Recall)
# =====================================================================
# To simulate a real-world scenario where the application was closed,
# restarted, or run on a different day, we instantiate a completely
# new agent instance with a fresh session ID.
session_id_2 = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:4]}"
logger.info(f"Starting Session 2 (ID: {session_id_2})")

# Reload the database and memory files from disk
session_db_reload = SessionDB(db_path=str(SESSION_DB_PATH))
memory_store_reload = MemoryStore(memory_char_limit=2000, user_char_limit=1000)
memory_store_reload.load_from_disk()

agent_2 = AIAgent(
    base_url=os.getenv("OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1"),
    api_key=os.getenv("OPENROUTER_API_KEY"),
    provider="openrouter",
    model="anthropic/claude-3.5-sonnet",
    max_iterations=30,
    session_id=session_id_2,
    session_db=session_db_reload,
    skip_memory=False,
    platform="cli",
)

agent_2._memory_store = memory_store_reload
agent_2._memory_enabled = True
agent_2._user_profile_enabled = True

print("\n" + "=" * 70)
print("SESSION 2: VERIFYING KNOWLEDGE RETRIEVAL")
print("=" * 70)

# We ask a highly ambiguous question that requires previous context
# to answer correctly.
user_msg_2 = "Can you write a quick function to parse a DNA fasta header?"
print(f"\nUser: {user_msg_2}")

result_2 = agent_2.run_conversation(user_message=user_msg_2, task_id="task_002")

print(f"\nAgent: {result_2['final_response']}")

if agent_2._memory_store:
    agent_2._memory_store.save_to_disk()
agent_2.release_clients()

# =====================================================================
# Step 6: Cross-Session Full-Text Search (FTS5) Demonstration
# =====================================================================
print("\n" + "=" * 70)
print("SESSION DATABASE: CROSS-SESSION SEARCH")
print("=" * 70)

# Search the SQLite database for any reference to "Thorne"
search_query = "Thorne"
search_results = session_db_reload.search_sessions(search_query, limit=5)

print(f"\nSearching database for '{search_query}'...")
print(f"Found {len(search_results)} relevant records:")
for idx, record in enumerate(search_results):
    print(f"\n{idx + 1}. Session: {record.get('session_id', 'Unknown')}")
    print(f"   Snippet match: ...{record.get('snippet', '')}...")

print("\n" + "=" * 70)
print("DEMO COMPLETE: Stateful execution verified.")
print("=" * 70)
```

Deep Dive: The Stateful Agent Loop in Practice

How does the agent coordinate all of this state behind the scenes? The magic happens inside the `run_conversation` method within `run_agent.py`. Let’s trace the exact lifecycle of a single turn.

```
┌───────────────────────────────────────────────────────────────────────────┐
│ 1. Context Assembly                                                       │
│ Reads Soul, Memory, Active Skills, and Platform hints to build system     │
│ prompt. Caches it to maximize LLM prefix-cache hits.                      │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ 2. Preflight Check & Compression                                          │
│ Measures token count. If history exceeds threshold, triggers proactive    │
│ context compression before making API calls.                              │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ 3. Tool-Calling Loop (Reasoning)                                          │
│ - Calls LLM with stateful prompt.                                         │
│ - Validates and executes tools (e.g., File I/O, Sandbox Execution).       │
│ - Monitors guardrails to block infinite loops.                            │
│ - Checks for mid-turn user steering commands (/steer).                    │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ 4. Post-Turn Learning                                                     │
│ Spawns background reflection thread to extract memories and skills.       │
└─────────────────────────────────────┬─────────────────────────────────────┘
                                      │
                                      ▼
┌───────────────────────────────────────────────────────────────────────────┐
│ 5. Session Persistence                                                    │
│ Writes the entire turn (system, user, tool, assistant messages) to        │
│ SQLite DB and local JSON logs. Guaranteed write on crash/interrupt.       │
└───────────────────────────────────────────────────────────────────────────┘
```

1.
Context Assembly

When you call `run_conversation`, the agent doesn't just construct a simple system message. The `build_system_prompt` method compiles a highly structured, multi-layered environment:

- The Soul: Injected at the top to set the core persona.
- Persistent Memory: The contents of `MEMORY.md` and `USER.md` are dynamically formatted and injected.
- Skills Guidance: A dynamic list of currently active skills and their execution templates.
- Context Files: Local environment files like `.cursorrules` or `AGENTS.md` are appended.

To keep this process performant, the system prompt is compiled once and cached (`cached_system_prompt`). It is only rebuilt when context compression is triggered. Keeping the prompt prefix byte-identical across turns maximizes prefix-cache hits on LLM APIs that support prompt caching, such as Anthropic and DeepSeek, which can substantially reduce latency and cost on long prompts.

2. Pre-Turn Context Management

Before sending the payload to the API, the agent checks whether the conversation history is approaching the model's limits. If it exceeds the compression threshold, the agent proactively condenses the oldest history into a structured summary. This prevents unexpected context-length failures on the first turn of a resumed session.

3. The Tool-Calling Loop

The agent enters a reasoning loop. It makes an API call, parses the requested tool calls, validates their JSON arguments, executes them, and appends the results back to the message history.

During this loop, two safety features are active:

- Tool Guardrails: A controller tracks repeated, non-progressing tool calls (e.g., repeatedly running `ls` because it can't find a file). If a loop is detected, the guardrail halts execution to prevent runaway API bills.
- Steering Injection: The loop checks for `/steer` inputs, allowing users to inject guidance mid-turn without interrupting the underlying execution thread.

4. Post-Turn Learning

Once the turn completes, the agent spawns the background reflection thread described in the Background Review section to extract memories and skills.

5. Session Persistence

Finally, the agent persists the entire session.
Whether the run succeeded, failed, or was manually aborted via Ctrl+C, the `persist_session` method still runs. It commits the exact state to both a local JSON log and the SQLite `SessionDB`.

Resource Safeguards: The Iteration Budget

Statefulness introduces a major engineering challenge: resource management.

When an agent has the power to call tools, write scripts, read files, and trigger background self-reflection loops, it can easily get caught in an infinite loop. A single unhandled exception in a tool could cause the agent to call the API hundreds of times, running up a large token bill in minutes.

To solve this, Hermes utilizes a thread-safe `IterationBudget` class.

```python
import threading


class IterationBudget:
    def __init__(self, limit: int):
        self._remaining = limit
        self._lock = threading.Lock()

    def consume(self, amount: int = 1) -> bool:
        # Spend budget atomically; refuse if not enough remains.
        with self._lock:
            if self._remaining >= amount:
                self._remaining -= amount
                return True
            return False

    def refund(self, amount: int = 1):
        # Give budget back, e.g. after a cheap tool call.
        with self._lock:
            self._remaining += amount
```

The `IterationBudget` acts as the agent's fuel gauge.

- Every API call and tool execution consumes a portion of the budget.
- The budget is thread-safe and shared between the parent agent and any spawned background reflection agents. This prevents a background thread from spinning out of control.
- The Refund Mechanism: If the agent executes a cheap programmatic tool (like reading a local file or checking a system variable), the iteration is refunded. If it executes a heavy, slow, or expensive tool (like running a web browser sandbox or calling a sub-agent), the budget is fully consumed.

This programmatic budgeting ensures that statefulness does not come at the expense of financial and computational safety.

Conclusion: The Shift from Tools to Partners

The transition from stateless to stateful AI is more than an engineering upgrade; it is a fundamental shift in how humans interact with software.

A stateless agent is a utility tool.
It is a hammer: reliable, but entirely dependent on you picking it up, positioning it, and swinging it correctly every single time.

A stateful agent is a partner. It learns your codebase, remembers your architectural preferences, builds its own library of custom tools, and refines its performance silently while you sleep.

By implementing the triad of Soul, Memory, and Skills, and orchestrating them within a closed learning loop, we can build systems that don't just process text; they accumulate wisdom.

The future of software belongs to systems that grow with us. And the foundation of that growth is statefulness.

Let's Discuss

- The Tool Bloat Dilemma: As an agent creates more custom skills, how do you think we should handle semantic search over skills? Should the agent use vector embeddings to dynamically load only the top 3 relevant skills into its prompt, or is the Curator's active/archive model sufficient?
- The Ethics of Agent Identity: If an agent's "Soul" (`SOUL.md`) and "Memory" (`MEMORY.md`) are continuously modified by background threads, at what point does the agent's behavior drift too far from its original design? How would you implement "identity guardrails" to prevent an agent from editing its core safety principles?

Leave your thoughts and engineering approaches in the comments below!

The concepts and code demonstrated here are drawn directly from the roadmap laid out in the ebook Hermes Agent, The Self-Evolving AI Workforce (https://tiny.cc/HermesAgent). You can also find my programming ebooks on AI here: Programming & AI eBooks (http://tiny.cc/ProgrammingBooks).