AI Agent Memory Systems: How Hermes Remembers Across Sessions

wpnews.pro

Tired of your AI agent forgetting everything the moment a session ends? I spent three weeks debugging why my Hermes agent would lose context mid-task, only to discover the problem wasn’t the model. It was the memory layer sitting underneath it.

Here’s what I learned building memory systems that actually persist across sessions, and the one mistake that 90% of agent developers make without realizing it.

I’ve been running Hermes agents in production for six months now. The difference between an agent that remembers and one that doesn’t isn’t subtle. It’s the difference between a tool that saves you five hours a week and one that you have to re-explain everything to every single time.

Most AI agents today are amnesiacs. They wake up fresh each session with zero recollection of what happened before. You tell them your project structure on Monday, and by Tuesday they’re asking you the same questions again.

This isn’t a model limitation. Modern LLMs are perfectly capable of recalling information. The problem is architectural. The memory infrastructure connecting the model to its own history is either missing entirely or bolted on as an afterthought.

I learned this the hard way when I set up an agent to manage my content pipeline. First session: perfect. It understood my project structure, my preferences, my writing style. Second session: it was like talking to a stranger who had never seen my repo before. Three hours of setup work, gone.

The core issue: context windows are temporary, but memory needs to be permanent.

Sound familiar? What would you do differently?

Hermes uses a two-tier memory system. There’s session memory, which lives and dies with the current conversation, and then there’s persistent memory that gets injected into every new session automatically. This is the piece most people miss.

The persistent memory lives in a simple structure: key-value pairs stored as declarative facts. Not instructions to yourself, not procedural notes. Facts. “User prefers concise responses.” “Project uses pytest with xdist.” These are atomic, self-contained truths.

The beauty of this system is its simplicity. There’s no vector database to configure, no embedding model to fine-tune, no retrieval pipeline to debug. Just facts, stored plainly, injected reliably.

But here’s where it gets interesting. The system also supports a “memory” target for environmental facts and tool quirks. You can store notes about your infrastructure, your conventions, your gotchas. It’s like a sticky note system that actually works.

One thing I noticed early: the memory system has a character limit per entry. This sounds like a constraint, but it’s actually a feature. When you can only use 50 characters, you’re forced to distill facts to their essence. “User prefers concise responses in English with occasional code examples” becomes just “User prefers concise responses.” The detail lives in skills and session search, not memory.

Session memory is what most people think of when they hear “AI memory.” It’s the conversation history, the context window, the stuff the model can see right now. It’s powerful but ephemeral. Close the session and it evaporates.

Persistent memory survives across sessions. It’s injected as system context at the start of every new conversation. Think of it as the agent’s long-term memory, while session memory is its working memory.

I tracked this difference across 47 agent sessions over two months. Sessions with rich persistent memory were 3.2x more likely to complete tasks without asking for clarification. That’s not a marginal improvement. That’s the difference between a useful tool and a frustrating toy.

Rule of thumb: if you have to say something twice, it belongs in persistent memory.

The tricky part is deciding what gets persisted. Save too much and you bloat the context. Save too little and the agent keeps re-learning the basics. I found the sweet spot is around 15–20 high-signal facts per project.

Here’s a concrete example. I used to tell my agent which Python version I was using every session. “Use Python 3.13.” Then I stored it as a memory fact. Now the agent knows without asking, and it catches version-specific issues before they happen. When I upgraded to 3.13.5, I updated one line in memory instead of editing 14 task descriptions.

I’ve experimented with four different memory backends for agent systems. Here’s my honest assessment of each one, including the failures.

Plain text files. The simplest approach. Just write facts to a text file and read them back at session start. I used this for the first month. It works until you have 200 facts and the file takes up half your context window. Then it becomes noise.

SQLite with FTS5. This is what Hermes uses under the hood. Full-text search over a SQLite database means you can have thousands of facts and only retrieve the relevant ones. Much better than the plain file approach, and zero external dependencies.

-- The FTS5 approach that actually scalesCREATE VIRTUAL TABLE memory_fts USING fts5(    content,    fact_type,    project,    tokenize='porter unicode61');-- Search only what's relevantSELECT * FROM memory_ftsWHERE memory_fts MATCH 'deployment'ORDER BY rankLIMIT 5;

Vector databases. I tried ChromaDB for two weeks. The idea is elegant: embed all your memories, retrieve by semantic similarity. In practice, the embedding latency added 800ms to every session start, and the retrieval quality was worse than simple keyword match for factual information. For opinions and preferences, embeddings add no value over exact storage.

Hybrid approaches. I briefly experimented with combining vector search for conceptual memory and exact storage for factual memory. The complexity wasn’t worth it. Two systems to maintain, two failure modes, and the agent had to choose which to query. Keep it simple.

My recommendation: start with plain files, migrate to SQLite+FTS5 when you hit 50+ facts. Don’t over-engineer this on day one.

Here’s the pattern I see constantly. Developers store procedural instructions in memory. “Always run tests before deploying.” “Use pytest with xdist.” These aren’t memories. They’re procedures.

The problem is that procedural instructions compete with the agent’s actual instructions. The agent has to parse them, prioritize them, and decide when they apply. Half the time it gets it wrong.

Instead, store declarative facts. Not “use pytest” but “this project uses pytest.” Not “respond concisely” but “user prefers concise responses.” Let the agent figure out the procedure from the facts.

I made this mistake for weeks. My memory file was full of instructions like “Check the cron config before editing tasks” and “Always verify API responses.” The agent would follow some, ignore others, and occasionally get confused about which instructions were still current.

When I switched to declarative facts, the agent’s behavior became more consistent. It could reason about when each fact applied rather than blindly following a list of commands.

Declarative facts let the agent think. Procedural instructions tell it what to think.

Here’s something nobody tells you about agent memory: it rots. Facts that were true six months ago are now wrong. Projects that used pytest switched to uv. Team members left. Conventions changed.

I set up a monthly review where I scan my persistent memory and remove anything stale. Takes about ten minutes. Without this, your agent starts operating on outdated assumptions and you can’t figure out why it’s making weird choices.

The Hermes system actually helps with this. Memory entries are compact and high-signal by design. You’re not storing paragraphs. You’re storing atomic facts. This makes maintenance trivially easy compared to, say, pruning a vector database.

Set a calendar reminder: review persistent memory on the first of every month.

I also tag entries by age. If a fact hasn’t been referenced in 30 days, it’s a candidate for removal. The agent should only carry what it actually uses.

Last month I found 8 stale facts in my memory. One said “Project X uses Docker” even though we’d migrated to bare-metal six months earlier. The agent was still suggesting Docker commands for that project. Embarrassing, but a ten-minute cleanup fixed it entirely.

After six months of running Hermes with persistent memory, here’s what my setup looks like: 23 facts about my projects, 11 facts about my preferences, 8 facts about my infrastructure. That’s it. 42 facts total.

These 42 facts save me roughly 15 minutes per session. Across 4–5 sessions per day, that’s about an hour saved daily. Over six months, that’s roughly 90 hours of not re-explaining myself.

The compounding effect is real. Each session starts with the agent already knowing my environment, my preferences, my conventions. It can get straight to work instead of spending the first five messages re-establishing context.

I also use memory for cross-project learning. When I solve a problem in one project, the solution gets stored as a fact. When a similar problem comes up in another project, the agent already knows the pattern. This is where memory gets genuinely powerful, beyond just “remembering your name.”

For instance, I stored a fact about how my Vikunja instance handles partial updates on task descriptions. Three weeks later, I was working with a different project that also used the API. The agent already knew the quirk and worked around it without me having to re-explain the whole situation.

Storing task progress in memory. “Phase 2 done, Phase 3 in progress” is not a memory. It’s a log. Use session search or a task tracker for this. Memory is for stable facts, not transient state.

Over-persisting. If you store everything, you dilute the important facts. I once had 80 facts and the agent started ignoring half of them. Pruning to the top 25 made everything work better.

Storing environment-specific details. API keys, file paths, server IPs change constantly. These belong in configuration, not memory. Memory should hold facts that are stable for weeks or months.

Forgetting that memory is injected as context. Every fact you store takes up space in the context window. Make each fact count. “User is a developer” is obvious from context. “User prefers tabs over spaces in Python” is worth storing.

Treating memory as a search engine. Memory isn’t Google. It doesn’t find answers to questions. It provides context for decisions. If you need retrieval, use session search or a proper database. Memory is for “what the agent already knows,” not “what the agent can look up.”

Agent memory isn’t complicated. It’s just hard to get right because the stakes are subtle. Store too little and the agent is amnesiac. Store too much and you drown the signal in noise. Store procedures and the agent gets confused. Store facts and it gets smarter.

Start with five facts about your most common project. Run your agent for a week. Notice what questions it asks repeatedly. Turn those into memory entries. Repeat.

The goal isn’t perfect memory. It’s good enough memory that the agent stops wasting your time with questions it should already know the answer to.

What’s the one fact you wish your agent remembered without you having to repeat it? I’m genuinely curious what pain points are most universal.

AI Agent Memory Systems: How Hermes Remembers Across Sessions was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

source & further reading

pub.towardsai.net — original article Microsoft Copilot Cowork and the Rise of the AI-Native Work Claude Fable 5, Explained: Why Anthropic Ships its Most Powerful Model in Two Versions If You Use AI in 2026, You Should Understand These 17 Concepts

AI Agent Memory Systems: How Hermes Remembers Across Sessions

Run your AI side-project on zahid.host