103. Agent Memory: Short-Term, Long-Term, and Episodic

An engineer built a four-tier memory architecture for AI agents, implementing working, semantic, episodic, and procedural memory systems from scratch. The system enables agents to retain context across sessions, recall past conversations and user preferences, and build persistent knowledge rather than starting fresh with each interaction. The implementation uses vector stores for semantic memory, structured databases for episodic memory, and in-context windows for working memory.

Main Thumbnail Image Prompt: A human brain cross-section illustration in neon tones on dark background. Three regions clearly demarcated and labeled. The hippocampus region glows blue, labeled "Episodic Memory: what happened." The prefrontal cortex glows orange, labeled "Working Memory: what I'm doing now." A network of distributed nodes glows green, labeled "Semantic Memory: what I know." Arrows show information flowing between regions. Scientific but accessible, the memory architecture made neural and visual. Memory Architecture Diagram Image Prompt: Four storage boxes arranged vertically on dark background. Top: "In-Context Window Working Memory " — fastest, smallest, temporary, shown as RAM chip icon. Second: "External Vector Store Semantic Memory " — fast retrieval, persistent, shown as cylinder with search icon. Third: "Key-Value Store Episodic Memory " — structured facts, shown as database icon. Bottom: "Fine-Tuned Weights Procedural Memory " — slowest to update, most permanent, shown as brain with lock. Arrows showing read/write speeds between boxes. Clean, technical, the hierarchy is the insight. Memory Retrieval Flow Image Prompt: A query arrives at an agent on the left. Four parallel arrows go right to four memory sources: conversation history short chat bubbles , vector database semantic search visualization , structured database table icon , model weights brain icon . Each source returns relevant items. A "Memory Fusion" box on the right combines the results. The agent sees an enriched context. The retrieval from multiple stores is the architecture. Every conversation with an LLM starts from zero. You explain your project. You explain your preferences. You explain your constraints. You spend five minutes providing context. You come back tomorrow. You do it all again. The model remembers nothing between sessions. The context window closes. The state is gone. Every interaction is the agent's first day on the job. Human productivity depends on memory. We remember what worked last time. We build on past experience. We know our tools, our colleagues, our recurring problems. We do not start from scratch daily. Agents with memory do this. They remember past conversations. They recall relevant facts. They store successful strategies. They build up a model of the user's preferences and project context over time. This post builds all four types of agent memory from scratch. python import os import json import time import hashlib from typing import List, Dict, Optional, Any, Tuple from dataclasses import dataclass, field from datetime import datetime from pathlib import Path import anthropic import numpy as np print "The Four Types of Agent Memory:" print memory types = { "Working Memory In-Context ": { "speed": "Instant", "capacity": "Limited by context window ~200K tokens ", "persistence": "Session only — gone when conversation ends", "best for": "Current conversation, active task state", "implementation": "messages list in API call", }, "Semantic Memory Vector Store ": { "speed": "Fast milliseconds ", "capacity": "Millions of embeddings", "persistence": "Persistent across sessions", "best for": "Knowledge base, past conversations, documents", "implementation": "ChromaDB, Pinecone, FAISS", }, "Episodic Memory Structured Store ": { "speed": "Fast key-value lookup ", "capacity": "Unlimited", "persistence": "Persistent across sessions", "best for": "User preferences, facts, past actions, outcomes", "implementation": "SQLite, Redis, JSON files", }, "Procedural Memory Weights ": { "speed": "Instant baked in ", "capacity": "Model-dependent", "persistence": "Requires fine-tuning to update", "best for": "Skills, domain knowledge, behavioral patterns", "implementation": "Fine-tuning, LoRA adapters", }, } for name, info in memory types.items : print f" {name}:" for key, val in info.items : print f" {key:<18}: {val}" print class WorkingMemory: """ Short-term memory that lives in the context window. Automatically manages the sliding window to stay within token limits. """ def init self, max turns: int = 20, max tokens: int = 50000 : self.turns: List Dict = self.max turns = max turns self.max tokens = max tokens self. token count = 0 def add self, role: str, content: str : self.turns.append { "role": role, "content": content, "timestamp": datetime.utcnow .isoformat , "tokens": len content.split 1.3 rough estimate } self. trim if needed def trim if needed self : while len self.turns self.max turns 2: self.turns.pop 0 def get messages self - List Dict : return {"role": t "role" , "content": t "content" } for t in self.turns def get recent self, n turns: int = 5 - List Dict : recent = self.turns - n turns 2 : return {"role": t "role" , "content": t "content" } for t in recent def summarize old self, keep last: int = 5 - str: """Compress old turns into a summary to free context space.""" if len self.turns <= keep last 2: return "" old turns = self.turns :- keep last 2 summary parts = for turn in old turns: if turn "role" == "user": summary parts.append f"User asked about: {turn 'content' :50 }" return "Previous conversation summary: " + "; ".join summary parts def clear self : self.turns = def len self : return len self.turns // 2 wm = WorkingMemory max turns=10 wm.add "user", "My name is Rahul and I am building a recommendation system." wm.add "assistant", "Great What type of recommendations? User-based or item-based?" wm.add "user", "User-based collaborative filtering for an e-commerce platform." wm.add "assistant", "For user-based CF, you will need a user-item interaction matrix..." print "Working Memory Demo:" print f" Current turns: {len wm }" print f" Messages in context: {len wm.get messages }" print print " Recent context:" for msg in wm.get messages : print f" {msg 'role' :<10} : {msg 'content' :60 }..." python from sentence transformers import SentenceTransformer from sklearn.metrics.pairwise import cosine similarity class SemanticMemory: """ Long-term memory stored as embeddings. Retrieves relevant past context by semantic similarity. Think of this as the agent's 'searchable journal.' """ def init self, embed model: str = "all-MiniLM-L6-v2", persist path: str = "./agent memory" : self.embedder = SentenceTransformer embed model self.persist path = Path persist path self.persist path.mkdir exist ok=True self. entries: List Dict = self. embeddings: Optional np.ndarray = None self. load def remember self, content: str, memory type: str = "conversation", metadata: Dict = None : """Store a memory with embedding.""" entry = { "id": hashlib.md5 content.encode .hexdigest :8 , "content": content, "type": memory type, "timestamp": datetime.utcnow .isoformat , "metadata": metadata or {} } self. entries.append entry new emb = self.embedder.encode content self. embeddings = new emb if self. embeddings is None else np.vstack self. embeddings, new emb self. save def recall self, query: str, top k: int = 3, memory type: Optional str = None, min score: float = 0.3 - List Dict : """Retrieve most relevant memories for a query.""" if not self. entries: return query emb = self.embedder.encode query scores = cosine similarity query emb, self. embeddings 0 ranked idxs = np.argsort scores ::-1 results = for idx in ranked idxs: if len results = top k: break entry = self. entries idx score = float scores idx if score < min score: continue if memory type and entry "type" = memory type: continue results.append { entry, "relevance score": round score, 4 } return results def forget self, memory id: str : """Remove a specific memory.""" idx = next i for i, e in enumerate self. entries if e "id" == memory id , None if idx is not None: self. entries.pop idx self. embeddings = np.delete self. embeddings, idx, axis=0 self. save def save self : data path = self.persist path / "memories.json" with open data path, "w" as f: json.dump self. entries, f, indent=2 if self. embeddings is not None: np.save self.persist path / "embeddings.npy", self. embeddings def load self : data path = self.persist path / "memories.json" emb path = self.persist path / "embeddings.npy" if data path.exists : with open data path as f: self. entries = json.load f if emb path.exists : self. embeddings = np.load emb path def len self : return len self. entries sm = SemanticMemory persist path="./test agent memory" sm.remember "User is building a recommendation system for e-commerce", "preference" sm.remember "User prefers Python and PyTorch over TensorFlow", "preference" sm.remember "Previous session: debugged a cosine similarity bug in the recommendation engine", "episode" sm.remember "User's company uses PostgreSQL for the main database", "fact" sm.remember "User struggled with cold-start problem for new users", "episode" sm.remember "Solved cold-start by using content-based features initially", "solution" print "Semantic Memory Demo:" print f" Stored memories: {len sm }" print queries = "What database does this user use?", "Has this user had problems with new users?", "What tools does this user prefer?", for query in queries: results = sm.recall query, top k=2 print f" Query: '{query}'" for r in results: print f" {r 'relevance score' :.3f} {r 'type' } {r 'content' :60 }" print python import sqlite3 from contextlib import contextmanager class EpisodicMemory: """ Structured memory for facts, preferences, and past events. Uses SQLite for persistence. Think of it as the agent's 'fact file.' """ def init self, db path: str = "./agent episodes.db" : self.db path = db path self. init db def init db self : with self. conn as conn: conn.executescript """ CREATE TABLE IF NOT EXISTS facts key TEXT PRIMARY KEY, value TEXT NOT NULL, category TEXT DEFAULT 'general', confidence REAL DEFAULT 1.0, created at TEXT, updated at TEXT, source TEXT ; CREATE TABLE IF NOT EXISTS episodes id INTEGER PRIMARY KEY AUTOINCREMENT, action TEXT NOT NULL, result TEXT, success INTEGER DEFAULT 1, context TEXT, timestamp TEXT, session id TEXT ; CREATE TABLE IF NOT EXISTS preferences key TEXT PRIMARY KEY, value TEXT NOT NULL, updated at TEXT ; """ @contextmanager def conn self : conn = sqlite3.connect self.db path conn.row factory = sqlite3.Row try: yield conn conn.commit finally: conn.close def store fact self, key: str, value: str, category: str = "general", confidence: float = 1.0, source: str = "" : now = datetime.utcnow .isoformat with self. conn as conn: conn.execute """ INSERT OR REPLACE INTO facts VALUES ?, ?, ?, ?, COALESCE SELECT created at FROM facts WHERE key=? , ? , ?, ? """, key, value, category, confidence, key, now, now, source def get fact self, key: str - Optional Dict : with self. conn as conn: row = conn.execute "SELECT FROM facts WHERE key = ?", key, .fetchone return dict row if row else None def get facts by category self, category: str - List Dict : with self. conn as conn: rows = conn.execute "SELECT FROM facts WHERE category = ? ORDER BY updated at DESC", category, .fetchall return dict r for r in rows def log episode self, action: str, result: str = "", success: bool = True, context: str = "", session id: str = "" : with self. conn as conn: conn.execute """ INSERT INTO episodes action, result, success, context, timestamp, session id VALUES ?, ?, ?, ?, ?, ? """, action, result, int success , context, datetime.utcnow .isoformat , session id def get recent episodes self, n: int = 10, success only: bool = False - List Dict : query = "SELECT FROM episodes" if success only: query += " WHERE success = 1" query += " ORDER BY timestamp DESC LIMIT ?" with self. conn as conn: return dict r for r in conn.execute query, n, .fetchall def set preference self, key: str, value: str : with self. conn as conn: conn.execute "INSERT OR REPLACE INTO preferences VALUES ?, ?, ? ", key, value, datetime.utcnow .isoformat def get preference self, key: str, default: str = "" - str: with self. conn as conn: row = conn.execute "SELECT value FROM preferences WHERE key = ?", key, .fetchone return row "value" if row else default def get all preferences self - Dict str, str : with self. conn as conn: rows = conn.execute "SELECT key, value FROM preferences" .fetchall return {r "key" : r "value" for r in rows} em = EpisodicMemory db path="./test episodes.db" em.store fact "user name", "Rahul", category="identity" em.store fact "user role", "ML Engineer", category="identity" em.store fact "project type", "recommendation", category="project" em.store fact "db technology", "PostgreSQL", category="tech stack" em.store fact "preferred lang", "Python", category="preference" em.store fact "preferred ml lib", "PyTorch", category="preference" em.log episode "Helped debug cosine similarity", "Fixed shape mismatch", success=True, session id="sess 001" em.log episode "Explained collaborative filtering", "User understood", success=True, session id="sess 001" em.log episode "Tried matrix factorization approach", "Memory error on large data", success=False, session id="sess 002" em.set preference "response style", "concise with code examples" em.set preference "explanation depth", "intermediate" print "Episodic Memory Demo:" print print " User Facts:" for fact in em.get facts by category "identity" : print f" {fact 'key' }: {fact 'value' }" print print " Recent Episodes:" for ep in em.get recent episodes 3 : status = "✓" if ep "success" else "✗" print f" {status} {ep 'action' :50 }: {ep 'result' :40 }" print print " Preferences:" for key, val in em.get all preferences .items : print f" {key}: {val}" class MemoryAgent: """ A complete agent with all four memory types integrated. Personalizes responses based on accumulated memory. """ def init self, agent id: str = "agent default", model: str = "claude-3-5-haiku-20241022" : self.agent id = agent id self.client = anthropic.Anthropic api key=os.environ.get "ANTHROPIC API KEY" self.model = model self.working memory = WorkingMemory max turns=15 self.semantic memory = SemanticMemory persist path=f"./memory {agent id}/semantic" self.episodic memory = EpisodicMemory db path=f"./memory {agent id}/episodic.db" self. session id = hashlib.md5 str time.time .encode .hexdigest :8 def build memory context self, query: str - str: """Assemble relevant memories into a context block.""" parts = prefs = self.episodic memory.get all preferences if prefs: parts.append "User preferences: " + "; ".join f"{k}={v}" for k, v in prefs.items key facts = self.episodic memory.get facts by category "identity" key facts += self.episodic memory.get facts by category "project" if key facts: facts str = "; ".join f"{f 'key' }={f 'value' }" for f in key facts :5 parts.append f"Known facts: {facts str}" relevant memories = self.semantic memory.recall query, top k=3 if relevant memories: mem str = "\n".join f"- {m 'type' } {m 'content' }" for m in relevant memories parts.append f"Relevant past context:\n{mem str}" recent episodes = self.episodic memory.get recent episodes 3, success only=True if recent episodes: ep str = "; ".join ep "action" :40 for ep in recent episodes parts.append f"Recent successful actions: {ep str}" return "\n\n".join parts if parts else "" def chat self, user message: str, verbose: bool = False - str: self.working memory.add "user", user message memory context = self. build memory context user message system = f"""You are a helpful AI assistant with memory of past interactions. Use the provided context to personalize your responses. {f'Memory context:{chr 10 }{memory context}' if memory context else ''} Adapt your response to the user's known preferences and expertise level.""" response = self.client.messages.create model = self.model, max tokens = 800, system = system, messages = self.working memory.get messages answer = response.content 0 .text self.working memory.add "assistant", answer self.semantic memory.remember f"User asked: {user message :100 }", memory type = "conversation", metadata = {"session": self. session id} self.episodic memory.log episode action = f"Answered: {user message :50 }", result = "Success", session id = self. session id if verbose: used memories = len self.semantic memory.recall user message, top k=3 print f" Memory Used {used memories} relevant memories, " f"{len self.working memory } conversation turns in context" return answer mem agent = MemoryAgent agent id="rahul session" mem agent.episodic memory.store fact "user name", "Rahul", "identity" mem agent.episodic memory.store fact "project", "e-commerce recommender", "project" mem agent.episodic memory.set preference "explanation depth", "intermediate" mem agent.semantic memory.remember "User previously struggled with cold-start problem", "episode" print "\nMemory-Augmented Agent Demo:" print "=" 60 questions = "Can you remind me where we left off with my recommendation system?", "What approach did we decide to use for new users?", "I want to add diversity to the recommendations. Any ideas?", for q in questions: print f"\nUser: {q}" answer = mem agent.chat q, verbose=True print f"Agent: {answer :200 }..." class MemoryManager: """Handles memory maintenance: summarization, pruning, importance scoring.""" def init self, semantic memory: SemanticMemory, episodic memory: EpisodicMemory : self.semantic = semantic memory self.episodic = episodic memory def summarize session self, session id: str, llm client=None - str: """Compress a full session into a summary memory.""" episodes = ep for ep in self.episodic.get recent episodes 50 if ep.get "session id" == session id if not episodes: return "" session text = "\n".join f"- {ep 'action' }: {ep 'result' }" for ep in episodes summary = f"Session {session id}: " + "; ".join ep "action" :30 for ep in episodes :5 self.semantic.remember summary, memory type = "session summary", metadata = {"session id": session id} return summary def get memory stats self - Dict: return { "semantic memories": len self.semantic , "total episodes": len self.episodic.get recent episodes 1000 , "successful episodes": len self.episodic.get recent episodes 1000, success only=True , "stored preferences": len self.episodic.get all preferences , "stored facts": len self.episodic.get facts by category "identity" + self.episodic.get facts by category "project" , } mm = MemoryManager mem agent.semantic memory, mem agent.episodic memory print "\nMemory Statistics:" stats = mm.get memory stats for key, value in stats.items : print f" {key:<30}: {value}" print "\nAgent Memory Reference Links:" print refs = { "Papers": "MemGPT: Memory in LLM OS", "arxiv.org/abs/2310.08560" , "Generative Agents Stanford ", "arxiv.org/abs/2304.03442" , "Memory-Augmented LLM Survey", "arxiv.org/abs/2312.17512" , "Cognitive Architectures for LLMs", "arxiv.org/abs/2309.02427" , "Reflexion: Verbal Reinforcement", "arxiv.org/abs/2303.11366" , , "Implementations": "MemGPT GitHub", "github.com/cpacker/MemGPT" , "LangChain Memory docs", "python.langchain.com/docs/modules/memory" , "LlamaIndex Memory module", "docs.llamaindex.ai/en/stable/module guides/storing/index stores" , "Zep: Long-term memory for agents", "getzep.com" , "Mem0: Memory layer for AI", "mem0.ai" , , "Tutorials": "Building agents with memory Anthropic ", "github.com/anthropics/anthropic-cookbook" , "LangGraph memory persistence", "langchain-ai.github.io/langgraph/how-tos/persistence" , "Vector memory with ChromaDB", "docs.trychroma.com/usage-guide" , , "Cheat Sheets": "SQLite Python reference", "docs.python.org/3/library/sqlite3.html" , "Sentence Transformers quickstart", "sbert.net/docs/quickstart.html" , "NumPy array operations", "numpy.org/doc/stable/reference/routines.array-manipulation" , , } for category, links in refs.items : print f" {category}:" for name, url in links: print f" • {name:<48} {url}" print Create agent memory practice.py . Part 1: build the four-type memory system from this post. Initialize WorkingMemory, SemanticMemory, and EpisodicMemory. Run a 5-turn conversation. After each turn, store the exchange in both semantic embedding and episodic SQLite memory. Verify both stores contain the data. Part 2: test cross-session recall. Start a new conversation. Without providing any prior context, ask the agent something that requires remembering a fact from the previous session. Does it retrieve the relevant memory and personalize the response? Part 3: memory retrieval comparison. Take 10 queries. For each, retrieve top 3 results from semantic memory. Also retrieve results from episodic memory by category. Compare what each memory type surfaces. When is each one more useful? Part 4: memory decay. Add a "recency weight" to semantic memory recall: recent memories score higher than old ones. Implement this by multiplying the cosine similarity score by a decay factor based on age. Does it change which memories get retrieved? Agents with memory are powerful. Agents that can write and execute code are transformative. The next post covers code agents: agents that write Python, run it, observe the output, and iteratively improve their code until it solves the problem. This is how GitHub Copilot and Cursor work at their core.