103. Agent Memory: Short-Term, Long-Term, and Episodic

wpnews.pro

Main Thumbnail Image Prompt: A human brain cross-section illustration in neon tones on dark background. Three regions clearly demarcated and labeled. The hippocampus region glows blue, labeled "Episodic Memory: what happened." The prefrontal cortex glows orange, labeled "Working Memory: what I'm doing now." A network of distributed nodes glows green, labeled "Semantic Memory: what I know." Arrows show information flowing between regions. Scientific but accessible, the memory architecture made neural and visual.

Memory Architecture Diagram Image Prompt: Four storage boxes arranged vertically on dark background. Top: "In-Context Window (Working Memory)" — fastest, smallest, temporary, shown as RAM chip icon. Second: "External Vector Store (Semantic Memory)" — fast retrieval, persistent, shown as cylinder with search icon. Third: "Key-Value Store (Episodic Memory)" — structured facts, shown as database icon. Bottom: "Fine-Tuned Weights (Procedural Memory)" — slowest to update, most permanent, shown as brain with lock. Arrows showing read/write speeds between boxes. Clean, technical, the hierarchy is the insight.

Memory Retrieval Flow Image Prompt: A query arrives at an agent on the left. Four parallel arrows go right to four memory sources: conversation history (short chat bubbles), vector database (semantic search visualization), structured database (table icon), model weights (brain icon). Each source returns relevant items. A "Memory Fusion" box on the right combines the results. The agent sees an enriched context. The retrieval from multiple stores is the architecture.

Every conversation with an LLM starts from zero.

You explain your project. You explain your preferences. You explain your constraints. You spend five minutes providing context. You come back tomorrow. You do it all again.

The model remembers nothing between sessions. The context window closes. The state is gone. Every interaction is the agent's first day on the job.

Human productivity depends on memory. We remember what worked last time. We build on past experience. We know our tools, our colleagues, our recurring problems. We do not start from scratch daily.

Agents with memory do this. They remember past conversations. They recall relevant facts. They store successful strategies. They build up a model of the user's preferences and project context over time.

This post builds all four types of agent memory from scratch.

import os
import json
import time
import hashlib
from typing import List, Dict, Optional, Any, Tuple
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
import anthropic
import numpy as np

print("The Four Types of Agent Memory:")
print()

memory_types = {
    "Working Memory (In-Context)": {
        "speed":       "Instant",
        "capacity":    "Limited by context window (~200K tokens)",
        "persistence": "Session only — gone when conversation ends",
        "best_for":    "Current conversation, active task state",
        "implementation": "messages list in API call",
    },
    "Semantic Memory (Vector Store)": {
        "speed":       "Fast (milliseconds)",
        "capacity":    "Millions of embeddings",
        "persistence": "Persistent across sessions",
        "best_for":    "Knowledge base, past conversations, documents",
        "implementation": "ChromaDB, Pinecone, FAISS",
    },
    "Episodic Memory (Structured Store)": {
        "speed":       "Fast (key-value lookup)",
        "capacity":    "Unlimited",
        "persistence": "Persistent across sessions",
        "best_for":    "User preferences, facts, past actions, outcomes",
        "implementation": "SQLite, Redis, JSON files",
    },
    "Procedural Memory (Weights)": {
        "speed":       "Instant (baked in)",
        "capacity":    "Model-dependent",
        "persistence": "Requires fine-tuning to update",
        "best_for":    "Skills, domain knowledge, behavioral patterns",
        "implementation": "Fine-tuning, LoRA adapters",
    },
}

for name, info in memory_types.items():
    print(f"  {name}:")
    for key, val in info.items():
        print(f"    {key:<18}: {val}")
    print()
class WorkingMemory:
    """
    Short-term memory that lives in the context window.
    Automatically manages the sliding window to stay within token limits.
    """

    def __init__(self, max_turns: int = 20, max_tokens: int = 50000):
        self.turns:      List[Dict] = []
        self.max_turns   = max_turns
        self.max_tokens  = max_tokens
        self._token_count = 0

    def add(self, role: str, content: str):
        self.turns.append({
            "role":      role,
            "content":   content,
            "timestamp": datetime.utcnow().isoformat(),
            "tokens":    len(content.split()) * 1.3  # rough estimate
        })
        self._trim_if_needed()

    def _trim_if_needed(self):
        while len(self.turns) > self.max_turns * 2:
            self.turns.pop(0)

    def get_messages(self) -> List[Dict]:
        return [{"role": t["role"], "content": t["content"]} for t in self.turns]

    def get_recent(self, n_turns: int = 5) -> List[Dict]:
        recent = self.turns[-(n_turns * 2):]
        return [{"role": t["role"], "content": t["content"]} for t in recent]

    def summarize_old(self, keep_last: int = 5) -> str:
        """Compress old turns into a summary to free context space."""
        if len(self.turns) <= keep_last * 2:
            return ""
        old_turns = self.turns[:-(keep_last * 2)]
        summary_parts = []
        for turn in old_turns:
            if turn["role"] == "user":
                summary_parts.append(f"User asked about: {turn['content'][:50]}")
        return "Previous conversation summary: " + "; ".join(summary_parts)

    def clear(self):
        self.turns = []

    def __len__(self):
        return len(self.turns) // 2

wm = WorkingMemory(max_turns=10)
wm.add("user",      "My name is Rahul and I am building a recommendation system.")
wm.add("assistant", "Great! What type of recommendations? User-based or item-based?")
wm.add("user",      "User-based collaborative filtering for an e-commerce platform.")
wm.add("assistant", "For user-based CF, you will need a user-item interaction matrix...")

print("Working Memory Demo:")
print(f"  Current turns:  {len(wm)}")
print(f"  Messages in context: {len(wm.get_messages())}")
print()
print("  Recent context:")
for msg in wm.get_messages():
    print(f"    [{msg['role']:<10}]: {msg['content'][:60]}...")
python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

class SemanticMemory:
    """
    Long-term memory stored as embeddings.
    Retrieves relevant past context by semantic similarity.
    Think of this as the agent's 'searchable journal.'
    """

    def __init__(self, embed_model: str = "all-MiniLM-L6-v2",
                 persist_path: str = "./agent_memory"):
        self.embedder      = SentenceTransformer(embed_model)
        self.persist_path  = Path(persist_path)
        self.persist_path.mkdir(exist_ok=True)

        self._entries:      List[Dict]  = []
        self._embeddings:   Optional[np.ndarray] = None
        self._load()

    def remember(self, content: str, memory_type: str = "conversation",
                 metadata: Dict = None):
        """Store a memory with embedding."""
        entry = {
            "id":          hashlib.md5(content.encode()).hexdigest()[:8],
            "content":     content,
            "type":        memory_type,
            "timestamp":   datetime.utcnow().isoformat(),
            "metadata":    metadata or {}
        }
        self._entries.append(entry)

        new_emb = self.embedder.encode([content])
        self._embeddings = (
            new_emb if self._embeddings is None
            else np.vstack([self._embeddings, new_emb])
        )
        self._save()

    def recall(self, query: str, top_k: int = 3,
               memory_type: Optional[str] = None,
               min_score: float = 0.3) -> List[Dict]:
        """Retrieve most relevant memories for a query."""
        if not self._entries:
            return []

        query_emb   = self.embedder.encode([query])
        scores      = cosine_similarity(query_emb, self._embeddings)[0]
        ranked_idxs = np.argsort(scores)[::-1]

        results = []
        for idx in ranked_idxs:
            if len(results) >= top_k:
                break
            entry = self._entries[idx]
            score = float(scores[idx])

            if score < min_score:
                continue
            if memory_type and entry["type"] != memory_type:
                continue

            results.append({**entry, "relevance_score": round(score, 4)})

        return results

    def forget(self, memory_id: str):
        """Remove a specific memory."""
        idx = next((i for i, e in enumerate(self._entries)
                    if e["id"] == memory_id), None)
        if idx is not None:
            self._entries.pop(idx)
            self._embeddings = np.delete(self._embeddings, idx, axis=0)
            self._save()

    def _save(self):
        data_path = self.persist_path / "memories.json"
        with open(data_path, "w") as f:
            json.dump(self._entries, f, indent=2)

        if self._embeddings is not None:
            np.save(self.persist_path / "embeddings.npy", self._embeddings)

    def _load(self):
        data_path = self.persist_path / "memories.json"
        emb_path  = self.persist_path / "embeddings.npy"

        if data_path.exists():
            with open(data_path) as f:
                self._entries = json.load(f)

        if emb_path.exists():
            self._embeddings = np.load(emb_path)

    def __len__(self):
        return len(self._entries)

sm = SemanticMemory(persist_path="./test_agent_memory")

sm.remember("User is building a recommendation system for e-commerce", "preference")
sm.remember("User prefers Python and PyTorch over TensorFlow", "preference")
sm.remember("Previous session: debugged a cosine similarity bug in the recommendation engine", "episode")
sm.remember("User's company uses PostgreSQL for the main database", "fact")
sm.remember("User struggled with cold-start problem for new users", "episode")
sm.remember("Solved cold-start by using content-based features initially", "solution")

print("Semantic Memory Demo:")
print(f"  Stored memories: {len(sm)}")
print()

queries = [
    "What database does this user use?",
    "Has this user had problems with new users?",
    "What tools does this user prefer?",
]

for query in queries:
    results = sm.recall(query, top_k=2)
    print(f"  Query: '{query}'")
    for r in results:
        print(f"    [{r['relevance_score']:.3f}] ({r['type']}) {r['content'][:60]}")
    print()
python
import sqlite3
from contextlib import contextmanager

class EpisodicMemory:
    """
    Structured memory for facts, preferences, and past events.
    Uses SQLite for persistence. Think of it as the agent's 'fact file.'
    """

    def __init__(self, db_path: str = "./agent_episodes.db"):
        self.db_path = db_path
        self._init_db()

    def _init_db(self):
        with self._conn() as conn:
            conn.executescript("""
                CREATE TABLE IF NOT EXISTS facts (
                    key         TEXT PRIMARY KEY,
                    value       TEXT NOT NULL,
                    category    TEXT DEFAULT 'general',
                    confidence  REAL DEFAULT 1.0,
                    created_at  TEXT,
                    updated_at  TEXT,
                    source      TEXT
                );

                CREATE TABLE IF NOT EXISTS episodes (
                    id          INTEGER PRIMARY KEY AUTOINCREMENT,
                    action      TEXT NOT NULL,
                    result      TEXT,
                    success     INTEGER DEFAULT 1,
                    context     TEXT,
                    timestamp   TEXT,
                    session_id  TEXT
                );

                CREATE TABLE IF NOT EXISTS preferences (
                    key         TEXT PRIMARY KEY,
                    value       TEXT NOT NULL,
                    updated_at  TEXT
                );
            """)

    @contextmanager
    def _conn(self):
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        try:
            yield conn
            conn.commit()
        finally:
            conn.close()

    def store_fact(self, key: str, value: str,
                   category: str = "general",
                   confidence: float = 1.0,
                   source: str = ""):
        now = datetime.utcnow().isoformat()
        with self._conn() as conn:
            conn.execute("""
                INSERT OR REPLACE INTO facts
                VALUES (?, ?, ?, ?, COALESCE((SELECT created_at FROM facts WHERE key=?), ?), ?, ?)
            """, (key, value, category, confidence, key, now, now, source))

    def get_fact(self, key: str) -> Optional[Dict]:
        with self._conn() as conn:
            row = conn.execute(
                "SELECT * FROM facts WHERE key = ?", (key,)).fetchone()
            return dict(row) if row else None

    def get_facts_by_category(self, category: str) -> List[Dict]:
        with self._conn() as conn:
            rows = conn.execute(
                "SELECT * FROM facts WHERE category = ? ORDER BY updated_at DESC",
                (category,)).fetchall()
            return [dict(r) for r in rows]

    def log_episode(self, action: str, result: str = "",
                     success: bool = True, context: str = "",
                     session_id: str = ""):
        with self._conn() as conn:
            conn.execute("""
                INSERT INTO episodes (action, result, success, context, timestamp, session_id)
                VALUES (?, ?, ?, ?, ?, ?)
            """, (action, result, int(success), context,
                  datetime.utcnow().isoformat(), session_id))

    def get_recent_episodes(self, n: int = 10,
                             success_only: bool = False) -> List[Dict]:
        query = "SELECT * FROM episodes"
        if success_only:
            query += " WHERE success = 1"
        query += " ORDER BY timestamp DESC LIMIT ?"
        with self._conn() as conn:
            return [dict(r) for r in conn.execute(query, (n,)).fetchall()]

    def set_preference(self, key: str, value: str):
        with self._conn() as conn:
            conn.execute(
                "INSERT OR REPLACE INTO preferences VALUES (?, ?, ?)",
                (key, value, datetime.utcnow().isoformat()))

    def get_preference(self, key: str, default: str = "") -> str:
        with self._conn() as conn:
            row = conn.execute(
                "SELECT value FROM preferences WHERE key = ?", (key,)).fetchone()
            return row["value"] if row else default

    def get_all_preferences(self) -> Dict[str, str]:
        with self._conn() as conn:
            rows = conn.execute("SELECT key, value FROM preferences").fetchall()
            return {r["key"]: r["value"] for r in rows}

em = EpisodicMemory(db_path="./test_episodes.db")

em.store_fact("user_name",        "Rahul",           category="identity")
em.store_fact("user_role",        "ML Engineer",      category="identity")
em.store_fact("project_type",     "recommendation",   category="project")
em.store_fact("db_technology",    "PostgreSQL",        category="tech_stack")
em.store_fact("preferred_lang",   "Python",            category="preference")
em.store_fact("preferred_ml_lib", "PyTorch",           category="preference")

em.log_episode("Helped debug cosine similarity", "Fixed shape mismatch",
               success=True, session_id="sess_001")
em.log_episode("Explained collaborative filtering", "User understood",
               success=True, session_id="sess_001")
em.log_episode("Tried matrix factorization approach", "Memory error on large data",
               success=False, session_id="sess_002")

em.set_preference("response_style", "concise with code examples")
em.set_preference("explanation_depth", "intermediate")

print("Episodic Memory Demo:")
print()
print("  User Facts:")
for fact in em.get_facts_by_category("identity"):
    print(f"    {fact['key']}: {fact['value']}")

print()
print("  Recent Episodes:")
for ep in em.get_recent_episodes(3):
    status = "✓" if ep["success"] else "✗"
    print(f"    {status} {ep['action'][:50]}: {ep['result'][:40]}")

print()
print("  Preferences:")
for key, val in em.get_all_preferences().items():
    print(f"    {key}: {val}")
class MemoryAgent:
    """
    A complete agent with all four memory types integrated.
    Personalizes responses based on accumulated memory.
    """

    def __init__(self, agent_id: str = "agent_default",
                 model: str = "claude-3-5-haiku-20241022"):
        self.agent_id = agent_id
        self.client   = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
        self.model    = model

        self.working_memory  = WorkingMemory(max_turns=15)
        self.semantic_memory = SemanticMemory(
            persist_path=f"./memory_{agent_id}/semantic")
        self.episodic_memory = EpisodicMemory(
            db_path=f"./memory_{agent_id}/episodic.db")

        self._session_id = hashlib.md5(
            str(time.time()).encode()).hexdigest()[:8]

    def _build_memory_context(self, query: str) -> str:
        """Assemble relevant memories into a context block."""
        parts = []

        prefs = self.episodic_memory.get_all_preferences()
        if prefs:
            parts.append("User preferences: " +
                         "; ".join(f"{k}={v}" for k, v in prefs.items()))

        key_facts = self.episodic_memory.get_facts_by_category("identity")
        key_facts += self.episodic_memory.get_facts_by_category("project")
        if key_facts:
            facts_str = "; ".join(f"{f['key']}={f['value']}" for f in key_facts[:5])
            parts.append(f"Known facts: {facts_str}")

        relevant_memories = self.semantic_memory.recall(query, top_k=3)
        if relevant_memories:
            mem_str = "\n".join(
                f"- [{m['type']}] {m['content']}" for m in relevant_memories)
            parts.append(f"Relevant past context:\n{mem_str}")

        recent_episodes = self.episodic_memory.get_recent_episodes(3, success_only=True)
        if recent_episodes:
            ep_str = "; ".join(ep["action"][:40] for ep in recent_episodes)
            parts.append(f"Recent successful actions: {ep_str}")

        return "\n\n".join(parts) if parts else ""

    def chat(self, user_message: str, verbose: bool = False) -> str:
        self.working_memory.add("user", user_message)

        memory_context = self._build_memory_context(user_message)

        system = f"""You are a helpful AI assistant with memory of past interactions.
Use the provided context to personalize your responses.

{f'Memory context:{chr(10)}{memory_context}' if memory_context else ''}

Adapt your response to the user's known preferences and expertise level."""

        response = self.client.messages.create(
            model      = self.model,
            max_tokens = 800,
            system     = system,
            messages   = self.working_memory.get_messages()
        )
        answer = response.content[0].text
        self.working_memory.add("assistant", answer)

        self.semantic_memory.remember(
            f"User asked: {user_message[:100]}",
            memory_type = "conversation",
            metadata    = {"session": self._session_id}
        )
        self.episodic_memory.log_episode(
            action     = f"Answered: {user_message[:50]}",
            result     = "Success",
            session_id = self._session_id
        )

        if verbose:
            used_memories = len(self.semantic_memory.recall(user_message, top_k=3))
            print(f"  [Memory] Used {used_memories} relevant memories, "
                  f"{len(self.working_memory)} conversation turns in context")

        return answer

mem_agent = MemoryAgent(agent_id="rahul_session")

mem_agent.episodic_memory.store_fact("user_name", "Rahul", "identity")
mem_agent.episodic_memory.store_fact("project",   "e-commerce recommender", "project")
mem_agent.episodic_memory.set_preference("explanation_depth", "intermediate")
mem_agent.semantic_memory.remember(
    "User previously struggled with cold-start problem", "episode")

print("\nMemory-Augmented Agent Demo:")
print("=" * 60)

questions = [
    "Can you remind me where we left off with my recommendation system?",
    "What approach did we decide to use for new users?",
    "I want to add diversity to the recommendations. Any ideas?",
]

for q in questions:
    print(f"\nUser: {q}")
    answer = mem_agent.chat(q, verbose=True)
    print(f"Agent: {answer[:200]}...")
class MemoryManager:
    """Handles memory maintenance: summarization, pruning, importance scoring."""

    def __init__(self, semantic_memory: SemanticMemory,
                 episodic_memory: EpisodicMemory):
        self.semantic = semantic_memory
        self.episodic = episodic_memory

    def summarize_session(self, session_id: str,
                           llm_client=None) -> str:
        """Compress a full session into a summary memory."""
        episodes = [
            ep for ep in self.episodic.get_recent_episodes(50)
            if ep.get("session_id") == session_id
        ]

        if not episodes:
            return ""

        session_text = "\n".join(
            f"- {ep['action']}: {ep['result']}" for ep in episodes)

        summary = (
            f"Session {session_id}: " +
            "; ".join(ep["action"][:30] for ep in episodes[:5])
        )

        self.semantic.remember(
            summary,
            memory_type = "session_summary",
            metadata    = {"session_id": session_id}
        )
        return summary

    def get_memory_stats(self) -> Dict:
        return {
            "semantic_memories":     len(self.semantic),
            "total_episodes":        len(self.episodic.get_recent_episodes(1000)),
            "successful_episodes":   len(self.episodic.get_recent_episodes(1000, success_only=True)),
            "stored_preferences":    len(self.episodic.get_all_preferences()),
            "stored_facts":          len(self.episodic.get_facts_by_category("identity") +
                                         self.episodic.get_facts_by_category("project")),
        }

mm = MemoryManager(mem_agent.semantic_memory, mem_agent.episodic_memory)

print("\nMemory Statistics:")
stats = mm.get_memory_stats()
for key, value in stats.items():
    print(f"  {key:<30}: {value}")
print("\nAgent Memory Reference Links:")
print()

refs = {
    "Papers": [
        ("MemGPT: Memory in LLM OS",           "arxiv.org/abs/2310.08560"),
        ("Generative Agents (Stanford)",        "arxiv.org/abs/2304.03442"),
        ("Memory-Augmented LLM Survey",         "arxiv.org/abs/2312.17512"),
        ("Cognitive Architectures for LLMs",    "arxiv.org/abs/2309.02427"),
        ("Reflexion: Verbal Reinforcement",     "arxiv.org/abs/2303.11366"),
    ],
    "Implementations": [
        ("MemGPT GitHub",                        "github.com/cpacker/MemGPT"),
        ("LangChain Memory docs",                "python.langchain.com/docs/modules/memory"),
        ("LlamaIndex Memory module",             "docs.llamaindex.ai/en/stable/module_guides/storing/index_stores"),
        ("Zep: Long-term memory for agents",     "getzep.com"),
        ("Mem0: Memory layer for AI",            "mem0.ai"),
    ],
    "Tutorials": [
        ("Building agents with memory (Anthropic)", "github.com/anthropics/anthropic-cookbook"),
        ("LangGraph memory persistence",             "langchain-ai.github.io/langgraph/how-tos/persistence"),
        ("Vector memory with ChromaDB",              "docs.trychroma.com/usage-guide"),
    ],
    "Cheat Sheets": [
        ("SQLite Python reference",              "docs.python.org/3/library/sqlite3.html"),
        ("Sentence Transformers quickstart",     "sbert.net/docs/quickstart.html"),
        ("NumPy array operations",               "numpy.org/doc/stable/reference/routines.array-manipulation"),
    ],
}

for category, links in refs.items():
    print(f"  {category}:")
    for name, url in links:
        print(f"    • {name:<48} {url}")
    print()

Create agent_memory_practice.py

.

Part 1: build the four-type memory system from this post. Initialize WorkingMemory, SemanticMemory, and EpisodicMemory. Run a 5-turn conversation. After each turn, store the exchange in both semantic (embedding) and episodic (SQLite) memory. Verify both stores contain the data.

Part 2: test cross-session recall. Start a new conversation. Without providing any prior context, ask the agent something that requires remembering a fact from the previous session. Does it retrieve the relevant memory and personalize the response?

Part 3: memory retrieval comparison. Take 10 queries. For each, retrieve top 3 results from semantic memory. Also retrieve results from episodic memory by category. Compare what each memory type surfaces. When is each one more useful?

Part 4: memory decay. Add a "recency weight" to semantic memory recall: recent memories score higher than old ones. Implement this by multiplying the cosine similarity score by a decay factor based on age. Does it change which memories get retrieved?

Agents with memory are powerful. Agents that can write and execute code are transformative. The next post covers code agents: agents that write Python, run it, observe the output, and iteratively improve their code until it solves the problem. This is how GitHub Copilot and Cursor work at their core.

source & further reading

dev.to — original article Empty Is Not Clean: Five Fail-Open Bugs in an AI Agent I Pitted China's Best Open AI Models Against Each Other A plugin can pass validation and still fail after install

103. Agent Memory: Short-Term, Long-Term, and Episodic

Run your AI side-project on zahid.host