{"slug": "llm-memory-system-pitfalls-a-3-hour-bug-hunt-solved-with-pytest-snapshot-testing", "title": "LLM Memory System Pitfalls: A 3-Hour Bug Hunt Solved with Pytest Snapshot Testing", "summary": "A developer spent three hours debugging a production LLM memory system bug where a `rollback` method wiped out entire conversation histories instead of just undoing erroneous operations. The root cause was a code refactor that accidentally cleared the `snapshots` table during rollback, which existing unit tests failed to catch because they always started from empty databases and never simulated cross-session persistence. The developer resolved the issue by implementing snapshot testing that treats the SQLite database file itself as an immutable artifact, enabling tests to verify file-level persistent state across different connections.", "body_md": "It was 2 a.m. when the alert call jolted me awake — our production Agent had suffered “amnesia” for three consecutive conversations. The context the user had carefully built was gone, and complaints were flooding in. Squinting at the logs, I discovered that the `rollback`\n\nmethod in the memory management module had been broken by an innocuous-looking code refactor. Not only did the rollback undo the erroneous operation, it also wiped out the entire conversation history. Worse still, our existing unit tests never caught the bug: they always started from a fresh empty database and could never cover a cross-session scenario like “roll back dirty data to a previous snapshot.” I spent three hours debugging, manually simulating intermediate states, before I finally pinpointed the root cause. That’s when it hit me: **we weren't lacking tests — we were missing snapshot tests that capture the entire “memory state.”**\n\nOur LLM memory system uses SQLite for local persistence. Each session owns a table that stores conversation turns, vector summaries, and tool-call records. Two critical operations are:\n\n`save_snapshot(session_id)`\n\n: serializes the full state of a session into the `snapshots`\n\ntable, creating a rollback checkpoint.`rollback_to_snapshot(session_id, snapshot_id)`\n\n: when something goes wrong, it rebuilds the session table from a snapshot and discards all changes made after that point.This mechanism had been running smoothly — until a refactor I made changed the transaction boundaries inside the rollback logic. After the rollback executed, the `conversations`\n\ntable was rebuilt just fine, but the `snapshots`\n\ntable itself was accidentally wiped out. The next rollback attempt couldn’t find any previous checkpoints.\n\nWhy didn’t traditional unit tests catch this? Because the typical test flow looks like this:\n\n``` python\ndef test_rollback():\n    db = create_in_memory_db()\n    db.save_snapshot(\"s1\")\n    db.rollback_to_snapshot(\"s1\", ...)\n    assert db.get_conversation(\"s1\") == expected\n```\n\nEverything runs in a single process, inside a single temporary database. However, the production scenario was different: **process A** saves a snapshot and exits, then **process B** reopens the same database file and performs the rollback. File-level persistent state, WAL log merging, and even the visibility of the `snapshots`\n\ntable across different connections — none of that was tested. To put it bluntly, we tested the “logic” but never tested the “storage.”\n\nI decided to bring in **snapshot testing**, but instead of using text-based snapshots, I would treat the SQLite database file itself as an immutable artifact.\n\nComparison of approaches:\n\n`tmp_path`\n\n+ manual comparisonThe architectural idea: provide a `snapshot_db`\n\nfixture via `conftest.py`\n\nthat:\n\n`tests/snapshots/memory_test.sqlite`\n\n) exists before the test starts.`--snapshot-update`\n\nflag) and the test passes immediately.With this approach, our tests truly simulate a “cross-process, cross-connection” persistence effect — each test case receives an independent copy of a database file, performs its operations, and then the entire file state is compared against the expected outcome.\n\nThis code clarifies what we intend to test. `MemoryManager`\n\nwraps the SQLite connection, snapshot saving, and rollback — a simplified version of what we use in production.\n\n``` python\n# memory_manager.py\nimport sqlite3\nimport uuid\nfrom datetime import datetime, timezone\n\nclass MemoryManager:\n    def __init__(self, db_path: str):\n        self.db_path = db_path\n        self._init_tables()\n\n    def _get_conn(self) -> sqlite3.Connection:\n        conn = sqlite3.connect(self.db_path)\n        conn.execute(\"PRAGMA journal_mode=WAL\")\n        conn.row_factory = sqlite3.Row\n        return conn\n\n    def _init_tables(self):\n        with self._get_conn() as conn:\n            conn.executescript(\"\"\"\n                CREATE TABLE IF NOT EXISTS conversations (\n                    session_id TEXT NOT NULL,\n                    turn INTEGER NOT NULL,\n                    role TEXT NOT NULL,\n                    content TEXT NOT NULL,\n                    PRIMARY KEY (session_id, turn)\n                );\n                CREATE TABLE IF NOT EXISTS snapshots (\n                    snapshot_id TEXT PRIMARY KEY,\n                    session_id TEXT NOT NULL,\n                    created_at TEXT NOT NULL,\n                    state_json TEXT NOT NULL\n                );\n            \"\"\")\n\n    def add_message(self, session_id: str, role: str, content: str):\n        with self._get_conn() as conn:\n            turn = conn.execute(\n                \"SELECT COALESCE(MAX(turn), 0) + 1 FROM conversations WHERE session_id = ?\",\n```\n\n", "url": "https://wpnews.pro/news/llm-memory-system-pitfalls-a-3-hour-bug-hunt-solved-with-pytest-snapshot-testing", "canonical_source": "https://dev.to/_eb7f2a654e97a60ae9f96e/llm-memory-system-pitfalls-a-3-hour-bug-hunt-solved-with-pytest-snapshot-testing-4m0p", "published_at": "2026-06-12 01:04:03+00:00", "updated_at": "2026-06-12 01:42:27.809487+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "mlops", "ai-tools"], "entities": ["SQLite", "Pytest"], "alternates": {"html": "https://wpnews.pro/news/llm-memory-system-pitfalls-a-3-hour-bug-hunt-solved-with-pytest-snapshot-testing", "markdown": "https://wpnews.pro/news/llm-memory-system-pitfalls-a-3-hour-bug-hunt-solved-with-pytest-snapshot-testing.md", "text": "https://wpnews.pro/news/llm-memory-system-pitfalls-a-3-hour-bug-hunt-solved-with-pytest-snapshot-testing.txt", "jsonld": "https://wpnews.pro/news/llm-memory-system-pitfalls-a-3-hour-bug-hunt-solved-with-pytest-snapshot-testing.jsonld"}}