LLM Memory System Pitfalls: A 3-Hour Bug Hunt Solved with Pytest Snapshot Testing

A developer spent three hours debugging a production LLM memory system bug where a `rollback` method wiped out entire conversation histories instead of just undoing erroneous operations. The root cause was a code refactor that accidentally cleared the `snapshots` table during rollback, which existing unit tests failed to catch because they always started from empty databases and never simulated cross-session persistence. The developer resolved the issue by implementing snapshot testing that treats the SQLite database file itself as an immutable artifact, enabling tests to verify file-level persistent state across different connections.
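
The failure mode summarized above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the two-table schema, the helper names (`writer_process`, `rollback_process`, `read_state`, `check_cross_connection_rollback`) and the JSON state layout are hypothetical stand-ins, not the production code. The point is that every step opens its own connection to the same database file, so the check exercises what is actually persisted on disk.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical minimal schema, assumed for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    session_id TEXT NOT NULL,
    turn       INTEGER NOT NULL,
    content    TEXT NOT NULL,
    PRIMARY KEY (session_id, turn)
);
CREATE TABLE IF NOT EXISTS snapshots (
    snapshot_id TEXT PRIMARY KEY,
    session_id  TEXT NOT NULL,
    state_json  TEXT NOT NULL
);
"""


def writer_process(db_path: str) -> None:
    """Stands in for 'process A': write two turns, checkpoint, write a bad turn, exit."""
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
        conn.executemany(
            "INSERT INTO conversations VALUES (?, ?, ?)",
            [("s1", 1, "hello"), ("s1", 2, "world")],
        )
        state = json.dumps([[1, "hello"], [2, "world"]])
        conn.execute("INSERT INTO snapshots VALUES (?, ?, ?)", ("snap-1", "s1", state))
        conn.execute("INSERT INTO conversations VALUES (?, ?, ?)", ("s1", 3, "dirty"))
        conn.commit()
    finally:
        conn.close()


def rollback_process(db_path: str, session_id: str, snapshot_id: str) -> None:
    """Stands in for 'process B': reopen the file and restore the session."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT state_json FROM snapshots WHERE snapshot_id = ? AND session_id = ?",
            (snapshot_id, session_id),
        ).fetchone()
        if row is None:
            raise LookupError(f"snapshot {snapshot_id!r} not found")
        turns = json.loads(row[0])
        with conn:  # one transaction; touches only the conversations table
            conn.execute("DELETE FROM conversations WHERE session_id = ?", (session_id,))
            conn.executemany(
                "INSERT INTO conversations VALUES (?, ?, ?)",
                [(session_id, turn, content) for turn, content in turns],
            )
    finally:
        conn.close()


def read_state(db_path: str):
    """Read the persisted state back through a third, fresh connection."""
    conn = sqlite3.connect(db_path)
    try:
        turns = conn.execute(
            "SELECT turn, content FROM conversations WHERE session_id = 's1' ORDER BY turn"
        ).fetchall()
        snapshot_count = conn.execute("SELECT COUNT(*) FROM snapshots").fetchone()[0]
        return turns, snapshot_count
    finally:
        conn.close()


def check_cross_connection_rollback():
    with tempfile.TemporaryDirectory() as tmp:
        db_path = str(Path(tmp) / "memory.sqlite")
        writer_process(db_path)
        rollback_process(db_path, "s1", "snap-1")
        return read_state(db_path)
```

A rollback that also cleared the `snapshots` table would still restore the conversation correctly here, so only the snapshot count read back from the file would expose it.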

It was 2 a.m. when the alert call jolted me awake: our production Agent had suffered “amnesia” for three consecutive conversations. The context the user had carefully built was gone, and complaints were flooding in. Squinting at the logs, I discovered that the rollback method in the memory management module had been broken by an innocuous-looking code refactor. Not only did the rollback undo the erroneous operation, it also wiped out the entire conversation history.

Worse still, our existing unit tests never caught the bug: they always started from a fresh empty database and could never cover a cross-session scenario like “roll back dirty data to a previous snapshot.” I spent three hours debugging, manually simulating intermediate states, before I finally pinpointed the root cause. That’s when it hit me: we weren't lacking tests; we were missing snapshot tests that capture the entire “memory state.”

Our LLM memory system uses SQLite for local persistence. Each session owns a table that stores conversation turns, vector summaries, and tool-call records. Two critical operations are:

- `save_snapshot(session_id)`: serializes the full state of a session into the `snapshots` table, creating a rollback checkpoint.
- `rollback_to_snapshot(session_id, snapshot_id)`: when something goes wrong, it rebuilds the session table from a snapshot and discards all changes made after that point.

This mechanism had been running smoothly until a refactor I made changed the transaction boundaries inside the rollback logic. After the rollback executed, the `conversations` table was rebuilt just fine, but the `snapshots` table itself was accidentally wiped out. The next rollback attempt couldn’t find any previous checkpoints.

Why didn’t traditional unit tests catch this? Because the typical test flow looks like this:

```python
def test_rollback():
    db = create_in_memory_db()
    db.save_snapshot("s1")
    db.rollback_to_snapshot("s1", ...)
    assert db.get_conversation("s1") == expected
```

Everything runs in a single process, inside a single temporary database. However, the production scenario was different: process A saves a snapshot and exits, then process B reopens the same database file and performs the rollback. File-level persistent state, WAL log merging, and even the visibility of the `snapshots` table across different connections: none of that was tested. To put it bluntly, we tested the “logic” but never tested the “storage.”

I decided to bring in snapshot testing, but instead of using text-based snapshots, I would treat the SQLite database file itself as an immutable artifact.

Comparison of approaches: `tmp_path` + manual comparison

The architectural idea: provide a `snapshot_db` fixture via `conftest.py` that:

- `tests/snapshots/memory_test.sqlite` exists before the test starts.
- `--snapshot-update` flag and the test passes immediately.

With this approach, our tests truly simulate a “cross-process, cross-connection” persistence effect: each test case receives an independent copy of a database file, performs its operations, and then the entire file state is compared against the expected outcome.

This code clarifies what we intend to test. `MemoryManager` wraps the SQLite connection, snapshot saving, and rollback, a simplified version of what we use in production.

```python
# memory_manager.py
import sqlite3
import uuid
from datetime import datetime, timezone


class MemoryManager:
    def __init__(self, db_path: str):
        self.db_path = db_path
        self._init_tables()

    def _get_conn(self) -> sqlite3.Connection:
        # Every call opens a new connection to the same database file.
        conn = sqlite3.connect(self.db_path)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.row_factory = sqlite3.Row
        return conn

    def _init_tables(self):
        # Note: `with conn` commits or rolls back the transaction;
        # it does not close the connection.
        with self._get_conn() as conn:
            conn.executescript("""
                CREATE TABLE IF NOT EXISTS conversations (
                    session_id TEXT NOT NULL,
                    turn INTEGER NOT NULL,
                    role TEXT NOT NULL,
                    content TEXT NOT NULL,
                    PRIMARY KEY (session_id, turn)
                );
                CREATE TABLE IF NOT EXISTS snapshots (
                    snapshot_id TEXT PRIMARY KEY,
                    session_id TEXT NOT NULL,
                    created_at TEXT NOT NULL,
                    state_json TEXT NOT NULL
                );
            """)

    def add_message(self, session_id: str, role: str, content: str):
        with self._get_conn() as conn:
            turn = conn.execute(
                "SELECT COALESCE(MAX(turn), 0) + 1 FROM conversations WHERE session_id = ?",
```