From Mock to Real Redis: Cutting Agent Memory Test Leakage from 30% to 0

A developer's AI customer service agent lost all conversational context in production when a network hiccup caused Redis to raise a `ConnectionError` that the team's mock tests had never simulated. The team rebuilt automated verification using pytest with a real Redis instance managed by testcontainers, cutting the escape rate of memory-related bugs from 30% to zero. The solution uses session-scoped Redis containers with isolated per-test namespaces to catch real-world failures that mocks miss.

Woken up by PagerDuty at 2 AM. The user group was on fire: our AI customer service agent had suddenly lost its memory. One message confirmed the user's phone number; the next asked, "How may I address you?" The logs showed that the Redis connection pool threw a `ConnectionError` during a network hiccup. Our supposedly bulletproof mock tests had never simulated that exception. The code simply skipped memory persistence, and all context was lost. Even scarier: the regression suite was green.

This is a textbook disaster of relying on mocks without tests against real middleware. We spent two weeks rebuilding the automated verification of the agent's memory module with pytest and a real Redis instance. The result: the rate at which memory-related bugs escaped to production went from 30% to zero. Here's the full blueprint, the code, and the sharp edges we found along the way.

An agent's memory isn't simple key-value storage. It handles three things:

A typical mock test patches `redis.Redis` with `unittest.mock.patch` or builds a fake client. It's easy to reach 80% coverage this way. But the real world doesn't offer "always-succeeding storage":

- A network hiccup raises `redis.exceptions.ConnectionError`. The mock version just returns `True`.
- Eviction policies such as `allkeys-lru` silently drop keys when memory gets tight, and your agent's notepad vanishes. A mock only returns `None` when you explicitly tell it to.

The root cause: a mock simulates the Redis you imagine, not real Redis. Unit tests verify logic branches but can't expose process boundaries, network boundaries, or data-consistency issues. For a module like agent memory that depends heavily on external state, integration tests must run against real Redis. Otherwise you'll eventually pay the technical debt in production.

We considered three paths for testing with real Redis:

We chose pytest + redis-py + testcontainers-python, with a fallback to docker-compose.
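You can see the gap between a mocked client and a real one using only the standard library. The sketch below uses a hypothetical `save` function and a stand-in `StandInConnectionError` class, which takes the place of `redis.exceptions.ConnectionError` so the snippet runs without redis-py. It shows two things. A bare `MagicMock` accepts every call, so the failure branch never runs unless a test programs it. And a "missing" key on a bare mock comes back as another truthy mock object, not `None`.

```python
from unittest.mock import MagicMock


class StandInConnectionError(Exception):
    """Hypothetical stand-in for redis.exceptions.ConnectionError (no redis-py needed)."""


def save(client, key: str, value: str, ttl: int = 3600) -> bool:
    """Hypothetical save path: returns False when the client raises a connection error."""
    try:
        client.setex(key, ttl, value)
        return True
    except StandInConnectionError:
        return False


# A bare MagicMock accepts every call, so the failure branch never runs.
happy_mock = MagicMock()
print(save(happy_mock, "agent:session:1", "{}"))  # True

# The failure branch only runs if a test explicitly programs the exception.
failing_mock = MagicMock()
failing_mock.setex.side_effect = StandInConnectionError("network hiccup")
print(save(failing_mock, "agent:session:1", "{}"))  # False

# An "evicted" key on a bare mock is another MagicMock, not None.
print(happy_mock.get("agent:session:evicted") is None)  # False
```

Neither behavior tells you anything about how a real server fails. The mock only fails in the ways you already thought of.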
The reason: testcontainers lets you declare a Redis container right in `conftest.py`, automatically waits for the port to be ready, and destroys the container when tests end, with no extra scripting. A `scope="session"` fixture shares the container across tests, and each test function uses an isolated Redis namespace (a key prefix or a dedicated db number). This avoids cross-contamination while staying realistic.

Why not just run tests against a real production Redis on a dedicated db number? If you accidentally misconfigure something and `flushdb` isn't blocked, you'll have another horror story to tell.

Here are three key code blocks: the container-management fixture, the memory storage implementation, and the test cases. You should be able to drop them into your project and run them.

The first block solves the "how do we get a clean Redis instance for tests" problem. We use testcontainers to launch Redis 7 and add a manual `ping()` to be absolutely sure the container is ready before handing it over.

```python
# conftest.py
import pytest
import redis
from testcontainers.redis import RedisContainer


@pytest.fixture(scope="session")
def redis_container():
    """Session-scoped Redis container, started once per test session."""
    container = RedisContainer("redis:7-alpine")
    container.with_exposed_ports(6379)
    container.start()
    # Ensure Redis is truly ready (the built-in wait strategy sometimes isn't enough)
    probe = redis.Redis(
        host=container.get_container_host_ip(),
        port=int(container.get_exposed_port(6379)),
    )
    probe.ping()  # Fails fast here if the server is not ready
    probe.close()
    yield container
    container.stop()


@pytest.fixture
def redis_client(redis_container):
    """Per-test Redis client on a freshly flushed DB to avoid interference."""
    client = redis.Redis(
        host=redis_container.get_container_host_ip(),
        port=int(redis_container.get_exposed_port(6379)),
        db=0,
        decode_responses=True,  # return str instead of bytes; no manual .decode()
    )
    client.flushdb()  # Start clean
    yield client
    client.close()
```

Why `decode_responses=True`? The agent's memory module mostly stores natural-language text. Returning bytes forces a `.decode()` everywhere, which adds noise that buries the actual assertions.

The second block is a simplified version of the production code in `memory_store.py`. It stores each session's context as a JSON-encoded string under a per-session key with a TTL (via `SETEX`), not as a Redis Hash. We intentionally kept a serialization path that is easy to break in production (`json` instead of `pickle`) and validate data integrity in the tests.

```python
# memory_store.py
import json
import logging
from typing import Dict, Optional

import redis

logger = logging.getLogger(__name__)


class AgentMemoryStore:
    """Stores per-session agent context as a JSON string with a TTL."""

    def __init__(self, redis_client: redis.Redis, ttl: int = 3600):
        self.redis = redis_client
        self.ttl = ttl

    def save_context(self, session_id: str, context: Dict) -> bool:
        try:
            key = f"agent:session:{session_id}"
            # No default=str: unserializable values must raise TypeError
            # so the failure is reported instead of silently stored as a string.
            serialized = json.dumps(context)
            self.redis.setex(key, self.ttl, serialized)
            return True
        except (redis.exceptions.ConnectionError, TypeError) as e:
            logger.error(f"Failed to save context for {session_id}: {e}")
            return False

    def load_context(self, session_id: str) -> Optional[Dict]:
        try:
            key = f"agent:session:{session_id}"
            data = self.redis.get(key)
            if data is None:  # never written, expired, or evicted
                return None
            return json.loads(data)
        except (redis.exceptions.ConnectionError, json.JSONDecodeError) as e:
            logger.error(f"Failed to load context for {session_id}: {e}")
            return None
```

The third block tests the happy path plus the scenarios that mock tests never touch: connection failures, serialization errors, and key expiry.
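One serialization detail decides whether the serialization-error test can pass. `json.dumps(..., default=str)` never raises `TypeError` for an unknown object. It calls `str()` on the object and serializes the result instead. A minimal standard-library check, using the same lambda the test uses:

```python
import json

bad_context = {"fn": lambda x: x}

# With default=str, the lambda is silently converted to its repr string.
stringified = json.dumps(bad_context, default=str)
print(stringified.startswith('{"fn": "<function'))  # True

# Without a default hook, json refuses the object and raises TypeError.
try:
    json.dumps(bad_context)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```

So `save_context` can only return `False` for a lambda if it serializes without `default=str`, or with a `default` hook that itself raises.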
```python
# test_memory_store_real.py
import time

import pytest
import redis

from memory_store import AgentMemoryStore

# Mark every test in this module so the --real-redis switch controls it
pytestmark = pytest.mark.real_redis


def test_save_and_load_context_success(redis_client):
    store = AgentMemoryStore(redis_client)
    context = {"phone": "13800138000", "intent": "return_order"}
    assert store.save_context("session_1", context)
    loaded = store.load_context("session_1")
    assert loaded == context


def test_context_not_found(redis_client):
    store = AgentMemoryStore(redis_client)
    assert store.load_context("nonexistent_session") is None


def test_connection_error_graceful_failure(redis_client, mocker):
    """Simulate a ConnectionError; the store must handle it gracefully."""
    # `mocker` is provided by the pytest-mock plugin
    store = AgentMemoryStore(redis_client)
    # Force a connection failure on setex
    mocker.patch.object(
        redis_client, "setex",
        side_effect=redis.exceptions.ConnectionError("boom"),
    )
    result = store.save_context("session_err", {"a": 1})
    assert result is False


def test_serialization_error_handling(redis_client):
    """What happens when we try to save an unserializable object?"""
    store = AgentMemoryStore(redis_client)
    bad_context = {"fn": lambda x: x}  # a lambda is not JSON-serializable
    result = store.save_context("session_bad", bad_context)
    assert result is False  # should fail gracefully, not raise


def test_ttl_expiry(redis_client):
    """Set a tiny TTL and wait, then check that the data is gone."""
    store = AgentMemoryStore(redis_client, ttl=2)
    store.save_context("short_lived", {"value": "ephemeral"})
    time.sleep(3)
    assert store.load_context("short_lived") is None
```

These tests run against the exact Redis version and memory constraints that match production. No more "it works on my fake client."

Running real Redis in CI often raises concerns about Docker/socket access.
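One way to soften the Docker/socket concern is to probe whether anything is listening on the Redis port before the integration suite commits to using it. The sketch below is a hypothetical helper: `port_is_open` is our own name, not a pytest or testcontainers API, and it is built on `socket.create_connection`. A successful TCP connect only shows that the port is open, not that Redis answers `PING`. It complements the `ping()` check in the fixture rather than replacing it.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as "no Redis here".
        return False


# Demo with a throwaway local listener instead of a real Redis server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

print(port_is_open("127.0.0.1", open_port))  # True
listener.close()
print(port_is_open("127.0.0.1", open_port))  # False: nothing is listening anymore
```

In a fixture, a failed probe could feed `pytest.skip(...)` so the suite degrades to a skip rather than an error. Whether that is desirable in CI, where the real-Redis suite is meant to always run, is a judgment call.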
We added the following CI configuration to make sure the suite runs smoothly:

```yaml
# .github/workflows/agent_memory_tests.yml (excerpt)
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      # Optional fallback if testcontainers can't access the Docker socket
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements-test.txt
      - name: Run memory integration tests
        run: pytest tests/ --real-redis
```

We added a custom `--real-redis` command-line option and a matching `real_redis` marker so that unit tests relying on mocks and integration tests requiring the container can coexist. Without the flag, every test carrying the marker is skipped automatically. A local dev environment without Docker can therefore still run the rest of the suite.

```python
# conftest.py (continued)
def pytest_addoption(parser):
    parser.addoption(
        "--real-redis",
        action="store_true",
        default=False,
        help="run tests against real Redis",
    )


def pytest_configure(config):
    # Register the marker so pytest doesn't warn about an unknown mark
    config.addinivalue_line("markers", "real_redis: mark test as requiring real Redis")


def pytest_collection_modifyitems(config, items):
    if not config.getoption("--real-redis"):
        skip_real = pytest.mark.skip(reason="need --real-redis option to run")
        for item in items:
            if "real_redis" in item.keywords:
                item.add_marker(skip_real)
```

This gives us two layers of safety: CI always runs the real-Redis suite, while local developers can iterate quickly without Docker when they choose.

After rolling out real-Redis tests, we tracked the agent memory module's bug escape rate for three months. The stability improvement went beyond metrics: the on-call team stopped being woken up at 2 AM for memory-loss bugs.

Throughout the migration, we hit several pitfalls worth sharing:

- Because each test calls `flushdb()`, concurrent test runs (`pytest-xdist`) will collide. Use unique key prefixes or separate DBs.
- Starting a container per test is slow. We share one container with `scope="session"` to keep total test time under 10 seconds.
- Connection-pool exhaustion: raising `max_connections` and adding `close()` logic in teardown fixed it, and it caught a production leak we didn't know we had.
- We once tested against `redis:latest` and missed behavior changes while production was still on Redis 6. Now we explicitly pin `redis:7-alpine`.

Swapping mocks for real Redis isn't just about testing; it's about trust. For stateful agents, the distance between "tests pass" and "it works in production" is measured by how closely your test environment mirrors reality. In our case, that mirror was a Docker container, and it saved our sleep.
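A footnote to the `pytest-xdist` pitfall above: one way to give each worker its own logical database is to derive the DB index from the worker id. This sketch makes two assumptions. First, pytest-xdist names its workers `gw0`, `gw1`, and so on, exposing the name through its `worker_id` fixture and the `PYTEST_XDIST_WORKER` environment variable. Second, Redis has its default of 16 logical databases. `db_for_worker` is a hypothetical helper, not part of either library.

```python
import os


def db_for_worker(worker_id: str, num_databases: int = 16) -> int:
    """Hypothetical mapping from a pytest-xdist worker id ("gw0", "gw1", ...) to a Redis DB index.

    A run without xdist (worker id "master") uses DB 0. Redis ships with
    16 logical databases by default (the `databases` config directive).
    """
    if not worker_id.startswith("gw"):
        return 0
    index = int(worker_id[2:])
    if index >= num_databases:
        raise ValueError(f"worker {worker_id} exceeds the {num_databases} available Redis DBs")
    return index


print(db_for_worker("master"))  # 0
print(db_for_worker("gw3"))     # 3

# Inside a real xdist worker the environment variable carries the id.
current_db = db_for_worker(os.environ.get("PYTEST_XDIST_WORKER", "master"))
```

Passing `db=db_for_worker(worker_id)` to the per-test client would scope each worker's `flushdb()` to its own database, since `FLUSHDB` only clears the currently selected DB.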