Woken up by PagerDuty at 2 AM. The user group was on fire: our AI customer service agent had suddenly lost its memory. One message confirmed the user's phone number, the next asked "How may I address you?" Checking the logs revealed that the Redis connection pool threw a ConnectionError during a network hiccup. Our supposedly bulletproof mock tests had never simulated that exception. The code simply skipped memory persistence, and all context was lost. Even scarier: the regression suite was green.
This is a textbook disaster of having mocks without real-middleware tests. We spent two weeks rebuilding the automated verification of the agent’s memory module with pytest + a real Redis instance. The result: the share of memory-related bugs escaping to production went from 30% to zero. Here’s the full blueprint, code, and the sharp edges we found along the way.
An agent’s memory isn’t simple key-value storage. It handles three things:
A typical mock test patches redis.Redis with unittest.mock.patch or builds a fake client. It’s easy to reach 80% coverage this way. But the real world doesn’t offer “always-succeeding storage”:
- A network hiccup makes the real client raise redis.exceptions.ConnectionError. The mock version just returns True.
- Eviction policies (such as allkeys-lru) silently drop keys when memory gets tight. Your agent’s notepad vanishes. A mock only returns None when you explicitly set it.

The root cause: a mock simulates the Redis you imagine, not real Redis. Unit tests verify logic branches but can’t expose process boundaries, network boundaries, or data consistency issues. For a module like agent memory that depends heavily on external state, integration tests must run against real Redis. Otherwise you’ll eventually pay the technical debt in production.
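To make the gap concrete, here is a minimal sketch that uses only the standard library. Both names are assumptions for illustration: save is a hypothetical stand-in for the store's write path, and FakeConnectionError stands in for redis.exceptions.ConnectionError so the snippet runs without redis-py installed.

```python
from unittest.mock import MagicMock


class FakeConnectionError(Exception):
    """Stand-in for redis.exceptions.ConnectionError (assumption for this sketch)."""


def save(client, key, value, ttl=3600):
    """Hypothetical write path: report success unless the client raises."""
    try:
        client.setex(key, ttl, value)
        return True
    except FakeConnectionError:
        return False


# A bare mock accepts every call, so the failure branch is never exercised.
optimistic = MagicMock()
saved_with_bare_mock = save(optimistic, "agent:session:1", "{}")

# A missing key also looks present: get() returns another MagicMock, not None.
missing_looks_present = optimistic.get("agent:session:unknown") is not None

# The failure only appears if someone remembers to script it.
scripted = MagicMock()
scripted.setex.side_effect = FakeConnectionError("connection reset")
saved_with_scripted_mock = save(scripted, "agent:session:1", "{}")
```

The first two results are the dangerous ones: the suite stays green because the mock never disagrees with the code under test.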
We considered three paths for testing with real Redis:
We chose pytest + redis-py + testcontainers-python (with a fallback to docker-compose). The reason: testcontainers lets you declare a Redis container right in conftest, automatically waits for the port to be ready, and destroys it when tests end, with no extra scripting. Combined with a scope="session" fixture to share the container across tests, each test function uses an isolated Redis namespace (a prefix or a dedicated db number) to avoid cross-contamination while staying realistic.
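One way to get a per-test namespace is a unique key prefix. The sketch below is an assumption about how that could look, not the code we shipped: KeyNamespace and DictClient are hypothetical names, and DictClient is an in-memory stand-in that mimics only the redis-py calls used here (set, scan_iter, delete) so the example runs without a server.

```python
import uuid


class KeyNamespace:
    """Gives each test its own key prefix inside a shared Redis db."""

    def __init__(self, worker_id="master"):
        # worker_id would come from pytest-xdist (e.g. "gw0"); it is "master"
        # when tests are not distributed.
        self.prefix = f"test:{worker_id}:{uuid.uuid4().hex}:"

    def key(self, name):
        return f"{self.prefix}{name}"

    def cleanup(self, client):
        """Delete only this namespace's keys instead of calling flushdb()."""
        doomed = list(client.scan_iter(match=f"{self.prefix}*"))
        if doomed:
            client.delete(*doomed)
        return len(doomed)


class DictClient:
    """Tiny in-memory stand-in exposing the redis-py calls used above."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def scan_iter(self, match):
        stem = match.rstrip("*")
        return (k for k in list(self.data) if k.startswith(stem))

    def delete(self, *keys):
        for k in keys:
            self.data.pop(k, None)


client = DictClient()
ns_a, ns_b = KeyNamespace("gw0"), KeyNamespace("gw1")
client.set(ns_a.key("agent:session:1"), "a")
client.set(ns_b.key("agent:session:1"), "b")
removed = ns_a.cleanup(client)  # leaves the other namespace untouched
```

With a real client, the same cleanup would run in a fixture's teardown, so parallel workers never wipe each other's keys.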
Why not just run tests against a real production Redis on a dedicated db number? If you accidentally misconfigure something and flushdb isn’t blocked, you’ll have another horror story to tell.
Here are three key code blocks: the container-management fixture, the memory storage implementation, and the test cases. You should be able to drop them into your project and run.
This solves the “how to get a clean Redis instance for tests” problem. We use testcontainers to launch Redis 7 and add a manual readiness check (a ping) to be sure the container is ready before handing it over.
import pytest
import redis
from testcontainers.redis import RedisContainer


@pytest.fixture(scope="session")
def redis_container():
    """Session-scoped Redis container, started once per test session."""
    container = RedisContainer("redis:7-alpine")
    container.with_exposed_ports(6379)
    container.start()
    try:
        client = redis.Redis(
            host=container.get_container_host_ip(),
            port=int(container.get_exposed_port(6379)),
        )
        client.ping()  # Fails fast here if Redis is not accepting connections
        client.close()
        yield container
    finally:
        container.stop()  # Runs even if the ping or a test raises


@pytest.fixture
def redis_client(redis_container):
    """Per-test Redis client; db 0 is flushed before each test to avoid interference."""
    client = redis.Redis(
        host=redis_container.get_container_host_ip(),
        port=int(redis_container.get_exposed_port(6379)),
        db=0,
        decode_responses=True,  # avoid manual .decode()
    )
    client.flushdb()  # Start clean
    yield client
    client.close()
Why decode_responses=True? The agent’s memory module mostly stores natural language text. Returning bytes forces a .decode() everywhere, adding noise that buries the actual assertions.
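A single ping() fails fast, which is fine when the container library has already waited for the port. If you would rather poll for readiness, a small retry loop is one option. This is a sketch under assumptions: wait_until_ready is a hypothetical helper, and it retries on the built-in ConnectionError by default so it runs without redis-py. With a real client you would pass retry_on=(redis.exceptions.ConnectionError,), since that class does not inherit from the built-in one.

```python
import time


def wait_until_ready(ping, timeout=10.0, interval=0.2, retry_on=(ConnectionError,)):
    """Call ping() until it succeeds or the timeout elapses.

    Returns the number of attempts it took; re-raises the last error on timeout.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            ping()
            return attempts
        except retry_on:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


# Simulated server that refuses the first two pings.
calls = {"n": 0}


def flaky_ping():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("not ready yet")
    return True


attempts_needed = wait_until_ready(flaky_ping, timeout=2.0, interval=0.01)
```

In the session fixture, the call would look like wait_until_ready(client.ping, retry_on=(redis.exceptions.ConnectionError,)).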
This is a simplified version of production code, living in memory_store.py. It stores each session’s context as a JSON string under a single Redis key with a TTL (via SETEX). We intentionally kept a serialization path that is easy to break in production (using json instead of pickle, but validating data integrity).
import json
import logging
from typing import Dict, Optional

import redis

logger = logging.getLogger(__name__)


class AgentMemoryStore:
    def __init__(self, redis_client: redis.Redis, ttl: int = 3600):
        self.redis = redis_client
        self.ttl = ttl

    def save_context(self, session_id: str, context: Dict) -> bool:
        try:
            key = f"agent:session:{session_id}"
            serialized = json.dumps(context, default=str)
            self.redis.setex(key, self.ttl, serialized)
            return True
        except (redis.exceptions.ConnectionError, TypeError) as e:
            logger.error(f"Failed to save context for {session_id}: {e}")
            return False

    def load_context(self, session_id: str) -> Optional[Dict]:
        try:
            key = f"agent:session:{session_id}"
            data = self.redis.get(key)
            if data is None:
                return None
            return json.loads(data)
        except (redis.exceptions.ConnectionError, json.JSONDecodeError) as e:
            logger.error(f"Failed to load context for {session_id}: {e}")
            return None
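One sharp edge in this serialization path: json.dumps(..., default=str) does not reject values it cannot encode, it stringifies them. The write succeeds, but what you load back is no longer what you saved. A round-trip check makes that visible. The helper name round_trips is hypothetical and the sample contexts are made up for illustration.

```python
import json
from datetime import datetime


def round_trips(context):
    """Return True if context survives a JSON encode/decode unchanged."""
    try:
        encoded = json.dumps(context, default=str)
    except (TypeError, ValueError):
        # default=str only covers values; unsupported keys still raise TypeError.
        return False
    return json.loads(encoded) == context


plain = {"phone": "13800138000", "intent": "return_order"}
with_datetime = {"last_seen": datetime(2024, 1, 1, 12, 0)}
with_tuple_key = {("user", 1): "x"}

plain_ok = round_trips(plain)              # strings survive as-is
datetime_ok = round_trips(with_datetime)   # comes back as a str, not a datetime
tuple_key_ok = round_trips(with_tuple_key) # json.dumps raises TypeError
```

Running a check like this in tests (or before a write) is one way to do the "validating data integrity" part without switching to pickle.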
This tests the happy path and the scenarios that mock tests never touch: connection failures, serialization errors, and key expiry on a real server.
import time
from unittest.mock import patch

import pytest
import redis

from memory_store import AgentMemoryStore

# Mark the whole module so the --real-redis option can select or skip it.
pytestmark = pytest.mark.real_redis


def test_save_and_load_context_success(redis_client):
    store = AgentMemoryStore(redis_client)
    context = {"phone": "13800138000", "intent": "return_order"}
    assert store.save_context("session_1", context)
    loaded = store.load_context("session_1")
    assert loaded == context


def test_context_not_found(redis_client):
    store = AgentMemoryStore(redis_client)
    assert store.load_context("nonexistent_session") is None


def test_connection_error_graceful_failure(redis_client):
    """Simulate a ConnectionError: the store must handle it gracefully."""
    store = AgentMemoryStore(redis_client)
    error = redis.exceptions.ConnectionError("boom")
    with patch.object(redis_client, "setex", side_effect=error):
        result = store.save_context("session_err", {"a": 1})
    assert result is False


def test_serialization_error_handling(redis_client):
    """What happens when we try to save an unserializable object?"""
    store = AgentMemoryStore(redis_client)
    # The store passes default=str, which stringifies odd values (even a lambda),
    # so use a tuple key: json.dumps raises TypeError for unsupported key types.
    bad_context = {("not", "a", "string"): 1}
    result = store.save_context("session_bad", bad_context)
    assert result is False


def test_ttl_expiry(redis_client):
    """Set a tiny TTL and wait, then check that the data disappears."""
    store = AgentMemoryStore(redis_client, ttl=2)
    store.save_context("short_lived", {"value": "ephemeral"})
    time.sleep(3)
    assert store.load_context("short_lived") is None
These tests run against the exact Redis version and memory constraints that match production. No more “it works on my fake client.”
Running real Redis in CI often raises concerns about Docker/socket access. We added the CI configuration to ensure smooth execution:
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements-test.txt
      - name: Run memory integration tests
        run: pytest tests/ --real-redis
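The workflow starts Redis as a service container on localhost:6379, while the session fixture shown earlier launches its own container through testcontainers. One way to let a fixture reuse the CI service is to read the target from the environment and only fall back to testcontainers when nothing is set. The sketch below assumes a REDIS_URL variable, which the workflow above does not define, and redis_target_from_env is a hypothetical helper.

```python
import os
from urllib.parse import urlparse


def redis_target_from_env(env=None, var="REDIS_URL"):
    """Return (host, port, db) if the variable is set, else None.

    REDIS_URL is an assumed variable name, e.g. redis://localhost:6379/0.
    """
    env = os.environ if env is None else env
    url = env.get(var)
    if not url:
        return None
    parsed = urlparse(url)
    db = int(parsed.path.lstrip("/") or 0)
    return parsed.hostname or "localhost", parsed.port or 6379, db


ci_target = redis_target_from_env({"REDIS_URL": "redis://localhost:6379/0"})
local_target = redis_target_from_env({})  # None: fall back to testcontainers
```

A fixture could connect directly when the helper returns a tuple and start a RedisContainer otherwise, so the same tests run in both environments.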
We added a --real-redis command-line option and a matching real_redis marker so that unit tests relying on mocks and integration tests requiring the container can coexist. Tests carrying the marker are skipped unless the option is passed, which keeps them out of the way where no Redis is available (e.g., a local dev environment without Docker).
import pytest


def pytest_addoption(parser):
    parser.addoption("--real-redis", action="store_true", default=False,
                     help="run tests against real Redis")


def pytest_configure(config):
    config.addinivalue_line("markers", "real_redis: mark test as requiring real Redis")


def pytest_collection_modifyitems(config, items):
    # Without --real-redis, skip every test that carries the real_redis marker.
    if not config.getoption("--real-redis"):
        skip_real = pytest.mark.skip(reason="need --real-redis option to run")
        for item in items:
            if "real_redis" in item.keywords:
                item.add_marker(skip_real)
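The hook keys off the command-line flag. If you also want marked tests skipped when nothing is listening on the Redis port, a plain TCP probe is one option. This is a sketch, not part of our conftest: redis_reachable is a hypothetical helper, and the demo uses a throwaway local listener instead of a real Redis so it runs anywhere.

```python
import socket


def redis_reachable(host="localhost", port=6379, timeout=0.5):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demonstrate against a throwaway local listener rather than a real Redis.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

reachable_while_listening = redis_reachable("127.0.0.1", open_port)
listener.close()
reachable_after_close = redis_reachable("127.0.0.1", open_port)
```

Inside pytest_collection_modifyitems, the same skip marker could be applied when redis_reachable() returns False. An open port only proves something is listening, so a ping() from the client remains the real readiness check.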
This gives us two layers of safety: CI always runs the real-Redis suite, while local developers can quickly iterate without Docker when they choose.
After rolling out real Redis tests, we tracked the agent memory module’s bug escape rate for three months.
The stability improvement went beyond metrics: the on-call team stopped being woken up at 2 AM for memory loss bugs.
Throughout the migration, we hit several pitfalls worth sharing:
- If every test calls flushdb(), concurrent test runs (pytest-xdist) will collide. Use unique key prefixes or separate DBs.
- Share the container with scope="session" to keep total test time under 10 seconds.
- Connection exhaustion: setting max_connections higher and adding close logic in teardown fixed it, and caught a production leak we didn’t know we had.
- We first used redis:latest and missed behavior changes when production was still on Redis 6. Now we explicitly pin redis:7-alpine.

Swapping mocks for real Redis isn’t just about testing; it’s about trust. For stateful agents, the distance between “tests pass” and “it works in production” is measured by how closely your test environment mirrors reality. In our case, that mirror was a Docker container, and it saved our sleep.