From Mock to Real Redis: Cutting Agent Memory Test Leakage from 30% to 0

Woken up by PagerDuty at 2 AM. The user group was on fire: our AI customer service agent had suddenly lost its memory. One message confirmed the user's phone number, the next asked "How may I address you?" Checking the logs revealed that the Redis connection pool threw a ConnectionError during a network hiccup. Our supposedly bulletproof mock tests had never simulated that exception. The code simply skipped memory persistence, and all context was lost. Even scarier: the regression suite was green.

This is a textbook disaster of having mocks without real-middleware tests. We spent two weeks rebuilding the automated verification of the agent’s memory module with pytest + a real Redis instance. The result: online memory-related bugs went from a 30% leakage rate to zero. Here’s the full blueprint, code, and the sharp edges we found along the way.

An agent’s memory isn’t simple key-value storage. It handles three things:

A typical mock test patches redis.Redis with unittest.mock.patch or builds a fake client. It’s easy to reach 80% coverage this way. But the real world doesn’t offer “always-succeeding storage”:

Network hiccups make the real client raise redis.exceptions.ConnectionError. The mock version just returns True.

Eviction policies (such as allkeys-lru) silently drop keys when memory gets tight. Your agent’s notepad vanishes. A mock only returns None when you explicitly set it.

The root cause: a mock simulates the Redis you imagine, not real Redis. Unit tests verify logic branches but can’t expose process boundaries, network boundaries, or data consistency issues. For a module like agent memory that depends heavily on external state, integration tests must run against real Redis. Otherwise you’ll eventually pay the technical debt in production.
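To make the "Redis you imagine" point concrete, here is a minimal sketch that needs no Redis at all. The `persist` helper and the `ConnectionLost` exception are hypothetical stand-ins (the latter for redis.exceptions.ConnectionError); they are not the production code.

```python
from unittest.mock import MagicMock


class ConnectionLost(Exception):
    """Hypothetical stand-in for redis.exceptions.ConnectionError."""


def persist(client, key: str, value: str) -> bool:
    """Hypothetical helper: True only if the write went through."""
    try:
        client.set(key, value)
        return True
    except ConnectionLost:
        return False


# A default mock accepts every call, so the failure branch never runs
# and the test stays green no matter how that branch behaves.
happy_mock = MagicMock()
assert persist(happy_mock, "agent:session:1", "{}") is True

# The failure branch is only exercised if someone remembers to script it.
failing_mock = MagicMock()
failing_mock.set.side_effect = ConnectionLost("network hiccup")
assert persist(failing_mock, "agent:session:1", "{}") is False
```

The second case is exactly what our original suite was missing: nobody had scripted the failure, so the mock never produced it.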

We considered three paths for testing with real Redis:

We chose pytest + redis-py + testcontainers-python (with a fallback to docker-compose). The reason: testcontainers lets you declare a Redis container right in conftest.py, automatically waits for the port to be ready, and destroys it when tests end, with no extra scripting. We share the container across tests through a scope="session" fixture, and each test function uses an isolated Redis namespace (a prefix or a dedicated db number) to avoid cross-contamination while staying realistic.
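The "prefix or dedicated db number" idea could be sketched roughly as follows. Both helper names are hypothetical; the only external fact relied on is that pytest-xdist sets the PYTEST_XDIST_WORKER environment variable (values like "gw0", "gw1") in each worker process.

```python
import os
import uuid


def worker_db(default: int = 0) -> int:
    """Map a pytest-xdist worker id such as 'gw3' to Redis db 3.

    Falls back to `default` in a plain single-process run, where
    PYTEST_XDIST_WORKER is not set.
    """
    worker = os.environ.get("PYTEST_XDIST_WORKER", "")
    if worker.startswith("gw") and worker[2:].isdigit():
        # Redis ships with 16 databases by default; with more than 16
        # workers this wraps around and two workers would share a db.
        return int(worker[2:]) % 16
    return default


def unique_prefix(test_name: str) -> str:
    """Build a key prefix that is unique per test invocation."""
    return f"test:{test_name}:{uuid.uuid4().hex[:8]}:"
```

A per-test fixture could pass `db=worker_db()` to redis.Redis, or prepend `unique_prefix(request.node.name)` to every key it writes. Prefixes avoid the need for flushdb entirely, at the cost of leaving keys behind unless they carry a TTL.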

Why not just run tests against a real production Redis on a dedicated db number? If you accidentally misconfigure something and flushdb isn’t blocked, you’ll have another horror story to tell.
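One cheap guard against that horror story is to refuse to flush anything that is not on an explicit allowlist. This is only a sketch: the function name and the allowlist are assumptions, and the list would need extending in setups where the container host is not local (some Docker-in-Docker CI runners, for example).

```python
# Assumption: tests only ever target containers on the local machine.
ALLOWED_TEST_HOSTS = frozenset({"localhost", "127.0.0.1", "::1"})


def assert_safe_to_flush(host: str, allowed=ALLOWED_TEST_HOSTS) -> None:
    """Raise instead of letting a fixture flush an unexpected Redis host."""
    if host not in allowed:
        raise RuntimeError(
            f"refusing to flush Redis at {host!r}: not an allowed test host"
        )
```

A fixture would call `assert_safe_to_flush(host)` immediately before `client.flushdb()`, using the same host value it passed to the client.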

Here are three key code blocks: the container-management fixture, the memory storage implementation, and the test cases. You should be able to drop them into your project and run.

This solves the “how to get a clean Redis instance for tests” problem. We use testcontainers to launch Redis 7 and add a manual wait_for to be absolutely sure the container is ready before handing it over.

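import time

import pytest
import redis
from testcontainers.redis import RedisContainer

def wait_for(client, timeout=10.0, interval=0.2):
    """Poll PING until Redis answers or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            client.ping()
            return
        except redis.exceptions.ConnectionError:
            if time.monotonic() >= deadline:
                raise  # still not ready: fail the session loudly
            time.sleep(interval)

@pytest.fixture(scope="session")
def redis_container():
    """Session-scoped Redis container, started once per test session"""
    container = RedisContainer("redis:7-alpine")
    container.with_exposed_ports(6379)
    container.start()
    try:
        probe = redis.Redis(
            host=container.get_container_host_ip(),
            # int() covers testcontainers versions that return the port as str
            port=int(container.get_exposed_port(6379)),
        )
        wait_for(probe)
        probe.close()
        yield container
    finally:
        container.stop()  # runs even if the readiness check fails

@pytest.fixture
def redis_client(redis_container):
    """Per-test Redis client; the DB is flushed first to avoid interference"""
    client = redis.Redis(
        host=redis_container.get_container_host_ip(),
        port=int(redis_container.get_exposed_port(6379)),
        db=0,
        decode_responses=True,  # avoid manual .decode()
    )
    client.flushdb()  # Start clean
    yield client
    client.close()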

Why decode_responses=True? The agent’s memory module mostly stores natural language text. Returning bytes forces a .decode() everywhere, adding noise that buries the actual assertions.

This is a simplified version of production code, living in memory_store.py. It stores each session's context as a JSON string under a Redis key with a TTL. We intentionally kept a serialization path that is easy to break in production (using json instead of pickle, but validating data integrity).

import json
import logging
from typing import Dict, Optional

import redis

logger = logging.getLogger(__name__)

class AgentMemoryStore:
    def __init__(self, redis_client: redis.Redis, ttl: int = 3600):
        self.redis = redis_client
        self.ttl = ttl  # seconds until a session's context expires

    def save_context(self, session_id: str, context: Dict) -> bool:
        """Serialize the context to JSON and store it with a TTL."""
        try:
            key = f"agent:session:{session_id}"
            # No default=str here: it would silently stringify unserializable
            # objects (lambdas, clients, ...) instead of raising TypeError.
            serialized = json.dumps(context)
            self.redis.setex(key, self.ttl, serialized)
            return True
        except (redis.exceptions.ConnectionError, TypeError) as e:
            logger.error(f"Failed to save context for {session_id}: {e}")
            return False

    def load_context(self, session_id: str) -> Optional[Dict]:
        """Return the stored context, or None if missing, expired, or unreadable."""
        try:
            key = f"agent:session:{session_id}"
            data = self.redis.get(key)
            if data is None:
                return None
            return json.loads(data)
        except (redis.exceptions.ConnectionError, json.JSONDecodeError) as e:
            logger.error(f"Failed to load context for {session_id}: {e}")
            return None
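The "validating data integrity" remark deserves one illustration. JSON can serialize a value without error and still hand back something different on load: tuples come back as lists, and integer dict keys come back as strings. A hypothetical round-trip check (not part of the store above) might look like this:

```python
import json
from typing import Any


def survives_json_roundtrip(context: Any) -> bool:
    """True if dumping and loading `context` yields an equal value."""
    try:
        return json.loads(json.dumps(context)) == context
    except (TypeError, ValueError):
        # TypeError: unserializable object; ValueError: e.g. circular reference
        return False


assert survives_json_roundtrip({"phone": "13800138000", "intent": "return_order"})
assert not survives_json_roundtrip({"items": (1, 2)})     # tuple becomes a list
assert not survives_json_roundtrip({1: "first"})          # int key becomes "1"
assert not survives_json_roundtrip({"fn": lambda x: x})   # not serializable
```

Whether to reject such contexts or normalize them before saving is a design choice; the point is that a fake client storing Python objects in a dict would never surface the difference.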

This tests the happy path and the scenarios that mock tests never touch: connection failures, serialization errors, and keys that disappear once their TTL runs out.

import time
from unittest.mock import patch

import pytest
import redis
from memory_store import AgentMemoryStore

# Every test here needs a live Redis; see the real_redis marker in conftest.py.
pytestmark = pytest.mark.real_redis

def test_save_and_load_context_success(redis_client):
    store = AgentMemoryStore(redis_client)
    context = {"phone": "13800138000", "intent": "return_order"}
    assert store.save_context("session_1", context)
    loaded = store.load_context("session_1")
    assert loaded == context

def test_context_not_found(redis_client):
    store = AgentMemoryStore(redis_client)
    assert store.load_context("nonexistent_session") is None

def test_connection_error_graceful_failure(redis_client):
    """Simulate a ConnectionError: the store must handle it gracefully."""
    store = AgentMemoryStore(redis_client)
    error = redis.exceptions.ConnectionError("boom")
    # patch.object from the standard library, so pytest-mock is not required
    with patch.object(redis_client, "setex", side_effect=error):
        result = store.save_context("session_err", {"a": 1})
    assert result is False

def test_serialization_error_handling(redis_client):
    """What happens when we try to save an unserializable object?"""
    store = AgentMemoryStore(redis_client)
    bad_context = {"fn": lambda x: x}  # lambda not serializable by json
    result = store.save_context("session_bad", bad_context)
    assert result is False

def test_ttl_expiry(redis_client):
    """Set a tiny TTL and wait, then check that the key has expired.

    This covers TTL expiry, not memory-pressure eviction.
    """
    store = AgentMemoryStore(redis_client, ttl=2)
    store.save_context("short_lived", {"value": "ephemeral"})
    time.sleep(3)
    assert store.load_context("short_lived") is None

These tests run against the exact Redis version and memory constraints that match production. No more “it works on my fake client.”

Running real Redis in CI often raises concerns about Docker/socket access. We added the CI configuration to ensure smooth execution:

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements-test.txt
      - name: Run memory integration tests
        run: pytest tests/ --real-redis

We added a --real-redis command-line option and a matching real_redis marker so that unit tests relying on mocks and integration tests requiring the container can coexist. Tests carrying the marker are skipped unless the option is passed, which keeps the default run usable where no Redis is available (e.g., a local dev environment without Docker).

import pytest

def pytest_addoption(parser):
    parser.addoption("--real-redis", action="store_true", default=False,
                     help="run tests against real Redis")

def pytest_configure(config):
    config.addinivalue_line("markers", "real_redis: mark test as requiring real Redis")

def pytest_collection_modifyitems(config, items):
    if not config.getoption("--real-redis"):
        skip_real = pytest.mark.skip(reason="need --real-redis option to run")
        for item in items:
            if "real_redis" in item.keywords:
                item.add_marker(skip_real)

This gives us two layers of safety: CI always runs the real-Redis suite, while local developers can quickly iterate without Docker when they choose.
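If you also want the suite to skip itself when nothing is listening on the Redis port, rather than relying on the flag alone, a standard-library probe is enough. This is a sketch under assumptions: the function name, default host, and default port are hypothetical, and a successful TCP connect only proves that something accepts connections there, not that it is Redis.

```python
import socket


def redis_reachable(host: str = "localhost", port: int = 6379,
                    timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False
```

The collection hook could then skip real_redis tests when `redis_reachable()` is false, with a skip reason that says so. This suits the docker-compose fallback better than the testcontainers path, where the port is assigned only after the container starts.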

After rolling out real Redis tests, we tracked the agent memory module’s bug escape rate for three months.

The stability improvement went beyond metrics: the on-call team stopped being woken up at 2 AM for memory loss bugs.

Throughout the migration, we hit several pitfalls worth sharing:

If every test calls flushdb(), concurrent test runs (pytest-xdist) will collide. Use unique key prefixes or separate DBs.

Share the container with scope="session" to keep total test time under 10 seconds.

Connection pool exhaustion: setting max_connections higher and adding close logic in teardown fixed it, and caught a production leak we didn’t know we had.

We initially pulled redis:latest and missed behavior changes when production was still on Redis 6. Now we explicitly pin redis:7-alpine.

Swapping mocks for real Redis isn’t just about testing; it’s about trust. For stateful agents, the distance between “tests pass” and “it works in production” is measured by how closely your test environment mirrors reality. In our case, that mirror was a Docker container, and it saved our sleep.
