{"slug": "from-mock-to-real-redis-cutting-agent-memory-test-leakage-from-30-to-0", "title": "From Mock to Real Redis: Cutting Agent Memory Test Leakage from 30% to 0", "summary": "A developer's AI customer service agent suffered a 30% memory leak rate in production after mock tests failed to simulate a Redis `ConnectionError` during a network hiccup, causing the agent to lose all context. The team rebuilt automated verification using pytest with a real Redis instance via testcontainers, eliminating online memory-related bugs entirely. The solution uses session-scoped Redis containers with isolated per-test namespaces to catch real-world failures that mocks miss.", "body_md": "Woken up by PagerDuty at 2 AM. The user group was on fire: our AI customer service agent had suddenly lost its memory. One message confirmed the user's phone number; the next asked \"How may I address you?\" The logs showed that the Redis connection pool had thrown a `ConnectionError` during a network hiccup. Our supposedly bulletproof mock tests had never simulated that exception. The code simply skipped memory persistence, and all context was lost. Even scarier: the regression suite was green.\n\nThis is a textbook case of relying on mocks without any tests against real middleware. We spent two weeks rebuilding the automated verification of the agent’s memory module with pytest + a real Redis instance. The result: the rate at which memory-related bugs leaked into production went from 30% to zero. Here’s the full blueprint, the code, and the sharp edges we found along the way.\n\nAn agent’s memory isn’t simple key-value storage. It handles three things:\n\nA typical mock test patches `redis.Redis` with `unittest.mock.patch` or builds a fake client. It’s easy to reach 80% coverage this way, but the real world doesn’t offer “always-succeeding storage”:\n\n- Network hiccups raise `redis.exceptions.ConnectionError`. The mock version just returns `True`.\n- Eviction policies (such as `allkeys-lru`) silently drop keys when memory gets tight. 
Your agent’s notepad vanishes. A mock only returns `None` when you explicitly tell it to.\n\nThe root cause: **a mock simulates the Redis you imagine, not real Redis.** Unit tests verify logic branches but can’t expose process boundaries, network boundaries, or data consistency issues. For a module like agent memory that depends heavily on external state, integration tests must run against real Redis; otherwise you’ll eventually pay down that technical debt in production.\n\nWe considered three paths for testing with real Redis:\n\nWe chose **pytest + redis-py + testcontainers-python** (with a fallback to docker-compose). The reason: testcontainers lets you declare a Redis container right in `conftest.py`, waits for it to be ready, and destroys it when the tests end, with no extra scripting. A `scope=\"session\"` fixture shares one container across the whole run, and each test function works in an isolated Redis namespace (a key prefix or a dedicated db number) to avoid cross-contamination while staying realistic.\n\nWhy not just run the tests against the production Redis on a dedicated db number? If something is misconfigured and `flushdb` isn’t blocked, you’ll have another horror story to tell.\n\nHere are the three key code blocks: the container-management fixture, the memory storage implementation, and the test cases. You should be able to drop them into your project and run them.\n\nThis solves the “how to get a clean Redis instance for tests” problem. 
We use `testcontainers` to launch Redis 7 and add an explicit `ping()` check so we know the container is ready before handing it over.\n\n``` python\n# conftest.py\nimport pytest\nimport redis\nfrom testcontainers.redis import RedisContainer\n\n@pytest.fixture(scope=\"session\")\ndef redis_container():\n    \"\"\"Session-scoped Redis container, started once per test session.\"\"\"\n    # Pin the image to the Redis version production runs.\n    # The context manager stops the container on exit, even if setup fails.\n    with RedisContainer(\"redis:7-alpine\") as container:\n        # Extra readiness check on top of the built-in wait strategy:\n        # ping() raises right away if Redis is not reachable.\n        client = redis.Redis(\n            host=container.get_container_host_ip(),\n            port=int(container.get_exposed_port(6379)),\n        )\n        client.ping()\n        client.close()\n        yield container\n\n@pytest.fixture\ndef redis_client(redis_container):\n    \"\"\"Per-test client; flushing db 0 gives every test a clean slate.\"\"\"\n    client = redis.Redis(\n        host=redis_container.get_container_host_ip(),\n        port=int(redis_container.get_exposed_port(6379)),\n        db=0,\n        decode_responses=True,  # return str instead of bytes\n    )\n    client.flushdb()  # start clean\n    yield client\n    client.close()\n```\n\n**Why `decode_responses=True`?** The agent’s memory module mostly stores natural language text. Returning `bytes` would force a `.decode()` everywhere, adding noise that buries the actual assertions.\n\nThis is a simplified version of the production code, living in `memory_store.py`. It stores each session’s context as a JSON string under a per-session key, with a TTL set via `SETEX`. 
We intentionally kept the serialization path that tends to break in production, using JSON rather than pickle so that data which can’t be serialized cleanly is rejected instead of stored.\n\n``` python\n# memory_store.py\nimport json\nimport logging\nfrom typing import Dict, Optional\n\nimport redis\n\nlogger = logging.getLogger(__name__)\n\nclass AgentMemoryStore:\n    def __init__(self, redis_client: redis.Redis, ttl: int = 3600):\n        self.redis = redis_client\n        self.ttl = ttl  # seconds before a session's context expires\n\n    def save_context(self, session_id: str, context: Dict) -> bool:\n        key = f\"agent:session:{session_id}\"\n        try:\n            # No default=str here: unserializable values must raise\n            # TypeError instead of being silently stringified.\n            serialized = json.dumps(context)\n            self.redis.setex(key, self.ttl, serialized)\n            return True\n        except (redis.exceptions.ConnectionError, TypeError, ValueError) as e:\n            logger.error(\"Failed to save context for %s: %s\", session_id, e)\n            return False\n\n    def load_context(self, session_id: str) -> Optional[Dict]:\n        key = f\"agent:session:{session_id}\"\n        try:\n            data = self.redis.get(key)\n            if data is None:\n                return None  # missing or expired\n            return json.loads(data)\n        except (redis.exceptions.ConnectionError, json.JSONDecodeError) as e:\n            logger.error(\"Failed to load context for %s: %s\", session_id, e)\n            return None\n```\n\nThese tests cover the happy path plus the scenarios mock tests never touch: connection failures, serialization errors, and key expiry.\n\n``` python\n# test_memory_store_real.py\nimport pytest\nimport redis\nfrom memory_store import AgentMemoryStore\n\n# Every test here needs the container; skipped unless --real-redis is passed.\npytestmark = pytest.mark.real_redis\n\ndef test_save_and_load_context_success(redis_client):\n    store = AgentMemoryStore(redis_client)\n    context = {\"phone\": \"13800138000\", \"intent\": \"return_order\"}\n    assert store.save_context(\"session_1\", context)\n    loaded = store.load_context(\"session_1\")\n    assert loaded == context\n\ndef 
test_context_not_found(redis_client):\n    store = AgentMemoryStore(redis_client)\n    assert store.load_context(\"nonexistent_session\") is None\n\ndef test_connection_error_graceful_failure(redis_client, mocker):\n    \"\"\"Inject a ConnectionError: the store must report failure, not crash.\"\"\"\n    # `mocker` is provided by the pytest-mock plugin\n    store = AgentMemoryStore(redis_client)\n    mocker.patch.object(\n        redis_client, 'setex', side_effect=redis.exceptions.ConnectionError(\"boom\")\n    )\n    result = store.save_context(\"session_err\", {\"a\": 1})\n    assert result is False\n\ndef test_serialization_error_handling(redis_client):\n    \"\"\"Saving an unserializable object should fail gracefully, not raise.\"\"\"\n    store = AgentMemoryStore(redis_client)\n    bad_context = {\"fn\": lambda x: x}  # json cannot serialize a function\n    result = store.save_context(\"session_bad\", bad_context)\n    assert result is False\n\ndef test_ttl_expiry(redis_client):\n    \"\"\"Set a tiny TTL, wait past it, then check that the data is gone.\"\"\"\n    import time\n\n    store = AgentMemoryStore(redis_client, ttl=2)\n    # Check the write and read succeed first, so a failed save\n    # can't make the final assertion pass vacuously.\n    assert store.save_context(\"short_lived\", {\"value\": \"ephemeral\"})\n    assert store.load_context(\"short_lived\") == {\"value\": \"ephemeral\"}\n    time.sleep(3)\n    assert store.load_context(\"short_lived\") is None\n```\n\nThese tests run against the same Redis version as production. No more “it works on my fake client.”\n\nRunning real Redis in CI often raises concerns about access to the Docker socket. 
Here is the workflow excerpt we run. GitHub-hosted `ubuntu-latest` runners have Docker installed, so testcontainers can start its own Redis container there:\n\n``` yaml\n# .github/workflows/agent_memory_tests.yml (excerpt)\njobs:\n  test:\n    runs-on: ubuntu-latest\n    services:\n      # Optional fallback if testcontainers can't access the Docker socket;\n      # the fixtures would have to be pointed at localhost:6379 to use it.\n      redis:\n        image: redis:7-alpine\n        ports:\n          - 6379:6379\n    steps:\n      - uses: actions/checkout@v4\n      - name: Set up Python\n        uses: actions/setup-python@v5\n        with:\n          python-version: \"3.11\"\n      - run: pip install -r requirements-test.txt\n      - name: Run memory integration tests\n        run: pytest tests/ --real-redis\n```\n\nWe added a `--real-redis` command-line option and a matching `real_redis` marker so that mock-based unit tests and container-based integration tests can coexist. Tests carrying the marker are skipped unless the flag is passed, so a local dev environment without Docker can still run the fast unit suite.\n\n``` python\n# conftest.py (continued; pytest is already imported above)\ndef pytest_addoption(parser):\n    # Opt-in flag for the container-backed suite\n    parser.addoption(\"--real-redis\", action=\"store_true\", default=False,\n                     help=\"run tests against real Redis\")\n\ndef pytest_configure(config):\n    # Register the marker so pytest doesn't warn about an unknown mark\n    config.addinivalue_line(\"markers\", \"real_redis: mark test as requiring real Redis\")\n\ndef pytest_collection_modifyitems(config, items):\n    if not config.getoption(\"--real-redis\"):\n        skip_real = pytest.mark.skip(reason=\"need --real-redis option to run\")\n        for item in items:\n            if \"real_redis\" in item.keywords:\n                item.add_marker(skip_real)\n```\n\nThis gives us two layers of safety: CI always runs the real-Redis suite, while local developers can iterate quickly without Docker when they choose.\n\nAfter rolling out the real-Redis tests, we tracked the agent memory module’s bug escape rate for three months.\n\nThe stability improvement went beyond metrics: the on-call team stopped being woken up at 2 AM for memory loss bugs.\n\nThroughout the migration, we hit several pitfalls worth 
sharing:\n\n- When tests share a DB and rely on `flushdb()`, concurrent test runs (pytest-xdist) will collide. Use unique key prefixes or separate DBs.\n- Starting a container for every test is slow; we use `scope=\"session\"` to keep total test time under 10 seconds.\n- Setting `max_connections` higher and adding close logic in teardown fixed the connection problems in our suite, and it also surfaced a production connection leak we didn’t know we had.\n- We originally pulled `redis:latest` and missed behavior changes while production was still on Redis 6. Now we explicitly pin `redis:7-alpine`.\n\nSwapping mocks for real Redis isn’t just about testing; it’s about trust. For stateful agents, the distance between “tests pass” and “it works in production” is measured by how closely your test environment mirrors reality. In our case, that mirror was a Docker container, and it saved our sleep.", "url": "https://wpnews.pro/news/from-mock-to-real-redis-cutting-agent-memory-test-leakage-from-30-to-0", "canonical_source": "https://dev.to/_eb7f2a654e97a60ae9f96e/from-mock-to-real-redis-cutting-agent-memory-test-leakage-from-30-to-0-4lca", "published_at": "2026-06-06 01:07:23+00:00", "updated_at": "2026-06-06 01:42:59.200850+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "mlops", "ai-tools", "artificial-intelligence"], "entities": ["Redis", "PagerDuty", "pytest"], "alternates": {"html": "https://wpnews.pro/news/from-mock-to-real-redis-cutting-agent-memory-test-leakage-from-30-to-0", "markdown": "https://wpnews.pro/news/from-mock-to-real-redis-cutting-agent-memory-test-leakage-from-30-to-0.md", "text": "https://wpnews.pro/news/from-mock-to-real-redis-cutting-agent-memory-test-leakage-from-30-to-0.txt", "jsonld": "https://wpnews.pro/news/from-mock-to-real-redis-cutting-agent-memory-test-leakage-from-30-to-0.jsonld"}}