How we made parallel pytest safe for multi-tenant agent swarms

Honeycomb's engineering team redesigned their parallel test execution system to prevent database collisions when multiple pytest invocations run concurrently against the same PostgreSQL host. The fix introduces per-invocation namespace tokens that make schema names unique across independent test runs, solving lock contention and silent data corruption that emerged as AI agents and multiple developers triggered simultaneous test suites.

Parallel test runs feel solved right up until the day they stop being solved.

For a while our backend test environment looked healthy. We had pytest-xdist, modular fixtures, Factory Boy, a structured conftest.py, separate platform and tenant databases, and a ~/run tests entrypoint that auto-loaded the test environment. One engineer running a focused slice of the suite was fast and stable. A few workers in parallel was routine.

Then how we worked changed. We started running more tests concurrently across more contexts: multiple tmux panes, background validation passes, longer fixture-heavy suites, and eventually multiple AI agents firing test invocations seconds apart against the same database host. That is when a structural problem we had been getting away with surfaced: even with xdist in place, two independent pytest invocations could collide on the same PostgreSQL schemas and produce LockNotAvailable, statement timeouts, or worse, silent fall-through writes into shared public.

The fix was not “raise the timeout again.” It was treating the test environment as shared infrastructure: per-invocation namespacing, fail-closed cleanup, deterministic connection labeling, and a few subtle bug fixes that mattered more than the headline change.

This post is about what broke, why xdist alone does not solve it, and the specific mechanisms we ended up needing.

Why xdist isn’t enough

pytest-xdist gives you worker parallelism inside one invocation.
That is not the same as making multiple independent invocations safe on the same database host.

Our original isolation was schema-per-worker:

test gw0
test gw1
test gw2

That works fine until two entirely separate ~/run tests calls both spin up gw0. Both runs then try to drop, recreate, migrate, and seed the same physical schema. The result is structural lock contention that looks like flakiness until you see the pattern.

A multi-tenant SaaS test setup typically touches at least two databases:

- a platform database for users, orgs, roles, and global control-plane state
- one or more tenant databases for the operational data each customer actually works with

Once tests add multi-tenant isolation cases or extra tenant DBs, “isolated by worker name” becomes “isolated only when other invocations stay out of the way.” For a small human team that is an occasional flake. For a swarm of AI agents kicking off concurrent runs and inheriting shell state from each other, it is the normal operating envelope.

The hard lesson was simple: xdist gives you worker parallelism inside one invocation. It does not automatically make multiple independent invocations safe on the same database host.

What we changed

Every pytest invocation now gets a namespace token, and every worker composes its schema from namespace + worker id:

test {namespace} {worker id}

A namespace token from ~/run tests looks like this:

p18234t1745178234r3af93d71

That is p