Part 3 of 6: Every Agent Passed. The System Failed.

In a study published in *Science Advances*, researchers found that multi-agent systems can develop biased conventions even when every individual agent passes bias tests with statistically clean results. When populations of 24 to 200 unbiased agents from four different model families were deployed together, biased conventions emerged and locked in by round 15 due to feedback loops amplifying early random fluctuations. The bias did not reside in any single agent but emerged from interactions between them, demonstrating that individual unit tests are insufficient to guarantee system-level fairness.
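To watch the mechanism in miniature first, here is a toy sketch. It is not the paper's model: it assumes the crudest possible interaction rule (agents pair up at random and the listener copies the speaker), and the names `simulate_conventions` and `dominant_share` are made up for illustration. No agent favors either convention, yet nothing pulls the population back toward 50/50, and once everyone agrees the state can never change.

```python
import random


def dominant_share(prefs):
    """Fraction of agents holding whichever convention is currently ahead."""
    share_x = prefs.count("X") / len(prefs)
    return max(share_x, 1 - share_x)


def simulate_conventions(n_agents=50, n_rounds=30, seed=0, initial=None):
    """Toy copy-your-partner model (an assumption, not the paper's setup).

    Returns the dominant convention's share before round 1 and after
    every round, so the list has n_rounds + 1 entries.
    """
    rng = random.Random(seed)
    if initial is not None:
        prefs = list(initial)
    else:
        # Fair coin flip per agent: nobody is biased toward X or Y.
        prefs = [rng.choice("XY") for _ in range(n_agents)]

    history = [dominant_share(prefs)]
    for _ in range(n_rounds):
        order = list(range(len(prefs)))
        rng.shuffle(order)
        # Pair agents at random; the listener adopts the speaker's convention.
        for speaker, listener in zip(order[::2], order[1::2]):
            prefs[listener] = prefs[speaker]
        history.append(dominant_share(prefs))
    return history


if __name__ == "__main__":
    for round_num, share in enumerate(simulate_conventions(seed=1)):
        if round_num % 5 == 0:
            print(f"Round {round_num:2d}: dominant share = {share:.2f}")
```

Try a few seeds: how far the dominant share climbs, and how quickly, varies from run to run. That variation comes from chance in the pairing, not from any agent.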

TL;DR: You can test every agent individually and get clean results. Deploy them together and biased conventions emerge by round 15. The bias isn't in any agent; it's in the space between them. Published in Science Advances. Your unit tests won't catch this.

Catch up: Part 1: your judge is biased. Part 2: upgrading made it worse. Part 3: you fixed the judge, tested everything, and shipped. The system had other plans.

Parts 1 and 2 were about one model being biased. Part 3 is worse. Part 3 is about a system where no individual model is biased, and the system is biased anyway.

If Parts 1 and 2 made you uncomfortable, Part 3 should make you question what "tested" means.

You did everything right.

You tested each agent. Checked for bias. Ran the statistical tests.

```python
# Your test suite: responsible, thorough, useless
from scipy import stats


def test_agent_bias(agent, test_set, n_runs=100):
    """Test a single agent for demographic bias."""
    scores_group_a = []
    scores_group_b = []
    for prompt in test_set:
        for _ in range(n_runs):
            score = agent.evaluate(prompt)
            if prompt.demographic == "A":
                scores_group_a.append(score)
            else:
                scores_group_b.append(score)
    # Two-sample t-test on the scores the agent gave each group
    t_stat, p_value = stats.ttest_ind(scores_group_a, scores_group_b)
    return p_value

# Results:
# Agent 1: p = 0.410  ✓ Not significant
# Agent 2: p = 0.757  ✓ Not significant
# Agent 3: p = 0.623  ✓ Not significant
# Agent 4: p = 0.891  ✓ Not significant
# All clean. Ship it.
```

You wrote it up. You filed the compliance report. You went home.

All four agents. Individually unbiased. Statistically verified.

You deployed them together. By round 15, your system had developed opinions nobody programmed.

What the researchers found.

They ran coordination tasks with populations of 24 to 200 agents. Four different model families. Individual bias tests on each: nothing. No statistically significant effect.

Then they let the agents talk to each other.

Round 1: agents start with roughly random preferences. No pattern.

Round 5: small clusters form.
Agents that interacted early start agreeing.

Round 10: clusters merge. A dominant convention is emerging.

Round 15: biased conventions locked in across the entire population.

```python
# What happens when unbiased agents interact
# (simplified from Ashery et al., Science Advances 2025).
# Agent, random_pairs, negotiate, and measure_convention_bias are
# stand-ins that this snippet does not define.
population = [Agent(bias=0.0) for _ in range(50)]  # all individually unbiased

for round_num in range(31):  # rounds 0 through 30
    pairs = random_pairs(population)
    for agent_a, agent_b in pairs:
        # They coordinate. They agree. They update preferences.
        outcome = negotiate(agent_a, agent_b)
        agent_a.update_preferences(outcome)
        agent_b.update_preferences(outcome)

    # Measure population-level bias, reporting every fifth round
    if round_num % 5 == 0:
        pop_bias = measure_convention_bias(population)
        print(f"Round {round_num:2d}: population bias = {pop_bias:.3f}")

# Round  0: population bias = 0.012   # noise
# Round  5: population bias = 0.087   # hmm
# Round 10: population bias = 0.234   # uh oh
# Round 15: population bias = 0.671   # there it is
# Round 20: population bias = 0.683   # locked in
# Round 25: population bias = 0.689   # not going back
# Round 30: population bias = 0.691   # this is the system now
```

Not because any individual agent was biased. Because tiny fluctuations in early interactions got amplified through feedback loops. The first few rounds set a direction. Each subsequent round reinforced it. Once locked in, it never unlocked.

Published in Science Advances. Not a blog post. Not a preprint. Peer-reviewed. Top journal.

Think of it like this.

You have 50 people in a room. None of them are biased. You ask them to agree on a standard.

The first three conversations happen to go a certain way, by pure chance. The next conversations reference the emerging pattern. "Everyone else seems to prefer X." The pattern compounds. By the time you check, the room has a strong consensus.

Nobody made a biased decision. The room made a biased decision.

Now replace the room with your content moderation pipeline. Or your hiring pipeline. Or your customer routing system.

The content moderation version.

30 agents handling content moderation for a social platform. Each agent individually tested: unbiased. Each agent individually deployed: fine.

Together: they start sharing context. Flagging decisions reference previous decisions. The agents build a shared understanding of what "borderline" means.

```python
class ModerationPipeline:
    def __init__(self, n_agents=30):
        self.agents = [ModerationAgent() for _ in range(n_agents)]
        self.shared_context = []  # this is where the bias lives

    async def moderate(self, content):
        agent = self.select_agent()

        # Agent sees recent decisions from other agents
        recent = self.shared_context[-20:]
        decision = await agent.evaluate(
            content=content,
            context=f"Recent moderation decisions for reference: {recent}",
        )

        # Decision feeds back into shared context
        self.shared_context.append(decision)
        return decision

# Day 1:  decisions are roughly balanced
# Day 7:  a pattern has emerged in shared context
# Day 30: the pipeline has a definition of "borderline"
#         that nobody wrote, nobody approved, and nobody audits
#
# Individual agent tests still pass.
# The population-level test you never wrote: fails.
```

Nobody approved that definition. Nobody wrote it down. It emerged from agents agreeing with each other's patterns. And now it runs at scale. Quietly. Confidently. With excellent individual benchmark scores.

Why your test suite doesn't catch this.

```python
# What you test: individual agents in isolation
def test_agent_fairness():
    agent = ModerationAgent()
    results = [agent.evaluate(item) for item in test_set]
    assert demographic_parity(results) > 0.95  # ✓ passes
    assert equalized_odds(results) > 0.90      # ✓ passes


# What you DON'T test: the population over time
async def test_population_fairness():
    pipeline = ModerationPipeline(n_agents=30)

    # Run 1000 items through the pipeline sequentially
    # so shared context accumulates
    results = []
    for item in production_sample:
        decision = await pipeline.moderate(item)
        results.append(decision)

    # Check for convention drift
    early = results[:100]
    late = results[-100:]
    drift = measure_drift(early, late)
    assert drift < 0.1  # ← nobody writes this test
    # And if they did, it would fail.
```

You cannot catch this with individual testing. The problem doesn't exist in any individual agent. It exists in the space between them.

Your unit tests are testing the bricks. The building is crooked.

What to watch for.

Three signals that your population is drifting:

1. Score distribution shift over time. Plot your evaluation scores weekly. If the distribution is moving, even slowly, conventions are forming.
2. Decreasing decision variance. Agents agreeing more over time isn't efficiency. It's convergence. And convergence on what?
3. Shared context growing monotonically. If agents reference each other's past decisions and never reset, you have a feedback loop. Feedback loops compound. Always.
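The third signal is the easiest to check mechanically. A minimal sketch, assuming you already log the size of the shared context (for example `len(pipeline.shared_context)`) at regular intervals; the function name `context_never_resets` and the `min_samples` threshold are hypothetical, not part of any existing tool:

```python
def context_never_resets(context_sizes, min_samples=5):
    """Return True if the sampled sizes never shrink and end larger than they began.

    context_sizes: shared-context lengths sampled over time, oldest first.
    """
    if len(context_sizes) < min_samples:
        return False  # too few samples to say anything
    never_shrinks = all(
        later >= earlier
        for earlier, later in zip(context_sizes, context_sizes[1:])
    )
    return never_shrinks and context_sizes[-1] > context_sizes[0]


if __name__ == "__main__":
    print(context_never_resets([100, 180, 260, 340, 420]))  # grows, never resets
    print(context_never_resets([100, 180, 0, 90, 170]))     # was reset once
```

A `True` here does not prove drift. It only says the precondition for a feedback loop is in place, which is the cue to look at the score distributions.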
```python
# Minimum viable drift monitor
import numpy as np
from scipy import stats


def monitor_drift(pipeline, window_days=7):
    # get_decisions and alert come from your own pipeline and alerting stack
    recent = pipeline.get_decisions(last_n_days=window_days)
    previous = pipeline.get_decisions(
        last_n_days=window_days, offset_days=window_days
    )

    # Compare score distributions
    ks_stat, p_value = stats.ks_2samp(
        [d.score for d in recent],
        [d.score for d in previous],
    )
    if p_value < 0.05:
        alert(f"Population drift detected: KS={ks_stat:.3f}, p={p_value:.4f}")

    # Compare variance (convergence signal)
    var_recent = np.var([d.score for d in recent])
    var_previous = np.var([d.score for d in previous])
    if var_recent < var_previous * 0.7:
        alert("Decision variance dropped 30%+: convergence underway")
```

Next up, Part 4 of 6: Everything so far was nobody's fault. Accidental drift. Emergent conventions. Natural feedback loops. Part 4 has a villain. Someone decides to make the population drift on purpose. Turns out 2% of compromised agents is enough to flip the entire swarm. Security meets bias. It gets properly scary.

Research: Ashery, Aiello, Baronchelli (2025), Science Advances, peer-reviewed. The moderation pipeline is a composite. The drift is real. The test you haven't written is the one that matters.