{"slug": "when-helping-hurts-and-how-to-fix-it-multi-agent-debate-for-data-cleaning", "title": "When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning", "summary": "A new study of multi-agent debate for data cleaning found that the process degrades generative performance across four model families by 1.6 to 15.5 percentage points due to critique-induced confusion, yet improves error detection by 27.4 points in F1 score. Researchers identified a debate benefit condition based on the probability of rescuing wrong outputs versus destroying correct ones, and demonstrated that adversarial separation with a separate Critic using code-execution grounding and evidence-gated generation produced the first debate configuration to significantly exceed single-agent performance on a generative task by 5.3 percentage points.", "body_md": "arXiv:2606.02866v1 Announce Type: new\nAbstract: When does multi-agent debate help data cleaning, and when does it hurt? Across three benchmarks, four model families, and over 6,000 task-condition pairs, we find debate's effect reverses sign: it degrades generation across all four models (-1.6 to -15.5pp) through critique-induced confusion (CIC), hallucinated Critic feedback that the Generator accepts uncritically, yet improves error detection (+27.4pp F1, d=1.0). We derive a debate benefit condition: debate helps when the probability of rescuing a wrong output (Critic verification odds weighted by fixability) exceeds the probability of destroying a correct one. A factorial experiment proves adversarial separation is essential: self-verification with identical tools fails, while a separate Critic with code-execution grounding and evidence-gated generation produces the first debate configuration to significantly exceed single-agent on a generative task (+5.3pp, p<0.05). The condition correctly predicts all nine task types and generalizes with zero false positives across 19 published comparisons in seven domains.", "url": "https://wpnews.pro/news/when-helping-hurts-and-how-to-fix-it-multi-agent-debate-for-data-cleaning", "canonical_source": "https://arxiv.org/abs/2606.02866", "published_at": "2026-06-03 04:00:00+00:00", "updated_at": "2026-06-03 04:17:30.215263+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "ai-agents", "ai-research"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/when-helping-hurts-and-how-to-fix-it-multi-agent-debate-for-data-cleaning", "markdown": "https://wpnews.pro/news/when-helping-hurts-and-how-to-fix-it-multi-agent-debate-for-data-cleaning.md", "text": "https://wpnews.pro/news/when-helping-hurts-and-how-to-fix-it-multi-agent-debate-for-data-cleaning.txt", "jsonld": "https://wpnews.pro/news/when-helping-hurts-and-how-to-fix-it-multi-agent-debate-for-data-cleaning.jsonld"}}