Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning

wpnews.pro

Source: arxiv.org · Published: May 26, 2026 · Topic: large language models

A new study finds that multi-turn reasoning systems fail primarily through satisfiable drift, in which the maintained internal state stays logically consistent while the returned answer silently violates prior commitments, rather than through logical contradiction. The researchers introduce DRIFT-Bench, a benchmark of 816 test problems across three constraint domains, and find that after structured feedback, 98-100% of residual errors are satisfiable drift, while contradiction nearly disappears. Reliable multi-turn systems, the authors conclude, must separately validate that returned answers respect the maintained state.
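
The distinction between the two failure modes can be made concrete with a toy checker. This is a minimal sketch, not DRIFT-Bench's code: it assumes constraints are Python predicates over small integer domains, it replaces a real solver with brute-force enumeration, and the names `find_model` and `classify_turn` are hypothetical. A turn counts as a contradiction when the maintained commitments are jointly unsatisfiable, and as satisfiable drift when they remain satisfiable but the returned answer violates at least one of them.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

# Assumption: a constraint is a predicate over a full assignment of variable -> value.
Assignment = Dict[str, int]
Constraint = Callable[[Assignment], bool]


def find_model(domains: Dict[str, range], constraints: List[Constraint]) -> Optional[Assignment]:
    """Brute-force satisfiability check over small finite domains.

    Returns the first assignment that satisfies every constraint, or None.
    Hypothetical stand-in for the SAT/SMT solver a real benchmark would use.
    """
    names = sorted(domains)
    for values in product(*(domains[n] for n in names)):
        candidate = dict(zip(names, values))
        if all(c(candidate) for c in constraints):
            return candidate
    return None


def classify_turn(domains: Dict[str, range], commitments: List[Constraint], answer: Assignment) -> str:
    """Label one turn as 'contradiction', 'satisfiable_drift', or 'ok'.

    `answer` must assign a value to every variable in `domains`.
    """
    # Contradiction: the maintained state itself has no satisfying assignment.
    if find_model(domains, commitments) is None:
        return "contradiction"
    # Satisfiable drift: the state is consistent, but the returned answer
    # breaks at least one earlier commitment.
    if not all(c(answer) for c in commitments):
        return "satisfiable_drift"
    return "ok"


if __name__ == "__main__":
    doms = {"x": range(1, 4), "y": range(1, 4)}
    x_lt_y = lambda a: a["x"] < a["y"]
    y_lt_x = lambda a: a["y"] < a["x"]
    print(classify_turn(doms, [x_lt_y], {"x": 3, "y": 1}))  # satisfiable_drift
    print(classify_turn(doms, [x_lt_y, y_lt_x], {"x": 1, "y": 2}))  # contradiction
```

The second check in `classify_turn` is the separate validation step the study calls for: a consistent state on its own says nothing about whether the answer handed back actually honors it.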

Abstract (arXiv:2605.23940v1): How do multi-turn reasoning systems fail? The expected answer is logical contradiction, in which the system's maintained state becomes unsatisfiable. We show that the dominant mode is instead satisfiable drift, where the internal state stays consistent while the returned answer silently violates prior commitments. We build DRIFT-Bench (Decomposing Reasoning Into Failure Types), a solver-instrumented benchmark of 816 test problems across three constraint domains, and evaluate four methods on it across four open-weight models (8B-120B parameters). MUS-Repair, which feeds minimal unsatisfiable subsets back to the generator, is strongest in every setting (+1.8 to +15.0 pp over the best non-MUS baseline). But the central finding is what repair leaves behind. After structured feedback, models rarely contradict themselves. They forget. Residual errors are 98-100% satisfiable drift across all settings, while contradiction drops to near zero. Reliable multi-turn systems must separately validate that the returned answer respects the maintained state. Code is available at https://github.com/kaons-research/drift-bench.
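
MUS-Repair, as the abstract describes it, feeds minimal unsatisfiable subsets back to the generator. The abstract does not say how those subsets are computed, so the sketch below assumes the textbook deletion-based algorithm on a toy representation (named Python predicates over small integer domains), with brute-force enumeration standing in for a solver. `deletion_mus` and `mus_feedback` are hypothetical names, and the feedback wording is illustrative only.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

# Assumption: a constraint is a predicate over a full assignment of variable -> value.
Assignment = Dict[str, int]
Constraint = Callable[[Assignment], bool]


def find_model(domains: Dict[str, range], constraints: List[Constraint]) -> Optional[Assignment]:
    """Brute-force stand-in for a solver: first satisfying assignment, or None."""
    names = sorted(domains)
    for values in product(*(domains[n] for n in names)):
        candidate = dict(zip(names, values))
        if all(c(candidate) for c in constraints):
            return candidate
    return None


def deletion_mus(domains: Dict[str, range], constraints: Dict[str, Constraint]) -> List[str]:
    """Deletion-based MUS extraction over named constraints.

    Tries dropping each constraint in turn; if the rest is still
    unsatisfiable, that constraint is not needed and stays dropped.
    Every survivor is necessary, so the result is a minimal unsatisfiable subset.
    """
    if find_model(domains, list(constraints.values())) is not None:
        raise ValueError("constraints are satisfiable; there is no MUS")
    core = dict(constraints)
    for name in list(core):  # snapshot of names, in insertion order
        trial = {k: v for k, v in core.items() if k != name}
        if find_model(domains, list(trial.values())) is None:
            core = trial  # still unsatisfiable without `name`, so drop it
    return list(core)


def mus_feedback(mus: List[str]) -> str:
    """Hypothetical rendering of the conflict for the generator's next attempt."""
    return "These commitments cannot all hold at once: " + ", ".join(mus)


if __name__ == "__main__":
    doms = {"x": range(1, 6), "y": range(1, 6)}
    commitments = {
        "x_lt_y": lambda a: a["x"] < a["y"],
        "y_lt_x": lambda a: a["y"] < a["x"],
        "x_ne_2": lambda a: a["x"] != 2,
        "y_ge_2": lambda a: a["y"] >= 2,
    }
    print(mus_feedback(deletion_mus(doms, commitments)))
    # These commitments cannot all hold at once: x_lt_y, y_lt_x
```

Deletion-based extraction returns a minimal subset (no member can be removed), not necessarily the smallest one, and the result can depend on the order in which constraints are tried. Pointing the generator at only the conflicting commitments is what makes the feedback structured; per the abstract, what survives such repair is mostly drift rather than contradiction.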

Source and further reading: original article on arXiv, arxiv.org/abs/2605.23940
