Microsoft Research: LLMs Corrupt Your Files During Delegated Work

A new Microsoft Research study introduces DELEGATE-52, a benchmark that simulates long delegated workflows requiring in-depth document editing across 52 professional domains. In experiments with 19 large language models, even frontier systems such as Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupted an average of 25% of document content by the end of long workflows, and other models failed more severely. Degradation grew worse with larger documents, longer interactions, and the presence of distractor files, and agentic tool use did not improve performance. The authors conclude that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents and compound over long interactions.

1 min read · published May 26, 2026

Large Language Models (LLMs) are poised to disrupt knowledge work, with the emergence of delegated work as a new interaction paradigm (e.g., vibe coding). Delegation requires trust – the expectation that the LLM will faithfully execute the task without introducing errors into documents. We introduce DELEGATE-52 to study the readiness of AI systems in delegated workflows. DELEGATE-52 simulates long delegated workflows that require in-depth document editing across 52 professional domains, such as coding, crystallography, and music notation. Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely. Additional experiments reveal that agentic tool use does not improve performance on DELEGATE-52, and that degradation severity is exacerbated by document size, length of interaction, or presence of distractor files. Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interaction.
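To make the idea of compounding corruption concrete, here is a minimal Python sketch. It is not the DELEGATE-52 scoring method, which the abstract does not describe. The line-level metric, the function names `preserved_fraction` and `compounded_preservation`, the assumption that each step independently preserves the same share of content, and the 20-step workflow length are all illustrative choices.

```python
import difflib


def preserved_fraction(original: str, edited: str) -> float:
    """Share of the original's lines that survive unchanged and in order.

    Hypothetical metric for illustration only; the paper may score
    corruption differently.
    """
    orig_lines = original.splitlines()
    edit_lines = edited.splitlines()
    if not orig_lines:
        return 1.0
    matcher = difflib.SequenceMatcher(a=orig_lines, b=edit_lines, autojunk=False)
    # Sum the sizes of all matching blocks (the final sentinel block has size 0).
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(orig_lines)


def compounded_preservation(per_step: float, steps: int) -> float:
    """Content preserved after `steps` edits, assuming each step
    independently preserves the same fraction `per_step`."""
    return per_step ** steps


if __name__ == "__main__":
    before = "a\nb\nc\nd"
    after = "a\nX\nc\nd"  # one of four lines silently changed
    print(preserved_fraction(before, after))  # 0.75

    # Assumed numbers: 98.6% preserved per step over 20 steps.
    print(round(compounded_preservation(0.986, 20), 3))
```

Under these assumptions, a per-step loss of under 2% is enough to leave roughly a quarter of the document altered after 20 steps, which is why small, sparse errors matter in long delegated workflows.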

source & further reading

microsoft.com — original article

Read original on microsoft.com → www.microsoft.com/en-us/research/publication/llm…
