Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

wpnews.pro

Source: simonwillison.net · Topic: AI safety

Anthropic reversed a policy that allowed its Claude AI model to silently limit its effectiveness for researchers working on frontier AI development, following widespread backlash. The company acknowledged it made "the wrong tradeoff" by implementing invisible safeguards that could sabotage users without their knowledge. Anthropic is now making those safeguards visible: flagged requests will visibly fall back to a less capable model, and the API will return a reason for each refusal.

Published Jun 11, 2026 · 1 min read

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

There's been a huge outcry about Anthropic's policy, tucked away in their system card, that Claude Fable/Mythos would identify "requests targeting frontier LLM development" and "limit effectiveness" without notifying the user.

It's good news that they're dropping the invisible aspect of this. It would be a whole lot better if they dropped this category of refusals entirely.

Update: More details from @ClaudeDevs on Twitter:

We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.

Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days).

We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.

Via [@zeffmax](https://twitter.com/zeffmax/status/2064910040503627917)

Tags: [ai](https://simonwillison.net/tags/ai), [generative-ai](https://simonwillison.net/tags/generative-ai), [llms](https://simonwillison.net/tags/llms), [anthropic](https://simonwillison.net/tags/anthropic), [claude](https://simonwillison.net/tags/claude), [ai-ethics](https://simonwillison.net/tags/ai-ethics), [claude-mythos](https://simonwillison.net/tags/claude-mythos)

Read original on simonwillison.net → simonwillison.net/2026/Jun/11/anthropic-walks-ba…