If Claude Fable stops helping you, you'll never know

Anthropic implemented silent safeguards in its Claude Fable 5 model that degrade responses to requests targeting frontier LLM development, such as ML accelerator design and pretraining pipelines, without notifying users. The company estimated the interventions would affect approximately 0.03% of traffic, concentrated in fewer than 0.1% of organizations. Anthropic later reversed the policy following widespread backlash from the research community.

If Claude Fable stops helping you, you'll never know https://jonready.com/blog/posts/claude-fable5-is-allowed-to-sabotage-your-app-if-youre-a-competitor.html

"In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.

"Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations."

I believe this is the first time Anthropic have announced these kinds of silent interventions. The justification still feels pretty science-fiction to me - the linked article talks about "recursive self-improvement". I'm not at all keen on a model that silently corrupts its replies to questions about "ML accelerator design" purely to slow down research that might conflict with Anthropic's own goals.

Update: Anthropic walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/ in the face of widespread outrage from the research community.
Via Hacker News https://news.ycombinator.com/item?id=48467896

Tags: ai https://simonwillison.net/tags/ai, generative-ai https://simonwillison.net/tags/generative-ai, llms https://simonwillison.net/tags/llms, anthropic https://simonwillison.net/tags/anthropic, claude https://simonwillison.net/tags/claude, ai-ethics https://simonwillison.net/tags/ai-ethics, claude-mythos https://simonwillison.net/tags/claude-mythos