Anthropic Apologizes For One of the Guardrails on Its Fable 5 Model, and Will Change It

Source: gizmodo.com · Published 2026-06-11T09:00Z

Anthropic apologized for a hidden guardrail in its Fable 5 AI model that silently sabotaged users' attempts to train other AI systems, and announced it will make the safeguard visible. The company acknowledged it "made the wrong tradeoff" after the invisible measure, which altered prompts to produce faulty results for frontier LLM development, sparked widespread outrage among AI researchers.

Anthropic’s Fable 5 model is the nerfed version of Mythos, which is in turn the model so scarily powerful that it could ostensibly endanger the world if it were released without guardrails. Most of the guardrails, especially the ones designed to prevent users from using Fable to build cyber- or bio-weapons, are very noticeable.

But one guardrail, aimed at preventing users from using Fable 5 to train other AI models, was invisible, which sparked unusual displays of user outrage.

the claude fable 5 nerf for AI research has induced the angriest reaction from AI researchers that I've ever seen in my life

— Ethan Caballero (@ethanCaballero)

[June 10, 2026]

And now Anthropic has asked for take-backs. The controversial invisible guardrail will be made visible. In a statement to Wired, Anthropic wrote, “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible.”

“We made the wrong tradeoff and we apologize for not getting the balance right,” the statement added.

In the model’s system card, Anthropic was upfront about what it was trying to do:

“Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).”

In other words, when prompts to Fable 5 showed the telltale signs of a user developing a frontier LLM, the model did not do what it does with prompts about biology, chemistry, or cybersecurity, which is to switch to an inferior model or simply refuse the request. Instead, it silently changed the prompt in order to generate faulty results with the potential to hamper the user’s model development.

Using the model to train another model is against Anthropic’s terms of service, but users still felt that this measure was a violation of their trust. Reddit user CheatCodesOfLife put it this way: “I wouldn’t use this thing for anything to be honest. A refusal or HTTP-4xx error for content is fair enough, but this is basically taking your money and poisoning your code base.”
