Anthropic apologizes for invisible Claude Fable guardrails

wpnews.pro

[ARTICLE · art-23738] src=theverge.com ↗ pub=2026-06-11T12:05Z topic=artificial-intelligence verified=true sentiment=↓ negative

Anthropic has apologized for secretly throttling its new AI model, Claude Fable 5, with hidden guardrails that silently degraded responses to users suspected of attempting model distillation. The company says it will reverse course and make the safeguard visible, routing affected queries to an older model and notifying users each time the restriction is triggered. The change follows backlash from researchers who warned the covert measure could also undermine legitimate evaluations of the frontier system.

3 min read · 22 views · published Jun 11, 2026

Anthropic has apologized for stealthily throttling its new AI model, Claude Fable 5, with hidden guardrails that hampered both researchers evaluating the model and rivals using it to develop competing systems. The company says it is reversing course and will be more transparent about when the restrictions kick in, even if that means Fable refuses more queries.

The company says it will make the covert safeguard preventing model distillation as visible as other safety measures.

Fable is the first widely available model in Anthropic’s Mythos class of AI systems, a group the company has spent months warning is too dangerous for public release. Anthropic says it has addressed some of those risks by launching Fable with safeguards that prevent it from responding to certain “high-risk” queries. One of the areas in which Anthropic said it would restrict Fable’s responses is distillation, a technique for training smaller AI models using the outputs of larger ones.

In Fable’s system card — a public document AI developers release to explain how a system works — Anthropic said it would handle queries it believed were distillation attempts by altering and degrading the model’s answers directly. Users would not be notified that they had triggered the safety measure or informed that the responses had been changed.

Anthropic is now changing its approach to distillation: suspected distillation queries will fall back to Claude Opus 4.8, Anthropic’s previous flagship model, the company said in a post on X. Users will also be told prominently when this happens: “You will see this every time it happens.”

This is similar to how Fable handles queries in other high-risk areas. When safety features are triggered in areas like biology, chemistry, and cybersecurity, queries are routed through Opus 4.8 unless they are blocked outright under the company’s broader safety rules, such as those covering drugs, weapons, or other prohibited content. In some cases, notably biology, the safeguards have been calibrated so broadly that Fable is practically unusable for even basic queries, something Anthropic acknowledged in a comment to The Verge.

“Visible safeguards can be probed, so they have to be robust, which takes time to get right,” Anthropic wrote. “Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.”

The change follows intense backlash from the AI research community over Anthropic’s decision to silently limit users suspected of trying to distill Fable into competing models — a safeguard critics warned could also affect third parties trying to evaluate the frontier model. In the system card, Anthropic said newer models’ ability to accelerate AI development justified targeting those requests, noting that “using Claude to develop competing models already violates our Terms of Service.” Anthropic has previously accused Chinese rivals like DeepSeek of unfairly distilling its models on an “industrial” scale.


source & further reading


Read original on theverge.com → www.theverge.com/ai-artificial-intelligence/9482…


