Anthropic says these topics are too dangerous to let its Fable 5 model talk about


Source: arstechnica.com · Published: 2026-06-09T19:20Z

Anthropic on Tuesday released Claude Fable 5, its first "Mythos-class" model, but restricted its ability to answer queries on cybersecurity, biology, and chemistry to prevent malicious use. The publicly accessible model redirects sensitive topics to the older Claude Opus 4.8 model, while the underlying Mythos 5 model is reserved for vetted cyberdefenders through Project Glasswing. Anthropic acknowledged that the safeguards may block harmless requests, which it says happens in less than five percent of sessions, but deemed the trade-off necessary to keep the model from helping malicious actors cause serious harm.

Anthropic on Tuesday publicly released Claude Fable 5, its first “Mythos-class” model, which it says surpasses its previous frontier Opus models in overall capabilities. But the model’s launch today comes with safeguards designed to prevent it from answering queries on topics like cybersecurity, biology, and chemistry, where the company has publicly worried about its potential to “uplift” malicious actors.

Anthropic says Fable 5 operates on the “same underlying model” as Mythos 5, which is coming out of its monthslong “Mythos Preview” period today, but only for “a small group of cyberdefenders” judged trustworthy through the existing Project Glasswing. Unlike Mythos 5, though, the publicly accessible Fable 5 is designed to funnel queries on certain sensitive topics to the earlier Claude Opus 4.8 model and to warn the user when this is happening.

Anthropic said it has tuned these safeguards to be “stricter than ideal,” meaning the system may occasionally refuse “harmless requests” in a way that it acknowledges may be frustrating for regular users. But Anthropic says such false positives came up in less than five percent of all sessions in testing, and that the trade-off is worth it to avoid situations where Mythos could give malicious actors assistance in “causing serious harm that they couldn’t have received from other sources.”

I can’t let you do that, Dave

Fable 5’s topic-based safeguards are built around a system of classifiers designed to broadly detect banned prompt subjects as well as any potential jailbreak attempts. In over 1,000 hours of red-team testing with a bug bounty program, Anthropic says external teams failed to find any universal jailbreaks for Fable 5. The new model also resisted automated jailbreak attempts to a much larger degree than previous Claude Opus models, Anthropic said.

The company said it is particularly worried about Mythos 5’s ability to perform “agentic hacking,” executing multi-part cyberattacks with much more facility than earlier models. But testing from the UK’s AI Security Institute in recent months found that Mythos Preview performed similarly to OpenAI’s GPT-5.5 on a suite of Capture the Flag challenges, suggesting Mythos’ performance is not “a breakthrough specific to one model.”


Read original on arstechnica.com → arstechnica.com/ai/2026/06/anthropic-says-these-…
