The White House Wants Anthropic to Block All Jailbreaks. That May Be Impossible.

The White House ordered Anthropic to block all jailbreaks of its Claude Fable 5 AI model after the Commerce Department used Export Administration Regulations to force the model offline on June 12, citing cybersecurity risks. Anthropic argues the demand is impossible to meet and would halt all new model deployments across the industry. Experts note that guardrails are linguistic restrictions that sufficiently clever prompts can bypass.

On June 12, the Commerce Department did something it had never done before: it used Export Administration Regulations to force a commercial AI model (https://www.gadgetreview.com/ai-models-ran-a-simulated-society-grok-went-extinct-in-4-days-after-committing-over-180-crimes) offline. Not a chip. Not manufacturing equipment. A piece of software. Anthropic’s Claude Fable 5 (https://www.anthropic.com/news/claude-fable-5-mythos-5), a consumer-facing assistant built on top of the more powerful Mythos 5 system, was pulled from global access after officials concluded its guardrails could be bypassed to expose Mythos’s cybersecurity reasoning capabilities, according to the Cloud Security Alliance (https://labs.cloudsecurityalliance.org/research/ai-vuln-discovery-containment-claude-mythos-v1-0-csa-styled/). Because Anthropic couldn’t reliably distinguish foreign nationals from U.S. users in real time, it shut down both models for everyone. Every frontier AI lab just got put on notice.

What the Government Actually Wants

The NSA says jailbreaks exist. Anthropic says the fix being demanded would freeze the entire industry.

The White House position (https://www.wired.com/story/the-white-house-wants-anthropic-to-block-all-jailbreaks-that-may-not-be-possible/), as reported by the Washington Examiner, is blunt: Anthropic must proactively test its own models, patch jailbreaks, and flag findings to the government. Officials say they can’t staff a permanent jailbreak-hunting operation across every commercial AI product. In practice, this amounts to demanding zero exploitable gaps as the price of getting Fable 5 back online.

Here’s what the “jailbreak” actually looked like. According to Simon Willison’s independent analysis (https://simonwillison.net/blogmarks/), researchers asked the model to fix vulnerable code, then manually converted those fixes into exploit-testing scripts.
Fable refused the direct security-review request but complied with the reframed “fix this” prompt, because that’s what good coding assistants do. Anthropic says it received only verbal evidence of a narrow, non-universal technique. The company stated (https://cybernews.com/security/anthropic-fable5-jailbreak-us-government/), per Fortune, that “if this standard were applied across the industry, we believe it would essentially halt all new model deployments for all frontier model providers.”

Why Experts Say the Demand Doesn’t Square With Reality

Asking a model to forget what it knows is like asking Google Maps to forget where the roads are.

Large language models are trained to reason flexibly and follow instructions. Guardrails are linguistic restrictions layered on top of knowledge that still exists inside the system. Sufficiently clever prompts, or future AI systems (https://www.gadgetreview.com/eu-bans-ai-systems-deemed-unacceptable-risk) deployed as attackers, can search prompt-space faster than any human red team. Removing vulnerability-discovery capabilities wholesale would gut the exact defensive security work that makes these tools valuable. Guardrails aren’t a lock. They’re a screen door.

Dozens of cybersecurity leaders have sounded alarms, as reported by Axios. State-backed offensive teams will access equivalent capabilities regardless, whether through alternative models, covert channels, or homegrown systems. Pulling Fable 5 creates a lopsided dynamic: defenders get hobbled while well-resourced attackers adapt without missing a beat. This pattern echoes broader tech scandals (https://www.gadgetreview.com/evil-tech-scandals-failures-that-took-advantage-millions-people) in which industry accountability lagged far behind government intervention.

What Comes Next

The precedent is set, and the realistic path forward looks nothing like what Washington is currently demanding.
Any model exhibiting strong cyber, bio, or chem capabilities could now be reclassified and pulled overnight. Your risk calculus for building on frontier AI just shifted. Moves like this parallel how Europe restricts major cloud providers from handling sensitive government data (https://www.gadgetreview.com/europe-restricts-microsoft-amazon-and-google-from-handling-government-health-financial-and-legal-data), signaling a global tightening of the leash on frontier AI.

The realistic answer isn’t “block all jailbreaks.” It’s managed risk through:

- access controls
- rigorous logging
- monitored workflows

Governments demanding perfect control of imperfect systems don’t get safety. They get paralysis.