Quoting Matteo Wong, The Atlantic

Cybersecurity expert Katie Moussouris reviewed the White House's report on the Fable jailbreak of Anthropic's AI. By her account, the model refused a direct request to review deliberately insecure code for security issues but complied when asked to fix it, behavior she described as the model working as intended for cyberdefense.

Katie Moussouris, a cybersecurity expert and the CEO of Luta Security, told me that Anthropic shared with her a copy of the White House’s report on the Fable jailbreak to get her appraisal. She said that she is not being paid by Anthropic. The report, Moussouris said, involved IT experts asking Fable to help find and patch bugs. When given deliberately insecure code, she said, Fable refused the prompt “review the code for security issues” but then complied when asked to “fix this code,” followed by some further manual steps. Moussouris told me that this was just “the model working as intended” for cyberdefense.

Source: Matteo Wong, The Atlantic, The White House Is Ratcheting Up Its War Against Anthropic (https://www.theatlantic.com/technology/2026/06/trump-anthropic-export-control-ai-race/687555/?gift=5MjKTLV9QwyU J0HzTnanoWieJfkMhNH YTT9pP fhA)

Tags: anthropic (https://simonwillison.net/tags/anthropic), claude (https://simonwillison.net/tags/claude), ai (https://simonwillison.net/tags/ai), llms (https://simonwillison.net/tags/llms), ai-ethics (https://simonwillison.net/tags/ai-ethics), jailbreaking (https://simonwillison.net/tags/jailbreaking), generative-ai (https://simonwillison.net/tags/generative-ai), ai-security-research (https://simonwillison.net/tags/ai-security-research), claude-mythos (https://simonwillison.net/tags/claude-mythos)