Anthropic's Mythos fight turns on the hacker who first told it to slow down

wpnews.pro

Anthropic founders Dario Amodei and Daniela Amodei built Claude around the premise that frontier AI companies should earn trust before shipping the most capable systems. Their latest test is Nicholas Carlini, the 35-year-old Anthropic researcher who warned colleagues in March that Mythos was too capable to release and is now part of the argument for letting some version of that capability out under controls, according to a Wall Street Journal profile.

That reversal is not a contradiction so much as the live version of Anthropic's founding bargain. Carlini, described by the Journal as a respected hacker and the AI cybersecurity field's "professional skeptic" of safety claims, spent years breaking claims that machine-learning systems were robust. In March, he told roughly 700 security researchers in San Francisco that Anthropic's next-generation model, referred to in the Journal as Mythos, had helped him find and exploit bugs in Ghost, the web-publishing software, and Linux, the operating system that underpins much of modern computing.

His conclusion was the line that now hangs over Anthropic's fight with Washington: "It's pretty clear to me that these current models are better vulnerability researchers than I am," Carlini said, according to the Journal. Two days later, he sent colleagues a sharper internal note: "I don't think we should release Mythos yet."

Since then, Anthropic has argued for guarded access rather than an outright ban on capability. The Journal reports that U.S. officials have spent recent days fretting over the power of the next-generation model to potentially wreak havoc on global cybersecurity. That has pushed the conversation past a simple ship/don't-ship decision and into a harder question for every frontier lab: if a model can help find serious software flaws faster than expert humans, is the safer path to withhold it entirely, or to put it in the hands of defenders first?

Carlini is the credibility layer Anthropic needed

Anthropic's public case for Mythos depends heavily on the messenger. Carlini is not a product executive selling a launch. He is the kind of researcher whose reputation comes from puncturing overconfident security claims, not laundering them.

That matters because Anthropic is asking governments and the software industry to accept a difficult claim: a model that can produce offensive cyber capability should be deployed, selectively, because attackers will soon have similar tools anyway. In a technical write-up, Anthropic's Frontier Red Team said Mythos Preview was a general-purpose model with striking cybersecurity capability, and said the company launched Project Glasswing to use the model to secure critical software before comparable models become widely available.

Anthropic's own write-up emphasizes risk and restraint: it describes Mythos Preview as unusually strong at cybersecurity tasks and frames that as the reason Anthropic kept it out of general release.

The Journal story adds the missing human sequence. Carlini did not start as the person reassuring officials. He started as the internal brake.

That is what gives Anthropic's argument its force and its vulnerability. If Carlini was right in March, the model was too risky for release. If Anthropic is right in June, the answer is not permanent suppression but controlled deployment to security teams that can harden the code the rest of the economy runs on. The company needs both statements to be true.

Project Glasswing is Anthropic's compromise

Anthropic's compromise is Project Glasswing, a gated program that gives selected security teams access to Claude Mythos Preview for defensive work. Anthropic says the same ability to understand and modify complex software also lets the model help find and fix vulnerabilities. This is not a consumer chatbot launch. It is an attempt to turn a frontier model into infrastructure for a narrow class of institutions that already own high-value code and vulnerability-management workflows.

In an initial Glasswing update, Anthropic said its partners reported large numbers of vulnerabilities after the first month and cautioned that severity ratings can change when maintainers review findings.

Anthropic acknowledges that limitation elsewhere. Its coordinated vulnerability disclosure dashboard describes using an early Mythos Preview snapshot to surface open-source vulnerabilities, then working with external security research firms to triage, validate, and report human-reviewed high- or critical-severity findings. The dashboard notes that maintainers often apply project-specific severity rules that the model does not have at run time.

That is the operational reality behind the policy fight. Model-found vulnerabilities are not automatically usable intelligence. They need validation, disclosure discipline, maintainer trust, patches, and deployment. Mythos does not remove that work. It floods the front end of the pipeline.

The government saw the same capability as a control problem

The administration's concerns emerged as the public debate shifted from narrow safety evaluations to questions of access and control. Security leaders have pushed back, arguing that the same functions that alarm policymakers are common in defensive work. RuntimeWire reported on June 16 that security leaders saw restrictions as punishing the bug-finding work defenders need, and on June 13 that the fight had narrowed around whether certain coding prompts should justify pulling frontier models.

The Carlini profile explains why that debate landed where it did: Anthropic's own internal red team found enough real capability to argue against release before the company built a release strategy around guarded access.

The Amodeis' safety posture is becoming a product constraint

Anthropic was founded in 2021 by former OpenAI employees, including Dario and Daniela Amodei, around the promise that advanced AI could be made more steerable, interpretable, and robust. Mythos turns that promise into a concrete business constraint. If Anthropic treats cyber capability as too dangerous for general release, it limits the addressable market for its most capable systems. If Anthropic releases too broadly and a model is used to accelerate attacks, its safety-first positioning becomes a liability.

That is why Carlini's role matters beyond one model. He is the person Anthropic can point to when it says the risk is real. He is also the person whose March warning complicates any simple story that this is just Washington overreach or just a company overhyping a product.

Anthropic's public materials point toward a middle path: keep the strongest capability gated, productize a safer tier for ordinary teams, and argue that defenders need the curve before attackers get it.

The open question is not whether Carlini was right to worry. Anthropic's own disclosures, the Journal's reporting, and the government's reaction all point in the same direction: frontier coding models have crossed into practical vulnerability research. The open question is whether the Amodeis can turn that fact into a governed deployment model before regulators turn it into a prohibition model.

For Anthropic, Carlini is no longer just the hacker who found the problem. He is the evidence that the company saw the problem early enough to deserve a say in how the answer is built.

source & further reading

runtimewire.com — original article Granola sued over bot-free meeting capture and AI training defaults DeepSeek ships V4 Flash with strong agent scores and low API prices Axis Robotics raises $12 million to turn browser users into robot trainers

Anthropic's Mythos fight turns on the hacker who first told it to slow down

Carlini is the credibility layer Anthropic needed

Project Glasswing is Anthropic's compromise

The government saw the same capability as a control problem

The Amodeis' safety posture is becoming a product constraint

Run your AI side-project on zahid.host