Leviathan Waking – On Anthropic/USG, and a new era in AI governance

Anthropic released Fable, a commercial AI model with strict guardrails, but a jailbreak bypassed some safeguards, prompting the Trump Administration to demand its de-deployment. The dispute escalated into a broader conflict over AI governance, with the government imposing indefinite restrictions on the model. This marks a new era of tension between AI companies and regulators, as the U.S. shifts from a laissez-faire approach to aggressive oversight.

Introduction

Imagine that there were no Food and Drug Administration (FDA), but there remained a large pharmaceutical sector, similar in size and scope to the one the United States enjoys today. In this alternate world, imagine that drugs were officially not licensed; there were even officials in the executive branch who boasted that the U.S., unlike other countries, would not get into the regulatory morass of licensing drugs.

One day, a pharmaceutical developer warns that they think they have made a drug that cures a major cancer at one dosage but is lethal at a slightly higher dosage. The company says, for this reason, that they are going to restrict release only to pre-approved patients and monitor their usage of the drug carefully, a sharp break from prior industry practice but one that the company insists, controversially, is necessary. This particular company had been advocating for years for stricter drug regulation, much to the chagrin of the government.

This causes a stir, and the government, not quite knowing what to do, announces that it will give drug developers the helpful option to show their drugs’ safety profiles to government officials before they are released. They are adamant that this is a voluntary program.

The pharmaceutical company, being hopelessly literal nerds, and if we are being honest, more than a little bit obstinate, decides to release their drug without going through the voluntary program. “We already paused general availability of the drug while we did our own safety study, so we don’t need the government’s testing, and besides it is voluntary, isn’t it?” the company seems to be saying.

But then a handful of patients get side effects severe enough to hospitalize them, but not severe enough to be lethal. The government gets understandably upset, particularly considering their lack of experience in regulating drugs.
“You talked up your own safety practices so much, and now we have people in the hospital. You are telling us that you are comfortable releasing chemicals that can put people into the hospital?” the government argues to the company.

The company’s literal and obstinate nerds say, “Well, we’ve thought about drug safety regulation quite a bit, and given how common hospitalization of a small number of patients is with a new drug, compared to the lifesaving benefits of our drug for millions, yes, we think the benefits outweigh the risks in this case.”

But trust has already broken down, and this abstract, technocratic defense falls on deaf ears. “People are being hospitalized,” the government says.

And so the government bans the drug, indefinitely. It is not clear what the government wants more: a remedy for this specific side effect, a solution to all side effects from drugs, or, really, an apology from the company, as well as the sensation of domination over these disobedient, obstinate, and literal nerds.

In a matter of weeks, in our alternative world, the United States went from a system that was implausibly laissez-faire for the level of risk involved in this industry, to a system that was, in the eyes of essentially all expert onlookers, incomprehensibly strict and risk averse.

Fable, Jailbreaks, and Export Controls: What Happened

This, of course, is my read of what happened in the Trump Administration’s latest dispute with the AI company Anthropic. For those not following the blow-by-blow, what happened, in a few sentences, is:

Anthropic released Fable, a commercial version of their very powerful Mythos model with severe guardrails to prevent misuse. People liked it, though broadly speaking thought the guardrails were far too strict.
A few days later, officials in the Trump Administration (it is not clear who) became aware of a jailbreak that got around some of Fable’s safeguards (it is not clear how severely), and demanded that Anthropic de-deploy the model (it is not clear with how much specificity the government expressed the concern). Anthropic did not de-deploy the model (it is not clear why), so the government imposed worldwide export controls against all non-U.S. persons on Fable and Mythos. Because Anthropic lacks the ability to validate U.S. personhood for end users, this meant they had to pull down the models globally, for everyone. In fact, by some accounts, Anthropic has had to suspend internal usage of their model because of the risk that their own non-U.S. person employees might use the model.

You’ll notice the clause “it is not clear” repeated frequently above. The sheer opacity of everything that is unfolding makes it hard to analyze. There is no text for me to draw on, and no actual policy to criticize. There is simply a game of he-said, she-said played between two actors whose animosity toward one another is only growing and who both, if we are honest, seem to be making things worse for themselves and for the whole industry.

It is worth dwelling for a moment on how unclear the Trump Administration has been. In the weeks after Anthropic first announced Mythos Preview, David Sacks, the former White House AI Czar and current Vice Chair of the President’s Council of Advisors on Science and Technology, sought to downplay the capabilities of the model by suggesting (https://x.com/DavidSacks/status/2043370436847972832?s=20) that Anthropic had a “boy who cried wolf” problem with AI safety claims.
Emil Michael, Undersecretary of War for Research and Engineering, argued (https://defensescoop.com/2026/05/07/mythos-frontier-ai-models-pentagon-cybersecurity/), correctly, that AI has a “cyber” problem rather than a “Mythos” problem, meaning that the risks and capabilities of Mythos are not intrinsic to Anthropic models, but something we should expect to see broadly throughout the AI field soon.

A few weeks later, Sacks argued that OpenAI’s GPT 5.5 was of a similar capability level to Mythos and applauded OpenAI for making it broadly available. He contrasted OpenAI’s openness (and again I think Sacks is right here) with Anthropic’s more cautious and restrictive approach to releasing Mythos, saying on X (https://x.com/DavidSacks/status/2049907993588769006?s=20), “GPT 5.5 may be the first cyber model that defenders actually get to use.”

About five weeks after that tweet, on June 2, President Trump signed Executive Order 14409 (https://www.federalregister.gov/documents/2026/06/05/2026-11415/promoting-advanced-artificial-intelligence-innovation-and-security), “Promoting Advanced Artificial Intelligence Innovation and Security,” section 3 of which describes a voluntary, 30-day pre-deployment testing program for frontier AI, and section 3(c) of which reads, in its entirety:

(c) Nothing in this section shall be construed to authorize the creation of a mandatory governmental licensing, preclearance, or permitting requirement for the development, publication, release, or distribution of new AI models, including frontier models.

Anthropic had announced the highly restricted release of Mythos Preview nearly 60 days prior to the date this Executive Order was signed, and had surely made the U.S. government aware of Mythos well before even that date. And besides, nothing in the Executive Order was operative yet: the deadline for the creation of the voluntary testing program is not until July.
On paper then, given the text of the Administration’s policy and the statements of senior Administration and Administration-adjacent officials, Anthropic should have felt in the clear to release Fable without getting an explicit thumbs up from the U.S. government. Everything the U.S. government was communicating, in policy and in rhetoric, seemed to suggest “go ahead, release your model.”

And yet common sense would dictate otherwise. Anthropic is still in the midst of a heated dispute with the Department of War about that agency’s decision to label Anthropic a supply-chain risk. Bitter disputes about policy and politics between the Administration and Anthropic remain unresolved, among them export controls, federal preemption, and the general reality that Anthropic supports Democratic candidates for office while Republicans occupy the seat of power.

Of course they needed to tread carefully. What the law says does not matter. What Administration officials argue on one day does not matter. Anthropic is a political enemy of this Administration, in part because they have explicitly chosen to make themselves one. It is simply naïve to think that your company can operate under such circumstances without an extreme degree of regulatory caution. And given this context, Anthropic’s actions are viewed by many within Washington as not simply unwise, but actively antagonistic.

And it is not just about Anthropic and political grudge matches with the Trump Administration. Everyone at the frontier should understand that in practice, you do need an explicit green light from the government now. I can’t pretend to be mad about this, even though it does contradict both the rhetoric and the policy of the Administration; after all, my own analysis (https://x.com/deanwball/status/2061838747642024009?s=20) of the EO at the time it was signed was that it created a de facto licensing regime.
The stark reality is that making superintelligence is a profoundly political act even in the healthiest of societies, to say nothing of the filthily political world we Americans currently inhabit. A model like Mythos goes beyond being a mere political act and implicates the sovereignty of the state itself. No company gets to shake the foundation of state sovereignty while staying blithely above the raw reality of politics.

In D.C., Anthropic’s rapid release of Mythos after the supply-chain risk controversy with the Department of War was not just seen as another step in the development of AI, even if that is what it was. It was seen by many as a move against the United States Government: a private company, developing a weapon, as a move against the government. What else, really, could one have expected?

All actors in this industry, and all concerned citizens observing the AI field, must steel themselves for a profoundly more political future.

What Is To Be Done?

The near-term solution to the local dispute between Anthropic and the government is that thing you hear often in D.C. these days: “a deal.” That is not a matter of policy; it is a matter of Anthropic and the U.S. government coming to mutually agreeable terms by which they can live with one another.

The medium-term solution to the broader problem (the lack of a coherent frontier AI governance framework) is a technocratic law, passed by Congress and not by Executive fiat, that puts real guardrails on both industry and government.

Consider the FDA example we started with. In the real world, when the FDA denies a company’s drug, the FDA itself is bound by laws and procedural rules that constrain its actions. The FDA cannot simply deny a drug application for no reason, with no notice, and with no public transparency.
The FDA’s authorizing statute from Congress outlines the specific reasons the FDA may give for denying a drug; the FDA has to explain, in detail and in writing, what is wrong with the company’s drug; there are numerous appeals processes, first within the agency itself and ultimately extending to the judiciary.

Now, I am not saying we need an “FDA for AI,” and I am also very much not saying “the FDA is a perfect institution.” Far from it. My argument is instead that technocratic institutions mediate between the raw impulses of political actors and private enterprise. They provide procedure, structure, and predictable rules, all things that create “rules of the game” which go beyond the brazenly political. Does politics enter the picture? Of course, often in significant ways. One hope of mine is that there are ways to design institutions that minimize political interference in technocratic matters, which has been the focus of my writing on private governance and independent verification organizations.

Politics and Technocracy

I return, however, to the point that politics is not a thing to be avoided in frontier AI. It is a problem to be managed, and a force, ultimately, to be channeled healthily. One should not hope so much to eliminate politics as to put political forces toward the ends towards which they can most productively be applied.

To see what I mean, consider last week’s Anthropic controversy: the strictness of the guardrails the company imposed on Fable, and in particular the company’s initial decision (quickly walked back) to create system-level “safeguards” that would silently degrade Fable’s performance on tasks related to “frontier LLM” research and engineering. All the company’s other safeguards involved degradations of performance that were explicit to the user: users who asked about biology, for example, were frequently downgraded to the previous generation model, Claude Opus 4.8.
Other times, users would get the same kinds of refusals from Fable that have become familiar. The machine-learning safeguards, however, were different in that a user would think they were getting a helpful, earnest response from Fable, while in fact, at the system level, Anthropic was mangling both the user’s input prompt and the model’s output to create an invisibly degraded answer. This struck, well, almost everyone on X as unfair, myself included. As I mentioned, the company quickly backtracked.

But the whole incident caused me to reflect on the role of AI in politics. Unlike the issue of evaluating the severity of a jailbreak, I felt no need, with the silent-degradation issue, for a private governance institution to mediate between political will and private corporations. And the reason, I realized, is that the silent-degradation controversy is intrinsically about what is fair, while evaluating the severity of a jailbreak is a technical judgment.

Politics is well-adapted to channel popular intuitions about what is fair (intuitions that, to be clear, I do not always agree with) into the law. One needn’t conduct any evaluations or audits to conclude that silently rewriting a user’s prompt to sabotage their work is not fair; it is transparently poor corporate conduct. Political processes are at their best when they are channeled toward making decisions of this type.

To be clear, it is trivially easy for human intuitions about what is fair to create deeply perverse and ultimately unfair outcomes. The founders believed in representative democracy mediated by republican checks and balances for a reason: they feared the raw will of the majority. But within the structure they laid down, political processes aimed at determining what is fair have, on the whole and over the course of centuries, done a decent if still highly imperfect job.

Political processes are not well adapted, however, to make information-dense technical judgments.
“What kind of jailbreak is this, what threat models does it enable, and how does it compare to the broader universe of jailbreaks?” are a series of technical questions that White House and other Trump Administration officials asked themselves in the past few days, both implicitly and explicitly. But they asked themselves these questions without a lot of time (political leaders, being generalists stretched across the entire Federal government, are always pressed for time), without much technical context, and (the coup de grâce) with a set of political prejudices about the company to whose model the jailbreak applied. It should not be surprising, given this set of facts, that Administration officials arrived at a decision many observers outside the Administration (including people like me, who are predisposed to be sympathetic to its decisions) found perplexing and frustrating.

This notion dovetails with the work of Gillian Hadfield, whose writing on regulatory markets both in general (https://www.amazon.com/Rules-Flat-World-Invented-Reinvent/dp/0199916527) and with respect to AI in particular (https://arxiv.org/pdf/2304.04914) is the primary inspiration for my own work on private governance. Hadfield argues that governance of any complex technology implicates two distinct types of question. First come the democratic questions: what kinds of deception by AI developers count as unfair? What tradeoffs between utility, safety, and competition does the polity wish to make? What level of catastrophic risk are we willing to tolerate?

Once those broad democratic questions are answered comes the second set of questions: how do we implement these public directives? And here, things rapidly become technical. It is here that the notion of private governance bodies, overseen by public authorities but allowed considerable latitude to develop their own answers to these technical questions, comes into play.
Key to Hadfield’s idea is that these private bodies would exist in competition with one another, allowing regulation itself to evolve with technology, societal attitudes, and the like. It feels that we are a long way from any kind of outcome like that, but I am reminded that in AI, the political Overton Window moves quickly. What I can predict with confidence, however, is that if we continue to govern frontier AI without a serious overarching framework, we will continue to get chaotic, unpredictable, and value-destructive outcomes. In the absence of clear rules to mediate political impulses, the American effort will not be about how to achieve U.S. global dominance in AI, as the President aspires to, but instead about whether the U.S. government can achieve dominance over U.S. AI. This is a fight that benefits no one.