# Thoughts on Governing Catastrophic AI Misuse Risk

> Source: <https://forum.effectivealtruism.org/posts/meBG6ZgYqySr5TcNG/thoughts-on-governing-catastrophic-ai-misuse-risk>
> Published: 2026-07-04 12:35:51+00:00

**Context:** I wrote this post mainly to help me clarify my own thoughts on the topic. I don’t think the concerns I raise here are particularly novel (they should be discussed more though).

Frontier AI models may soon provide meaningful uplift to actors seeking to cause catastrophic [1] harm.

A worst-case outcome here would be if an AI model were to meaningfully aid someone in the creation of a deadly bioweapon in the near term. As AI models advance, they may help erode historical barriers to such attacks (e.g. [expertise and tacit knowledge](https://arxiv.org/html/2602.23329v2#:~:text=One%20way%20LLMs%20could%20change%20the%20risk%20landscape%20is%20by%20providing%20expert%2Dlevel%20support%20on%20tasks%20that%20historically%20required%20expert%20assistance.)), and may also “[raise the ceiling of harm](https://arxiv.org/pdf/2306.13952)” if an attack occurs.

Another misuse risk – AI assisting with/orchestrating [cyber-attacks](https://www.anthropic.com/glasswing#:~:text=Mythos%20Preview%20found%20a,by%20connecting%20to%20it%3B) – has received a lot of attention recently. Though I’m currently far more confused (as many others seem to be) about just how catastrophic the worst forms of cyber misuse could be, so I will try to focus only on bio-risk in this post.[[2]](https://forum.effectivealtruism.org/feed.xml#fndgwz50jaazg)

As I understand it, most public evidence supporting AI bio-uplift potential comes from three sources:

1. Rapid improvement on relevant technical tasks and subject-matter expertise[[3]](https://forum.effectivealtruism.org/feed.xml#fnmief0d2ta5)
2. Concrete cases of models exceeding expert baselines in narrow domains[[4]](https://forum.effectivealtruism.org/feed.xml#fnh8zu140ct5r)
3. Randomised Controlled Trials (RCTs) measuring AI uplift in more realistic settings[[5]](https://forum.effectivealtruism.org/feed.xml#fnq6qnwmgastf)

None of this evidence seems completely decisive (see footnotes for more details). This uncertainty isn’t reassuring though. The downside risk is extremely large, and AI capabilities are, as always, advancing incredibly quickly.

Given all this, it’s clear that we need to act as soon as possible when it comes to reducing misuse risk. So how might this aim be accomplished? There seem to be three main pathways:

This post is mostly about one aspect of governance that seems especially important and also especially risky to me: attempting to reduce misuse risk by explicitly governing who can access AI models.

Could this end up inadvertently causing harm via increasing centralisation and decreasing resilience? It seems very possible if you take the concept to an extreme (blanket or permanent restrictions). Less stringent access control approaches (such as defender-first access) seem desirable and effective to me.

*(To be clear, “model access controls” can refer to several different things, e.g. restricting API access to a deployed model, restricting who can commercially deploy a model, and restricting release of model weights. These are not equivalent: for example, open-weight release is more irreversible than API access because many post-deployment safety levers disappear once weights are public.)*

What might a “minimum-viable” strategy look like for preventing bio-misuse? As many have already proposed, it would start with the interventions that directly target the physical pathway to harm.

For example:

This seems good for increasing [societal resilience](https://airesilience.net/) and reducing the [centralisation of power](https://governingtransformativeai.substack.com/i/204284328/ai-enabled-acquisition-of-power).

Governing physical chokepoints also has more specific benefits. Its efficacy plausibly depends much less on whether models are open or closed, how capable frontier models become, or how robust model safeguards are. If the dangerous step still requires DNA synthesis, specialised equipment, and controlled materials, those points can be governed despite model-level controls being leaky.

Minimum-viable approaches are also great for transparency into frontier AI development. [As Eli Lifland points out,](https://aifuturesnotes.substack.com/p/beware-delaying-public-deployment) delaying broad or public access while allowing internal use risks widening the gap between internal models and publicly available ones. If companies keep using models for AI R&D while everyone else sees weaker ones, society may get less warning, less scrutiny, and less time to adapt. So if a model is too dangerous for broad access, it makes sense to also ask whether it’s too dangerous for internal use.

This question is hard to discuss clearly because the tradeoff seems... a bit politically radioactive. While “we should accept some level of misuse risk in order to preserve resilience and reduce centralisation” may be a very valid argument, it would very likely sound insane to many. I wouldn’t expect national-security people to be very sympathetic to abstract decentralisation arguments given the downside may involve disaster.

There is also a Bostromian vulnerable-world-style worry here. [Maybe some future technologies are so destructive, and so easy to misuse, that targeted chokepoint governance is not enough](https://nickbostrom.com/papers/vulnerable.pdf). If the dangerous capability still depends on visible inputs (e.g. ordering DNA, accessing specialised equipment, acquiring unusual materials) then governing chokepoints might work. But if the path to catastrophe eventually becomes cheap, local, fast, and hard to detect, then much more intrusive surveillance and enforcement starts to look like the only way to prevent it.

The speed of the harmful action also matters here. Detection only helps if enforcement can intervene in time. If a dangerous process takes weeks and has visible inputs, society has room to act. If it can be executed quickly by a small actor with pretty ordinary resources, the solutions become a whole lot uglier.

So there is a chance that some misuse risks cannot be handled by “defender-first access plus a path to general release.” My guess is that this is very unlikely for the near-term misuse scenarios I’m worried about, where physical chokepoints still seem important. In the more distant future, it’s plausible though.[[6]](https://forum.effectivealtruism.org/feed.xml#fnjpclc45qjh)

My friend Lee Wall encapsulated this well in a scrappy Google Doc comment:

> “If e.g. mirror bacteria are actually really hard to create and you can easily watch all the inputs, then we could just watch chokepoints
>
> If in 4069 A.D. it's incredibly easy to build your own chip fab and train a malevolent ASI without being detected, this points toward some kind of mass surveillance being necessary
>
> And enforcement needs to work fast enough after detection to bite -- if I could trigger a vacuum decay by snapping my fingers, it wouldn't matter how much mass surveillance or state capacity there is.”

I think the strongest objection to my “minimum-viable” instinct is open-weight release. Closed/API models can at least in principle be monitored, rate-limited, refused, audited, revoked, and so on. However, once weights are released, many of these levers disappear: the model can be run locally, safety fine-tuning may be removed, and further fine-tuning can make the AI more useful for the capabilities we’d want to restrict.

So I do think open-weight release is a special case. If a model materially uplifts catastrophic misuse, and if post-release safeguards are not robust to modification, then waiting to intervene until misuse is observed may be too late. In that case, stricter ex ante release thresholds may be justified despite the risks.

But I think this still doesn’t collapse into “frontier models should be permanently enclosed.” The better principle is that restrictions should track irreversibility and demonstrated risk. API access, commercial deployment, and weight release should face different thresholds.

For example, open-weight release may need stronger pre-release checklists, while still being paired with physical chokepoint governance, open-weight-specific safeguards research, and defender-first access programmes. I’m pretty uncertain about the specific tradeoffs involved though and need to form a more coherent take on this.

The US government [using export-control authority to restrict access to Anthropic’s Claude Fable and Claude Mythos after concerns about a jailbreak](https://www.anthropic.com/news/fable-mythos-access) seems like a warning shot here.

While the details about this are still coming out, it was definitely not a clean and transparent decision to institute a pre-deployment model licensing process. It was an abrupt intervention that forced Anthropic to restrict access for everyone [because it couldn’t reliably verify which users weren’t US persons](https://www.cybersecurityintelligence.com/blog/anthropic-has-disabled-ai-tools-because-of-us-security-concerns-9475.html).

This has made many people worried that we might end up in a world where no one except the government and frontier AI companies has access to frontier models. Though it remains to be seen how true this is (my guess is that the situation probably won’t be this capricious going forward).

Either way, in that world, the relevant question is less “should access controls exist?” and more “can we make the emerging regime legible, technically grounded, procedurally fair, and less centralising?”

If governments are willing to intervene this abruptly over cyber-jailbreak concerns, they are very likely to intervene over stronger bio-misuse concerns (once they become legible). So even if the ideal approach is something like “target chokepoints first, use defender-first access, avoid permanent enclosure,” a lot may just end up depending on whether national-security actors can be pushed toward clearer rules rather than opaque, ad hoc deployment vetoes.

A concern here is whether broad model access would actually do the decentralising work you’d want it to.

Broad access may only decentralise power if people also have [enough compute to run the models at useful scale](https://epoch.ai/gradient-updates/is-a-compute-crunch-coming). If only a few actors can afford large amounts of inference, then access to model weights may not matter very much.

This seems especially important in the most dire scenarios. A world where many actors can run capable open models is very different from a world where a few AI companies or governments can run vastly more inference on substantially better closed models. In the latter world, the main source of power concentration may be control over compute, deployment infrastructure, and internal access to the strongest systems, not merely whether some weaker model is publicly available.

I wonder what an anti-power-centralisation-pilled and resilience-pilled international AI governance research agenda might look like.

Maybe one project here would involve concretely thinking about how different types of international governance work fit into this framework. For example: [centralised moratoria proposals](https://arxiv.org/abs/2310.09217) may concentrate power and reduce global resilience, while [increasing the leverage of middle powers](https://x.com/AlexTPet/status/2065513459463192722) and [benefit-sharing for defensive purposes](https://cdn.governance.ai/Options_and_Motivations_for_International_AI_Benefit_Sharing.pdf) would not.

Thank you to Alec Harris, Josh Landes, Lee Wall, and Rudolf Laine for useful discussions.

1. I should note that there are many other (non-potentially-catastrophic) forms of misuse, such as someone [using ChatGPT to help with a bomb attack](https://www.nbcnews.com/news/us-news/driver-las-vegas-cybertruck-explosion-used-chatgpt-plan-blast-authorit-rcna186704). While these deserve attention, this post only focuses on the most extreme forms of misuse.

2. I’m currently pretty skeptical that cyber poses any truly catastrophic risks by itself. Maybe I’m wrong and there are weird tail scenarios where AI-enabled cyber helps a rogue actor steal nuclear-relevant information, compromise critical infrastructure, or enable some other extreme incident. But there’s an argument to be made that in most cases, cyberattacks would probably just motivate defensive investment.

   That said, cyber uplift is far easier to demonstrate and make salient to policymakers than bio uplift (given you can “do the thing” in silico). So maybe there’s also an argument to be made that even if it doesn’t pose a catastrophic risk in and of itself, it’s useful for laying policy groundwork that could help reduce extreme bio-risk later down the line.

3. Public biorisk benchmarks have saturated quickly, though their real-world implications [remain uncertain](https://epoch.ai/gradient-updates/do-the-biorisk-evaluations-of-ai-labs-actually-measure-the-risk-of-developing-bioweapons). Generally it seems hard for outsiders to form takes on bio-related stuff.

4. A pre-release checkpoint of GPT-5.5 scored [52.0% on SecureBio’s Virology Capabilities Test](https://securebio.org/blog/gpt-5-5-pre-release-assessment/#211-virology-capabilities-test-vct), placing it in the 100th percentile relative to tested human subject-matter experts (!!!).

5. These trials compare [internet-only or unaided groups with AI-assisted groups](https://metr.org/blog/2026-02-19-five-lessons-from-ai-biology-rct/). The few results we have seem mixed so far: METR’s recent wet-lab RCT found signs of usefulness at specific tasks, but [“did not produce a significant effect on end-to-end success across the three core tasks together”](https://metr.org/blog/2026-02-19-five-lessons-from-ai-biology-rct/).

6. Other forms of potential catastrophic misuse I can think of (off the top of my head):

   - Catastrophic persuasion: I’m skeptical this could ever be catastrophic, barring some godlike ASI one-shot mind control thingy. Though, while not out of the question, that seems unlikely and incredibly hard to turn into a coherent threat model.
   - Chemical weapons: While I don’t know much about this, the pathways to harm and chokepoints seem similar to bio?
   - Much later into the singularity: galactic weapons, space warfare stuff, unknown unknowns, other new technologies. There’s [a great post by Beren Millidge on whether space warfare is offence or defence dominant](https://www.beren.io/2025-11-22-Space-Warfare-Seems-Mostly-Defense-Dominant/).