# Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks

> Source: <https://www.lesswrong.com/posts/uK7gguNeME5DnmdrL/preparing-for-warning-shots-to-catalyze-international>
> Published: 2026-06-05 16:08:38+00:00

This is a write-up on preparing for warning shots to catalyze international cooperation on AGI risks, and the corollary list of projects one could pursue. We argue that we must first (1) understand the types of warning shots, then (2) prepare to catch them. We must stay vigilant, both to (3) avoid getting 'frog boiled' by AI labs and to (4) ensure that the warning shot is generalized to the overall danger of AGI. Lastly, we must (5) prepare good policy responses and the ground for them to land, and (6) seize the first-mover advantage when the opportune moment comes.

This yields the following list of promising projects one could pursue:

**Two Important Notes:** This is *not* a call for people to induce a warning shot.

The AI Safety community currently places a lot of hope on warning shots inducing international cooperation on AGI risks. It would be useful to better understand the dynamics that lead from warning shot to international cooperation. How likely are we to get a warning shot prior to unacceptable risk? How significant would the warning shot have to be, and what other conditions must be met to open a policy window for international cooperation? How do we strike the right balance between attempting to galvanise action after every minor “warning shot” (at the risk of being dismissed for “crying wolf”) and waiting for a major event (at the risk of acting too late)?[[3]](https://www.lesswrong.com/feed.xml#fnecruhjzc8ch)

Warning shots could include alarming safety evals, the release of a strong AI agent (another “ChatGPT moment”), widespread automation of white collar jobs, minor/major accidents, misuse incidents, etc. To make the most of warning shots, it would be useful to characterise different types of warning shots, predict how likely they are to occur, and anticipate what the expected public / policymaker response is likely to be for each type.

A useful frame is Kingdon’s [three streams model](https://www.wikiwand.com/en/Multiple_streams_framework). A warning shot mostly affects the “problem stream”: it makes some latent risk suddenly feel real. But international cooperation on AGI risks will only become plausible if the “policy stream” already contains credible proposals, and the “politics stream” contains enough elite, public, and institutional support. The practical implication is that warning-shot preparation cannot just mean “better messaging after the event.” It requires pre-building policy options, coalitions, legitimacy, and channels to decision-makers.

Preparing to catch warning shots requires a *detection stack*: capability evaluations ([especially](https://www.youtube.com/watch?v=Z19UEZHJzAg&t=1936s) labs entering an intelligence explosion), alignment evaluations, incident reporting, compute and deployment monitoring, whistleblower channels, and more. For certain types of warning shots, we will *only* get a timely warning if we build such infrastructure beforehand.

The AISI network could become the institutional backbone for warning-shot detection. UK AISI was founded on the mission of "minimising surprise" from rapid and unexpected advances in AI, which is almost exactly the institutional version of “catch warning shots early”.

The release of ChatGPT served as a wakeup call because it caught people off-guard. With AI labs releasing new, incrementally more powerful models every week, we risk reaching dangerous capabilities without this resulting in a single, clear warning shot. Similarly, different organisations publishing a steady stream of increasingly disconcerting safety evals may be less impactful than e.g. the network of AISIs publishing a prominent report every half year which summarises the results of all safety evals.

Rachit Dubey and colleagues [have run](https://www.science.org/doi/10.1126/science.aeb2654) large-scale experiments showing that humans "continuously reset their perception of 'normal' every few years": incremental changes don't trigger alarm even when the cumulative change is dramatic. Their key intervention is presenting data in *binary* rather than continuous form (lake-froze-or-not, rather than temperature curves), which produced significantly higher perceived urgency.
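The binary-versus-continuous distinction can be made concrete with a small sketch. The temperatures, the freezing threshold, and the function name `to_binary_series` below are all illustrative assumptions, not data or code from the cited study; the point is only to show how the same series reads once it is collapsed into a yes/no event.

```python
from typing import Dict


def to_binary_series(values: Dict[int, float], threshold: float) -> Dict[int, bool]:
    """Map each period's continuous reading to a yes/no event.

    An entry is True when the reading is at or below the threshold.
    """
    return {period: value <= threshold for period, value in values.items()}


# Hypothetical winter-low temperatures in degrees Celsius (made up for illustration).
winter_low_c = {1990: -6.0, 2000: -4.5, 2010: -1.0, 2020: 0.5, 2025: 1.5}

# Assumed rule: the lake counts as "frozen" if the winter low is at or below -2 C.
lake_froze = to_binary_series(winter_low_c, threshold=-2.0)

for year, froze in lake_froze.items():
    print(year, "froze" if froze else "did not freeze")
```

In the continuous series each step is a small shift of a degree or two, while the binary series shows a single visible break between the years the lake froze and the years it did not. An analogous framing for AI would need a similarly crisp yes/no capability or incident threshold, and choosing that threshold is the hard part this sketch does not address.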

When a warning shot occurs, there might be societal and commercial pressures to portray this as a “bounded issue” specific to a certain AI model, company or situation. We should communicate effectively to ensure it is understood as a broader danger of AGI development.

Communications research (Entman, Iyengar) consistently finds that whether an event is interpreted as episodic ("one bad actor / one bad model") or thematic ("a systemic property of AI development") is largely set in the first 48–72 hours by the dominant frame in elite media. This could be a tractable advocacy target: prepare frame-setting materials and relationships in advance, so that they could be presented within the news cycle of a triggering event. A warning shot lost to the episodic frame could be hard to recover.

We need to have shovel-ready policy blueprints available when a warning shot does happen. The best options sit on the Pareto frontier of three criteria: mitigating AI x-risk, being highly memetic for policy communication, and building consensus within the AI Safety community.
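As a rough illustration of what "Pareto frontier" means across these three criteria, here is a minimal sketch. The proposal names and scores are placeholders invented for the example, and scoring real proposals on a 0 to 1 scale is itself an assumption; the code only shows how dominated options drop out.

```python
from typing import Dict, List, Tuple

# (x-risk mitigation, memetic fit, community consensus), each scored 0..1.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(options: Dict[str, Scores]) -> List[str]:
    """Return the names of options that no other option dominates."""
    return [
        name
        for name, scores in options.items()
        if not any(
            dominates(other, scores)
            for other_name, other in options.items()
            if other_name != name
        )
    ]


# Hypothetical proposals with made-up scores.
proposals = {
    "proposal_a": (0.9, 0.3, 0.5),
    "proposal_b": (0.6, 0.8, 0.6),
    "proposal_c": (0.5, 0.7, 0.4),  # worse than proposal_b on all three criteria
    "proposal_d": (0.4, 0.9, 0.7),
}

frontier = pareto_frontier(proposals)
print(frontier)
```

Here `proposal_c` drops out because `proposal_b` beats it on every criterion, while the remaining three each represent a different trade-off that has to be settled by judgment rather than by the scores alone.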

**Yet for those to succeed, most of the storytelling must come before the warning shot.** If no communication is done prior to the warning shot, then *people have no world models* of how this warning shot connects to the dangers of AGI. It then simply passes them by, its implications not understood. Holly Elmore has a [good post](https://forum.effectivealtruism.org/posts/bDeDt5Pq4BP9H4Kq4/the-myth-of-ai-warning-shots-as-cavalry) emphasizing this.

Besides storytelling, a broader set of infrastructure is needed to lay the ground for good policy responses to land. [Ben Norman](https://theoptionspace.substack.com/p/when-have-warning-shots-led-to-international) looked specifically at what it takes for warning shots to translate into international cooperation. Reviewing cases from Three Mile Island to COVID-19, he identifies five conditions that tend to be in place when an event actually leads to international agreements: pre-existing institutional capacity, clear attribution, transnational harm, aligned political incentives, and ready-made solutions. AI scores poorly on most of these, which suggests the community should treat smaller warning shots as opportunities to incrementally build the scaffolding any future agreement would need to land on.

We should anticipate likely bad reactions, communicate effectively on why these are in fact bad ideas, and capitalize on first-mover advantage when a warning shot happens to push good policy proposals instead.

First-mover advantage matters because the first plausible interpretation of a crisis often becomes sticky. A warning-shot playbook should specify what happens in the first 72 hours, the first week, and the first month: who drafts the public explanation, who briefs policymakers, which validators are activated, which policy ask is pushed, which bad reactions are pre-butted, and which international counterparts are contacted.
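One way to make such a playbook checkable is to write it down as data and ask which tasks still lack an owner. The sketch below is only an assumption about how that might look: the `PlaybookTask` structure, the owner labels, and the assignment of each task to a time window are hypothetical, while the task descriptions follow the list above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Time windows after the triggering event, in hours.
HOURS_72, WEEK, MONTH = 72, 7 * 24, 30 * 24


@dataclass
class PlaybookTask:
    description: str
    deadline_hours: int          # how soon after the event this must be done
    owner: Optional[str] = None  # None means nobody has been assigned yet


# Hypothetical playbook; owners and deadlines are placeholders.
playbook = [
    PlaybookTask("Draft the public explanation", HOURS_72, owner="comms_lead"),
    PlaybookTask("Brief policymakers", HOURS_72),
    PlaybookTask("Activate validators", HOURS_72, owner="outreach_lead"),
    PlaybookTask("Push the agreed policy ask", WEEK, owner="policy_lead"),
    PlaybookTask("Pre-but anticipated bad reactions", WEEK),
    PlaybookTask("Contact international counterparts", MONTH, owner="intl_lead"),
]


def unassigned_tasks(tasks: List[PlaybookTask], within_hours: int) -> List[str]:
    """List tasks due within the window that have no owner."""
    return [
        task.description
        for task in tasks
        if task.owner is None and task.deadline_hours <= within_hours
    ]


print(unassigned_tasks(playbook, HOURS_72))
print(unassigned_tasks(playbook, WEEK))
```

Running a check like this before any event happens surfaces the gaps that would otherwise be discovered in the first 72 hours, when there is no time left to fill them.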

Suppose we had clear evidence that COVID was the result of a lab leak. That same warning shot could plausibly produce very different outcomes depending on which interpretation sets first. One outcome is an international agreement to halt gain-of-function research, and with it much stricter safety requirements for labs that pursue it. Another, just as plausible, is countries accelerating their own programs to capture the demonstrated power of the technology, while becoming more secretive to avoid PR disasters. Which of these locks in depends largely on whether someone is ready in the first 72 hours with a credible interpretation, a concrete ask, and the relationships to get both in front of the right people.

Justin Shovelain accumulated the list, Thomas van Damme made an early draft, while Mark Kagach and Elias Schlie wrote the final version.

Thanks to Ben Norman, Richard Mallah, Holly Elmore, and others for valuable input.

Warning shots are frequently tragedies; we do not want them to happen. Our job is both to prepare to respond well and to prevent them.

The best governance strategies are viable without a warning shot ever happening. Excessive dependence on one is a common failure mode in AGI governance strategies.
