The Invisible Side of AI Governance

Tldr: Most strategic writing on AI governance on LessWrong describes the outsider game, which is most often visible: press, statements, open letters. Here I want to describe the other, invisible half: the insider work within ministerial cabinets and international fora, and the work of people within national and international institutions. Here are a few claims that I defend in the post:

I think the AI Safety Community is under-indexing on the invisible part as a result, which might mean we miss large avenues for impact. Some of the strongest questions about and objections to this type of invisible policy work are at the end.

Who I am: I've played both the visible and invisible games. As someone with a technical background, I was initially very unfamiliar with the invisible game and found it alien and slightly distasteful, so if anything, my priors ran against the thesis of this post. I switched from technical safety (EffiSciences, ML4Good) to governance two years ago and co-founded CeSIA, the French Center for AI Safety. I initiated the Global Call for AI Red Lines and now spend a large fraction of my time on continental European policy, with the OECD Hiroshima AI Process Reporting Framework (HAIP), UN forums, the EU AI Office Code of Practice, and ministerial cabinets in Paris. I also founded ML4Good, which placed alumni in insider organizations such as the EU AI Office and the UK AISI.

Thanks to Jonathan Salter, Arthur Grimonpont and Epiphanie Gedéon for their useful comments and discussions that led to this piece. Thanks also to ControlAI, which inspired me to write a lot of this.

Do you know who exactly:

CeSIA participated in some of those, and I have guesses for many of them and access to private information, but for many, I don’t really know. But you can be sure that none of this happened by luck; it happened by sheer force of will.

Obviously, we are all more aware of invisible work that succeeded than of invisible work that failed; nonetheless, to my knowledge, all the elements in this list come from insider work.

(PS: A fair pushback from a colleague: I attribute these outcomes to insider work, while perhaps it was the outsider pressure that created the conditions that let insiders push at all. I partially agree: the anglophone press taking AI risk seriously made many of these doors easier to open. But someone still had to walk through them.)

As a point of reference, 70% of what CeSIA does is invisible: private memos and letters targeted at only one policymaker. If you don't talk to us, you'll have no idea of what we do on a day-to-day basis. Our website is a showcase of impact and projects to maximize credentials, and this does not necessarily reflect what we do on a day-to-day basis at all.

My prediction is that maybe 90% of the policy work done in administrations is largely invisible.

People in various administrations who can make those decisions are not that visible. I bet you didn’t know about Secretary Lutnick and National Cyber Director Cairncross before Fable.

The outsider game happens outside institutions: the press, op-eds, public statements and content creators on platforms whose audiences eventually shape the discourse. This enables outsiders to have relative freedom to say what they actually believe (loss of control, extinction, what's coming) without filtering it. The bet is that visibility eventually reaches policymakers through the media they read. The main KPIs are generally press mentions and number of people reached.

The insider game happens within institutions: insiders work there because they think that's where the power sits. Insiders generally speak directly with the person currently in charge of AI in a given administration and aim to be useful to them. This work is generally invisible, because you want to create a trusted channel. Insiders are generally cautious with their language and are okay with grinding for incremental improvements and adding a few layers to the Swiss cheese.

Obviously, those are caricatures, and it’s a spectrum. There are insiders-insiders, for example, the people working at an AISI; there are semi-insiders, for example, organizations working only with institutions, and very rarely with the general public; and there are clear outsiders, who are mostly doing external advocacy for a particular policy.

(A note on the axes I'm using: this post slides between several distinctions that correlate but are not the same: insider/outsider, invisible/visible, executive/legislative, operational/intellectual. A ControlAI briefing to an MP is insider-ish but legislative and fairly visible; the AI Office's work is institutional but partly public. In practice these axes cluster, which is why I move between them, but keep in mind they can come apart.)

When I began in AI governance, I was mainly playing the outsider game, and my main modus operandi was the bazooka. It took me some time to discover other tactics. Nowadays, a growing part of our activities is akin to that of an advisor / useful assistant.

Here are 3 stances, from most outsider to most insider:

The first stance I'll call the bazooka. You arrive with prepared briefings and talking points. You have urgent things to say, and by damn, you will say them. You don't spend much time on what the person across from you actually needs. You have a message, and you want it said. The bazooka is what you do when you assume the meeting is your one shot. This can work, but it puts the relationship at risk if they don’t get it in this first meeting.

I think the bazooka is ok when you have many shots, like in the legislative branch, where you have hundreds of MPs; you don’t have that many shots in the executive branch, where potentially only 3-4 people matter and are all connected to each other.

The second stance, I'll call the useful assistant. You arrive, you listen, you figure out what they're working on this month. You make yourself useful, potentially on a topic that isn't your terminal concern. You send a short note on a question they're puzzling over. You introduce them to someone they should know. You become a helpful presence in their inbox, rather than another lobbyist with an agenda eating up their time.

Of course, the memos you deliver can (and should!) be colored by your understanding of the risks. But the primary content is often "FYI, you can say this / we can push for this measure without political risk because the same framing is already named in this other official document."

But obviously, you can also propose cool things that are within the Overton window and have not been considered so far (for example, while working on the questions from the Hiroshima AI Process, [1] we made sure that we would add questions on whistleblower protection, and add a question on thresholds at which severe risks posed by a model or system, unless adequately mitigated, would pose unreasonable risks - among other things).

Then, weeks or months later, when they have a question that's actually adjacent to your real concern, they call you. And you get promoted into a trusted advisor. The conversation is now between people who trust each other. Information bandwidth is much higher. And now your real message can land.

Many technically-trained newcomers operate in bazooka mode. “Short timelines!! We don’t have time, and we need to convince them as quickly as possible!” The problem is that in a small ecosystem with few important actors, the bazooka can burn the relationship. I’ve found empirically that the useful-assistant stance compounds the relationship instead. Reputation is everything in those ecosystems.

Of course, at some point, you need to say what you really believe, and explain the risks - otherwise you risk becoming a sycophant.

Should we prioritize the executive or legislative branch? I believe that this is highly dependent on the country.

ControlAI's Direct Institutional Plan (DIP) and the late Center for AI Policy's sequence prioritize the legislative branch, which makes sense in the Anglosphere. Leticia García Martínez's posts on briefing 70+ and then 140+ UK lawmakers document a serious campaign.

In Westminster systems like the UK, the Prime Minister comes from the legislature, party platforms shape policy, and backbench pressure can force executive action or pass legislation outright. That's the implicit model behind the Direct Institutional Plan from ControlAI: brief enough MPs, build public pressure, force the executive's hand, maybe even elevate one of the parliamentarians you've briefed into government.

In France (I talk about France because I know the system quite well), I’d say the legislative branch is less important because member-state AI policy is downstream of the EU AI Act, so the legislative action already happened one level up, in a Parliament that was decisive (no AI Act, no GPAI provisions without it). The French constitution makes the President as close to a king as a democracy allows: directly elected, appoints a Prime Minister (who usually isn't even a member of parliament). For AI specifically, the centers of gravity are all in the executive branch:

The French Legislative branch is much less influential than any of them; all of the above came from the executive.

I’m not completely sure about this, but I lean towards no, because I find it is easier and a better use of time to discuss with the 2-3 people in the executive branch who matter than to convert the whole parliament. A well-briefed cabinet advisor is worth a lot more than a well-briefed random MP. If CeSIA had three times the staff, I'd happily allocate a third of it to a DIP-like operation; engaged French MPs are useful for open letters, committee questions, and political cover. But as a primary strategy in France, I believe the DIP loses to targeted executive briefing.

(I must say that I have no experience in the US, where it seems that reaching out to one sensible legislator like Scott Wiener was particularly effective and transformative for passing SB 53. But in EU member states, countries’ policies are downstream of the EU AI Act. When you try to brief a French MP on AI safety, the response is roughly "OK, but what's wrong with the AI Act?", whereas there's still no equivalent federal law in the UK or US.)

The same logic applies internationally. There is no global parliament. UN Security Council, G7 ministerial, OECD, EU AI Office: every forum that matters for binding AI policy is executive in nature. Even when laws do get passed, enforcement is executive. The AI Act will rise or fall on the AI Office's execution, the political will of DG CNECT (the European Commission's digital directorate), and the High-Level Commissioner's appetite to actually issue fines during geopolitical crises.

And that's for things inside the scope of the law. Most topics fall outside it. Internal and military deployments aren't covered by the AI Act at all. Anthropic's Mythos and Project Glasswing were both technically out of scope (since those are internal deployments, not yet deployed in the EU market). Those will only ever be addressed by executive choices.

The strongest objection to the executive-branch focus is that executive access is relatively fragile: it depends on who's in the cabinet this year, and most likely evaporates on election day. In some ways, this creates a dangerous single point of failure because it concentrates your influence in a way that legislative/public work doesn't. Legislatures and public pressure are slower, but more durable and more resistant to a single hostile administration.

I agree with ControlAI that, to a large extent, most policymakers have never heard of catastrophic risks, and that ensuring they hear about them should be a priority.

Some research and intellectual production has been important and highly publicized, the strongest example being AI-2027, but this is a strong exception.

A huge part of the work that has impact in AI governance is very different from doing research:

Concrete example of impactful intervention that is not really researchy.

Yet most podcasts in the community are overwhelmingly skewed toward presenting researchers' and intellectuals' work.

Public work is not necessarily visible to whom it matters. One related allocation mistake is worth naming. Most of the senior policymakers I work with in continental European AI policy don't treat Twitter as a serious input. Twitter reaches the AI safety community itself, some journalists who already cover the beat, and, personally, policy staffers in the LessWrong/EA orbit. That's a real audience, and it is important to share your work with the community, but let’s be clear: this won’t reach the cabinet advisor. And to a very large extent, the same goes for LinkedIn. Emails, direct messages, or oral briefings generally remain the gold standard for busy policymakers.

I believe thinking carefully about your communication is really important, and it took us a lot of time to grok this.

For example, we did a study of our organization’s LinkedIn, and we found that a vast majority of our likes came from people already in our community, and that the only way to reach policymakers so far was direct messages. (This doesn't make the public channel worthless: it is essential for building your organization's profile and, for example, for getting strong hires when you need to recruit.) The excellent newsletter Sentinel claims to be bottlenecked by distribution. Good communication work to improve that distribution could potentially 10x their impact. The hypothetical person who could 10x this impact with good communication would never get invited on the 80,000 Hours podcast. I think this is nonetheless impactful work.

At the end of the day, I think that you should produce enough prestige work to earn standing, then stop over-producing it.

(When writing this section, I had ControlAI / AI / MIRI as representative of the Outsider worldview)

Raising situational awareness is not a bottleneck. One of the main goals of outsiders is to push the Overton window, and to raise situational awareness. But it seems to me that people are gradually waking up, and that the Mythos and the Department of War incidents were much more successful at doing so than any open letter so far. Potentially, the comparative advantage of civil society should be to tell policymakers what should be done, rather than to explain the risks, which will become clear after an unmistakable warning shot.

I personally don’t think that this is totally true: after Mythos, people have updated on cyber, but not yet on biorisks or, potentially, loss of control. So raising situational awareness is still very much needed. But I think Mythos should update us a bit towards this type of activity being somewhat less important.

**Targeting the general public and random legislators is not the most direct way to achieve action.** The general public has little power. Random legislators might sign a statement, but might not be the best people to act. People with power generally cannot sign statements.

By aiming for the moonshot, you miss the cheap asks that could actually pass. I don’t know, I feel like there are cheap asks (say, for example, transparency, or mandatory incident reporting) that are neglected by outsiders shooting for the moonshot Moratorium - and I feel that this is not strategic, because the cheap ask has a serious shot of being implemented, and this could be a meaningful improvement over literally nothing.

The insider climbs to the local optimum, the outsider shoots for the global one.

Insiders operate within the Overton window, and that window might not be enough. The first objection is that the media and public-opinion environment may compound more than I'm giving it credit for. Gabriel Alfour at ControlAI makes a version of this argument I take seriously. His critique is that the KPI I optimize for is a leading indicator of having been optimized by the world, not of having optimized it, because the language that gets cited is, by selection, language that was already inside the Overton window. If outsider work really is the bottleneck for window-shifting, it is closer to 50/50.

Insider work might be neglecting second-order effects: one briefing takes time, and this is hard to scale, whereas engaging with the public discourse shapes the information environment cabinet advisors swim in, even if they never click on a tweet.

None of this matters if the US won't engage. Post-Mythos, the US response was to block Claude Fable's deployment in Europe. The current administration has been actively hostile to AI governance at the UN level. If the US is a hard no through 2028, much of CeSIA's strategy is largely deferred until conditions change. In several international forums over the past 18 months, US positioning has been the ultimate bottleneck. My main answer to this is that it’s a priority to do insider work at the White House for those who can access it.

Most AI governance efforts are currently pushing for a defense-in-depth paradigm. The Code of Practice of the AI Act is a canonical example of such Swiss-Cheese regulation.

Here's one hypothesis: I’ve found that one source of epistemic difference between insiders and outsiders is a technical difference in the following belief. Outsiders generally believe that the cheese is cheesy, that the game needs to be flipped, and that we need a paradigmatic change in how things are done. Insiders generally think that more capacity, more preparedness and more layers to the Swiss Cheese will do a great deal of the job. Insiders are very happy to see one of their recommendations incorporated verbatim into an official document, adding a new layer to the cheese: for example, transparency measures or whistleblower protection.

Insiders will generally say: each layer (evaluations, red lines, reporting, audit, liability, and so on) is individually imperfect, but stacked, the failure probabilities multiply. I made the same argument for technical scheming mitigations in my 80/20 playbook: combine architectural choices, control, white-box detection, black-box monitoring, and elicitation, and you substantially cut scheming risk.

The critique I take most seriously is the following one: The whole structure assumes independence. It collapses if failures are correlated. A sufficiently capable misaligned system defeats every layer through the same underlying capability (long-horizon planning, situational awareness) rather than failing at each layer independently.
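A toy calculation can make the independence point concrete. The sketch below is illustrative only: the five layers, the per-layer miss probability of 0.5, and the 20% common-cause probability are made-up numbers, and the function names are hypothetical. It compares the stacked miss probability when layers fail independently with a simple common-cause model, in which some shared capability defeats every layer at once with a fixed probability and the layers fail independently otherwise.

```python
from math import prod


def miss_all_layers_independent(miss_probs):
    """Probability that every layer misses, assuming layers fail independently."""
    return prod(miss_probs)


def miss_all_layers_common_cause(miss_probs, common_cause_prob):
    """Toy common-cause model (an assumption, not a measured quantity).

    With probability common_cause_prob a single shared failure defeats all
    layers at once; otherwise the layers fail independently.
    """
    return common_cause_prob + (1.0 - common_cause_prob) * prod(miss_probs)


# Illustrative numbers only: five layers, each missing half of the time.
layers = [0.5] * 5

independent = miss_all_layers_independent(layers)        # 0.5 ** 5
correlated = miss_all_layers_common_cause(layers, 0.2)   # 0.2 + 0.8 * 0.5 ** 5

print(f"independent layers:   {independent:.5f}")
print(f"with a common cause:  {correlated:.5f}")
```

Under these assumed numbers, stacking takes the miss probability from 0.5 to about 0.03 when layers are independent, but only to about 0.23 with the common cause, and adding further layers barely moves the second figure because the shared term dominates.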

To some extent, I think governance layers are more orthogonal than technical mitigation layers. An audit regime, a deployment threshold, and a liability rule fail on different axes (political, technical, legal), so their failures should be only weakly correlated. I'm less sure about the technical layers. People I respect on the technical side think the correlation between, say, CoT monitoring and probes is high enough that the Swiss-cheese math doesn't really work. If they're right on the technical side, that's a bigger problem than the governance correlation question, since governance partly depends on technical mitigations actually catching things.

My best defense is that implementing a new layer such as AI control on top of prosaic alignment measures should enable you to catch the AI red-handed, and subsequently increase the level of political will.

Let me repeat that I think that both insider and outsider work should exist, and I’m not even saying that we should do less outsider work - the purpose of this post was mainly to explain another facet of AI governance that is not often presented on LessWrong. Here are some practical implications:

Consider working inside your administration on AI. This might be a neglected career move in the community. ML4Good, Horizon and Talos have all made a few placements in very relevant institutions. The people behind the items in my opening list were mostly in the room.

Decide consciously whether your ask is inside or outside the Overton window. Many failures I've seen come from playing one game with the other game's tactics: bazooka mode in the executive branch, or timid asks in a public campaign.

Self-select on temperament. If you like working like a consultant by listening, making yourself useful on someone else's agenda, the invisible game might be for you.

If you're a funder, notice the legibility bias. Invisible work is by construction less legible to funders, so it might be structurally underfunded.

[1] The only international reporting framework so far was initiated at the G7, under the auspices of the OECD secretariat. Concretely, it is a list of questions that will be sent to companies, which will answer publicly.

Source: lesswrong.com (original article)