# Don't just aim for Frontier Labs

> Source: <https://www.lesswrong.com/posts/eAMyxM28hNp4ewGdT/don-t-just-aim-for-frontier-labs>
> Published: 2026-06-14 04:41:05+00:00

*Why AI safety should live wherever AI is deployed, not just where it is built.*

I spotted a request for feedback on whether someone with AI safety experience should take a job at a for-profit company and "get their hands dirty" as an AI transformation leader, pivoting away from a strategy focused on AI research labs. I work at a SaaS company and find it meaningful to de-risk AI in products that affect millions of people. If experienced safety advocates avoid the places where AI is deployed and focus only on existential risks, wouldn't that worsen near-term outcomes?

I actually held the default view after my first 5-10 readings on AI risk in the 2010s (including Nick Bostrom, Tim Urban, 80,000 Hours, and the Future of Life Institute). By 2021, I had developed the intuition that making an impact on AI safety required working at an AI lab. After all, the lab was where most AI-accelerating change appeared to originate, and therefore had to be the best place to steer towards positive outcomes.

The last 3 years changed my mind, if only through [actual, published incidents](https://airisk.mit.edu/ai-incident-tracker#explore-dashboard). Even with recursive self-improvement at our doorstep, I believe that absolute control over our future (especially the future of people alive today) is difficult enough to warrant a time discount on AI-specific X-Risk. This underscores the merit of assigning moral weight to real-world harms in the present and, therefore, allocating some resources to mitigate them. Empirically, present harms and existential risk both attract interest and funding, [according to Erich Grunewald](https://forum.effectivealtruism.org/posts/hXzB72kfdAk6PTzio/attention-on-existential-risk-from-ai-likely-hasn-t). There are information gaps in both, and while [Catastrophic Risk is material (MIT 2026)](https://cdn.prod.website-files.com/669550d38372f33552d2516e/6a172558bd2947234379749f_a8684052fd49a64374c9a9d3e4e5ab59_Prioritizing%20the%20risks%20from%20Artificial%20Intelligence.pdf), we also understand that frontier AI labs benefit enormously from spotlighting X-risk and downplaying the present risks and non-catastrophic harms of LLMs ([Karen Hao](https://www.abc.net.au/listen/programs/theminefield/ai-and-the-cost-to-human-life-with-karen-hao/105872216)).

What grounded, rational arguments can we bring to people debating whether to join a "regular for-profit company"? Should AI safety talent be focused on the labs that develop AI? In cybersecurity, the precedent I've seen over 15 years in the field is widely integrated talent, not a concentrated island in the labs that build the underlying tools. Furthermore, eight other mature safety-critical industries, including aviation, finance, and medical devices, have already faced this question. I wrote this essay to find the answer and share it as clearly as possible.

It's intuitive that the dangerous thing is a frontier model built at a handful of labs, and the intuition that follows is that the people who understand the risks and technical details, and who work towards mitigations, should mostly sit at those labs. Doesn't everyone downstream just call an API? This intuition could be called a *factory-gate fallacy*: the idea that safety is finished where the thing is made.

But how could safety be a substance we manufacture and ship? Safety is a relationship between a system and the environment in which we choose to deploy it. Even the language we use to interpret safety risks, factors, or outcomes depends on downstream operations. If we intend to reach positive outcomes, how could the competency to manage safety stay out of that space?

The essay builds the case in stages: defining what is actually being argued, walking through the industries that settled this question decades ago, weighing what the AI-risk community has already said for and against, checking what is happening on the ground right now, stating the core claim as precisely as possible, and then taking the strongest objections apart one at a time. I am glad to be part of that understaffed world, just as I've been glad to be part of the broader cybersecurity world for many years before it, even though both shape my perspective and bias my priors.

*The claim*: AI safety and security competency requires directly responsible individuals focused on it within every organization that adopts AI, or that interacts with other actors who have adopted AI, not only in labs; and this pattern is a high-impact opportunity. The corollary is that concentrating that competency exclusively in AI research-and-development organizations is neither the current trajectory nor compatible with a safe transition to widespread AI use.

*AI safety competency* could be said to focus on “[Layer 8](https://en.wikipedia.org/wiki/Layer_8)”, i.e., the human risks above the application’s base behaviors. Talent in this area focuses on evaluating tendencies and failure modes, developing systems for robustness, and, of course, monitoring how the system behaves in use. Ji and colleagues' (2025) [ACM Computing Surveys overview of alignment](https://dl.acm.org/doi/10.1145/3770749) separates how a system *learns* the right objective from *assuring* that it did. The assurance half is the evaluation-and-monitoring work that happens at and after deployment. In a deployer organization, some sub-skills include:

*AI security competency* is about defending the system against adversaries and misuse in settings where applications are tightly integrated and models are now wired into a multitude of tools, data sources, and autonomous workflows. The [OWASP Agentic Security Initiative](https://genai.owasp.org/initiatives/agentic-security-initiative/), with more than a hundred contributors including yours truly, publishes numerous guides (e.g., the [Top 10 for Agentic Applications](https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/) 2026) that characterize concrete threats (e.g., agent goal hijack, tool misuse, and identity-and-privilege abuse). Those particular failures are not properties of the model weights: even though the model is causally involved, they occur only once a model is deployed in an application with real data and permissions. Luckily, many OWASP contributors are not AI lab members but work in deployer organizations (granted, often ones with a strong security business presence). In parallel, NIST (a reference for security governance) published the [Generative AI Profile](https://doi.org/10.6028/NIST.AI.600-1) (NIST AI 600-1, July 2024), which names sub-skills including:

These lists (safety and security) describe problems outside of AI alignment proper. They live at the boundary between a model and the world. Evaluations measure behavior in a context, monitoring watches a deployed system, and an agent goal hijack is an attack on something that only exists once someone has deployed it. Not much of this can be built in a vacuum and then shipped: every organization has a different context, and the deployment distribution can't be reliably simulated at the main labs.

*AI research-and-development organizations*: these are frontier labs (OpenAI, Anthropic, Google DeepMind, xAI, and their peers) but also the dedicated AI-safety and AI-security research orgs (thank you METR, MIRI, Palisade, Redwood, MATS, and everyone, you know who you are).

The motion, then, is that safety and security competencies and responsibilities apply to the whole system, and are at least as important down the chain, across deployers, operators, and resellers. If the talent is hoarded at the top, we should expect negative outcomes.

The strongest evidence for the motion is that every mature safety-critical industry has already faced this exact question and answered it the same way. The pattern is very consistent:

| Industry | Codified | Governing instrument | Operator's named duty |
|---|---|---|---|
| Cybersecurity | 2000s onward | NIST CSF, ISO/IEC 27001 | CISO function, third-party risk program |
| Aviation | 2013 (ICAO Annex 19) | ICAO Annex 19, FAA/EASA SMS | SMS at airlines, ANSPs, MROs |
| Automotive | 2018 | ISO 26262 | Lifecycle safety across suppliers and integrators |
| Pharmaceuticals | 2012 (EU GVP) | EU GVP | Distributed pharmacovigilance |
| Medical devices | 2017+ | ISO 14971 | Post-market surveillance at hospitals |
| Nuclear / process | 1992 | | Operator PHAs, mechanical integrity |
| Finance (model risk) | 2011 | SR 11-7 | Three lines of defense, "effective challenge" at user |
| Food | 1997 (Codex HACCP) | Codex HACCP | Hazard analysis at every processor |
| Maritime / Rail / Fire | various | | Named safety roles at operators |

A few of these carry more weight than the others.

**Cybersecurity is a shared responsibility.** The split is explicit between a cloud provider, which secures the infrastructure "of" the cloud, and its customers, who must secure what they put "in" it.

Speaking as someone who has reviewed nearly a thousand resumes over the last decade, interviewed over a hundred security engineers, and watched many cases of deception in the hiring process (botnet operators trying to join my company, workers overseas claiming to be US-based, teleprompter cheating), I can attest that the job market is hard for everyone (I really needed to fill a role!). Getting safety and security talent into nearly every organization is not easy, but the good news is that we need it both in risk functions *and* as ambassadors and advocates. Thousands of decisions are made outside organizations' security teams. The awareness, judgment, and priorities of people in high-impact roles can matter as much as, if not more than, how securely <major technology company> builds the latest third-party solution, especially when alternative "challenger" providers push a race to the bottom on safety priorities and apparent unit economics. But it's not just about the product (finished, or how it's made). Security and safety acumen is needed upstream, but also on every operator's security team and across the operator's leadership roles. The software company's or AI research lab's safety posture is barely a third of the picture.

Managing AI risk requires organizations to empower leaders and individual contributors to acquire AI safety knowledge *and apply it*. Hence, readers from the AI safety community who are anxious to work at the major labs can "relax" and take a [PI-shaped role](https://resources.scrumalliance.org/Article/pi-shaped-work-skills) in a landscape a thousand times broader; the impact you want might not come with the title you expect.

We can look at Heinrich's safety pyramid as a typical pattern of how harms are distributed. Fatal accidents are few, at the top of the pyramid, while hundreds of minor accidents and thousands of near-miss incidents are already catalogued and slowly gaining coverage. Between January and May 2026, the Centre for Long-Term Resilience and METR documented incidents involving rogue agents, including cases affecting real people and real infrastructure. The numbers align closely with this pyramid. Chatham House discussions and even open forums over the last 1-2 years have surfaced still more issues, and HiddenLayer's 2026 findings (under confidentiality agreements) indicate a deeper set of issues that the affected organizations were unwilling to publish. The base is wide, and the middle layers already show real-world harm. Responsibility for these *incidents in the wild* lay not with the research labs but with the designers, engineers, and operators of the systems in question.

The Cloud Security Alliance [puts it](https://cloudsecurityalliance.org/blog/2024/01/25/what-is-the-shared-responsibility-model-in-the-cloud) this way: you *can* delegate the work of managing a risk, but you cannot delegate the accountability for it. For that reason, serious enterprises run a CISO function (albeit sometimes under a lesser title at SMBs) and a third-party-risk program against frameworks like the [NIST Cybersecurity Framework](https://www.nist.gov/cyberframework), if only to attend to IT security. Responsibility for risk, compliance, and business continuity often extends to other teams and named individuals as well (see, for example, the case of an [exposed token spending cap](https://www.ox.security/blog/two-clicks-to-1m-how-attackers-can-drain-enterprise-budgets-through-ai-platforms/)).

Cybersecurity also shows the limits of the metaphor: as "mature" as the discipline may seem, it is hardly perfect. Security teams are somewhat deliberately scoped: they defend a certain footprint (gone are the days of a clearly defined perimeter) against an evolving family of threat classes, staffed at roughly [one to one-and-a-half FTE per hundred employees](https://www.iansresearch.com/resources/all-blogs/post/security-blog/2025/06/17/how-do-you-compare--2025-comp-and-budget-data-for-small-and-midmarket-cisos) even in smaller firms, and a [single-digit percentage of IT headcount](https://www.indeed.com/hire/c/info/best-it-ratio-staff-to-employees) more generally. That hundred-to-one ratio of defended population to defenders has produced dissatisfying outcomes, both for enterprises and for national security, but organizations that survive adapt (including by hiring in-house staff or managed security partners to influence and assist their IT and engineering orgs).

At less.online 2026, sessions on cybersecurity acknowledged this gap. What we didn't name was that a security team that excels at network protection and credential hygiene is not inherently equipped to reason about whether an AI model for medical triage quietly disadvantages a [class of patients who might suffer or sue](https://www.afslaw.com/perspectives/health-care-counsel-blog/health-insurers-sued-over-use-artificial-intelligence-deny). I can leverage existing research, but I may err in trying to prove naive hypotheses in stretch domains without crucial controls. Even assessing the permissions of an agentic workflow requires partnering with stakeholders: does the agent really *need* write access, or read access to all clients? The one-size-fits-all security solution of granting the permission "just in time" only ensures that the identity gains access through the agent's real-time request; it doesn't prevent a rogue action or exfiltration.
Behavioral mitigations that deny such a request when needed typically require the team to combine AI safety and security competencies, which is genuinely new work (before AI agents, this was often deferred to sanctions-based policy enforcement and insider-risk programs). These adjacent skills and responsibilities expand the scope of security and need to be resourced as such.
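To make the permissions question concrete, here is a minimal sketch contrasting a naive just-in-time grant with a task-scoped check plus a crude behavioral brake. Everything in it is hypothetical: the `AccessRequest` type, the `TASK_SCOPES` map, the `draft_invoice_reply` task, and the read-count threshold are illustrative assumptions, not any product's API or a recommended policy.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class AccessRequest:
    agent: str
    task: str
    action: str    # "read" or "write"
    resource: str  # e.g. "clients/acme/invoices"


# Hypothetical least-privilege map: the (action, resource pattern) pairs each task needs.
TASK_SCOPES = {
    "draft_invoice_reply": {
        ("read", "clients/{client}/invoices"),
        ("write", "drafts/*"),
    },
}


def jit_grant(req: AccessRequest, authenticated: bool) -> bool:
    """Naive just-in-time grant: any authenticated request is approved as it arrives."""
    return authenticated


def scoped_grant(req: AccessRequest, authenticated: bool, client: str,
                 reads_this_session: int, max_reads: int = 20) -> bool:
    """Approve only requests inside the task's scope, with a crude behavioral brake."""
    if not authenticated:
        return False
    scope = TASK_SCOPES.get(req.task, set())
    in_scope = any(
        req.action == action and fnmatch(req.resource, pattern.format(client=client))
        for action, pattern in scope
    )
    if not in_scope:
        return False
    # Hypothetical behavioral signal: an unusual burst of reads may indicate exfiltration.
    if req.action == "read" and reads_this_session >= max_reads:
        return False
    return True
```

Both functions trust the same authenticated identity; only the second asks whether this task needs this resource right now, which is the stakeholder question a security team cannot answer alone.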

**Financial model risk** also maps onto AI almost without translation. In banking, [70% of firms are already using agentic AI in some capacity, while fewer than 12% describe their governance strategy as well-defined and resourced](https://interface.ai/blog/agentic-ai-governance-credit-unions-community-banks/). In other words, deployment is happening, but the safety subject-matter expertise needed to govern it is not yet in the room, creating a visible tension. In some companies (26% of those [surveyed by IBM](https://www.ibm.com/thought-leadership/institute-business-value/en-us/report/chief-ai-officer), up from about 11% two years prior), a named Chief AI Officer carries accountability for closing that gap. Filling a role with that title is not a precondition for the work, and a CAIO without AI safety contributors is likely to face conflicts of interest from the mandates assigned to them. Paperwork that spreads fast and shallow (a pattern [IAPP has found](https://iapp.org/resources/article/ai-governance-profession-report/)) does little to reduce risk. The field needs safety expertise among senior leaders, as well as the capacity to support and de-risk AI transformation across the organization's departments.

In 2011, during the recovery from the financial crisis, the US Federal Reserve and OCC issued [Supervisory Guidance on Model Risk Management](https://www.modelop.com/ai-governance/ai-regulations-standards/sr-11-7), known as SR 11-7. It requires every bank that *uses* a quantitative model (not the vendor that built it) to manage the risk that the model is wrong or misused, through three lines of defense: the people who build and own the model, an independent validation function, and internal audit. Its load-bearing phrase is "effective challenge": the demand that objective, informed people who understand a model's limits actually push on it. The rigor of that challenge scales with how much the bank relies on the model: a small institution scales the controls down, but never to zero.
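The "scale down, never to zero" idea can be sketched as a cumulative tiering rule. This is a minimal illustration under stated assumptions: the tiers, the control names, and the 100,000-decision and 1,000-decision thresholds are hypothetical, and SR 11-7 itself does not prescribe this list or these cut-offs.

```python
from enum import IntEnum


class Tier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical controls added at each tier, loosely echoing SR 11-7's three lines:
# model owners, independent validation, and internal audit.
CONTROLS = {
    Tier.LOW: ["inventory entry", "named owner", "documented limitations"],
    Tier.MEDIUM: ["periodic independent review", "outcome monitoring"],
    Tier.HIGH: ["independent validation before use", "internal audit coverage",
                "effective-challenge sign-off"],
}


def tier_for(decisions_per_year: int, human_review: bool) -> Tier:
    """Hypothetical reliance proxy: automated decision volume, capped below HIGH
    when a human reviews each decision."""
    if decisions_per_year >= 100_000 and not human_review:
        return Tier.HIGH
    if decisions_per_year >= 1_000:
        return Tier.MEDIUM
    return Tier.LOW


def required_controls(tier: Tier) -> list[str]:
    """Controls are cumulative: a lower tier shortens the list, never to empty."""
    return [control for t in Tier if t <= tier for control in CONTROLS[t]]


print(required_controls(tier_for(500_000, human_review=True)))
```

Because the controls accumulate, even the lowest tier keeps an owner and documented limitations; a higher tier adds independent validation and audit on top rather than replacing the baseline.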

Note: SR 11-7 was a regulator's response *after a systemic failure*.

**Aviation** makes the operator's role clear, too. Boeing and Airbus build the aircraft; nobody believes safety is finished when the plane rolls out. ICAO's Annex 19 and the FAA/EASA SMS framework require a formal, staffed safety-management function at airlines, air navigation service providers, and the maintenance organizations that touch the aircraft throughout its life. Peer-reviewed work, such as the 2022 [review in *Safety*](https://www.mdpi.com/2313-576X/8/2/20), focuses largely on how to measure safety-management maturity at operators, because that is where the question is live.

**Automotive** works the same way. [ISO 26262](https://www.iso.org/standard/68383.html) (2018) governs the full lifecycle through "production, operation, service and decommissioning," and binds suppliers and integrators rather than the carmaker alone. As a safety-critical artifact moves from designer to integrator to operator to service network, the competency to manage its hazards has to be present at each handoff, because each handoff introduces a context that the previous party could not see.

**Pharmacovigilance** distributes the detection and reporting of adverse drug effects by regulating hospitals, distributors, and the company holding the marketing authorization; control is not concentrated with the molecule's inventors. If real patients are harmed by drug combinations, or in populations the original trials could not capture, the distributed system has a genuine feedback loop to manage the risk.

**Medical devices** likewise rely on the deploying hospitals for post-market surveillance, structured by [ISO 14971](https://www.iso.org/standard/72704.html) risk management and mandatory incident reporting. The device maker depends on the hospital to run the feedback loop where the rubber meets the (almost literal) road.

**Nuclear and process safety** decouple the operator of a hazardous facility from the reactor or process designer. Charles Perrow's *Normal Accidents* (1984) supplies the theory: in complex systems, the decisive interventions happen at the sharp end, during operation, by the people on shift.

Research helped me find fields outside my area of expertise, with different physics, different centuries, and different regulators, and yet a similar approach. When a technology can hurt people at scale, every society that has lived with it long enough has concluded that the operator needs staffed safety competency.

People who think carefully about AI risk have been arguing about this question. I want to give credit and inform my take with a few citations.

**Several recent arguments on LessWrong and adjacent EA spaces make a case for keeping talent in the labs**. One of my earliest reads was 80,000 Hours' "[Should you work at a frontier AI company](https://80000hours.org/career-reviews/working-at-an-ai-lab/#force-for-good-or-bad)", which strongly favors this view. For what it's worth, its rebuttals to the downsides *have not materialized*. There are competencies whose leverage is bound tightly to frontier access. *Mechanistic interpretability* needs the model internals, the training checkpoints, and the compute to run experiments against them, and most of that lives at the labs. Bilal Chughtai's [weighing of frontier-lab safety work](https://www.lesswrong.com/posts/cyYgdYJagkG4HGZBk/reasons-for-and-against-working-on-technical-ai-safety-at-a) captures the tension well: unless you work at a third-party safety research firm, selling your time to a frontier AI company is unlikely to yield meaningful constraints in the face of revenue and power incentives, [similar to social networks](https://en.wikipedia.org/wiki/Category:Facebook_criticisms_and_controversies), and Chughtai explicitly calls out the risk that your work may be used for safety-washing. At the same time, safety for training and inference infrastructure (the actual serving stack, the fine-tuning pipelines, and the deployment harness) is critical and hard to achieve without highly competent cybersecurity practitioners *with AI expertise* inside the organization that runs this infrastructure. Furthermore, a genuinely global threat-management vantage point, the ability to see attack patterns across an entire API surface, mostly belongs to the labs. That said, *labs do not see* blind inference in AWS Bedrock, and [this demo](https://github.com/its-emile/adaptive-threat-bridge) and [my patent](https://patents.google.com/patent/US12375486B2/en?inventor=delcourt&oq=delcourt) show how some actors on the internet *can* also help build global monitoring beyond a single organization.
It is also worth conceding that the labs can afford it: they pay more and compete hard for exactly this talent, so the gravitational pull toward them is real regardless of the argument's merits.

**The case for distribution also exists on LessWrong and the EA forum**. Boaz Barak's ["Six Thoughts on AI Safety"](https://www.lesswrong.com/posts/3jnziqCF3vA2NXAKp/six-thoughts-on-ai-safety) (January 2025) strongly supports the need for operators and deployers to formally manage the shared safety responsibility: there is no temporal gap. AI is being woven into high-stakes parts of society now, before any superintelligent helper arrives to clean up. Safety will behave like computer security, i.e., with no single magic solution, only defense-in-depth at every stage, including deployment and monitoring. The 80,000 Hours [problem profile on extreme power concentration](https://www.lesswrong.com/posts/qZrpjksTZBPA4cBr5/new-80k-problem-profile-extreme-power-concentration) (2025) also bolsters the case for distribution: under time pressure, organizations hand more unchecked control to AI on the strength of its potential, faster than they build the competency to govern it. This unchecked delegation can occur in the labs, but we certainly see it across the industry. Kulveit and colleagues' ["Gradual Disempowerment"](https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from) (2025) shows that aligning each individual model with its developer's intent and worldview is not sufficient: not only could this fail in many environments, but harms can also emerge from the interactions between many adequately aligned systems on the web, as they reshape the economy, the culture, and the state, not inside the lab but inside ordinary institutions.
This argument is also supported by the economic-incentive analysis in Drago and Laine's ["Intelligence Curse"](https://www.lesswrong.com/posts/LCFgLY3EWb3Gqqxyi/the-intelligence-curse-an-essay-series) (2025), and by Leveson's [Engineering a Safer World: Systems Thinking Applied to Safety](https://direct.mit.edu/books/oa-monograph/2908/Engineering-a-Safer-WorldSystems-Thinking-Applied), which Oliver Sourbut [applied specifically to AI risk](https://www.lesswrong.com/posts/mL5asdegoa56CkqgJ/engineering-a-safer-world-risk-modelling-and-safety).

Abbey Chaver's ["AI Infrastructure Security Shortlist"](https://www.lesswrong.com/posts/xkE4zEzmArxgskZ96/the-ai-infrastructure-security-shortlist) (2026) also describes two different talent problems that are worth splitting.

Conflating the two is how people talk themselves into "there is too little talent to distribute" when the honest reading is that one narrow sub-problem is talent-bound and the broad one is investment-bound.

As a bottom line, rationalists have shown merit to both sides. The pro-concentration case rests on frontier access for specific competencies and the labs' ability to pay. The pro-distribution case rests on deployment-layer harms, defense-in-depth, the insufficiency of per-model alignment, and the fact that most of the security work is edge work.

In the motion, I claimed that concentration is not even the current trajectory. Is it true?

The best available data is the IAPP and Credo AI [AI Governance Profession Report 2025](https://iapp.org/resources/article/ai-governance-profession-report/), a survey of more than 670 professionals across 45 countries. Its headline numbers: roughly **77% of surveyed organizations are working on AI governance, rising toward 90% among those already using AI**, and about 30% of organizations not yet using AI are already building governance capacity. Distribution is already underway; it is a weak, uneven, early-stage reality rather than a proposal.

The same survey finds that roughly half of AI-governance professionals sit in legal, privacy, ethics, or compliance functions, which suggests that business and technical functions may lack expertise in the safety-and-security layers. Outcomes may get worse if high-impact AI agents are rolled out under basic governance guidelines that spread quickly and shallowly, while technical safety and security competency spreads slowly and remains concentrated upstream. We don’t just need voluntary training in, and adoption of, governance scaffolding ([ISO/IEC 42001](https://www.iso.org/standard/42001), [NIST AI RMF](https://www.nist.gov/itl/ai-risk-management-framework), and [IAPP AIGP](https://iapp.org/certify/aigp/)); we need broad and deep acumen that sees the risks clearly in each daily decision that steers AI agent deployments (e.g., technical AI safety staff, or teams implementing the OWASP ASI frameworks mentioned earlier to model and mitigate threats).

Where policy has moved past voluntary, it has not resolved the SME problem. A variety of states require school districts to adopt a formal policy on the use of AI. Some states provide model policies and toolkits to support implementation, but the mandates generally establish that governance is required, often without specifying what counts as adequate or how to make efforts fruitful towards the desired outcome. The substance (what to allow, what to prohibit, how to evaluate, how to monitor) is left to each district's internal capacity to figure out, which is exactly the contributor-with-AI-safety-subject-matter-expertise shape this argument has been describing. Policymaking is creating demand for the role faster than the role is being filled.

The lazy version of the argument for concentration points to talent simply not existing in the numbers required. But consider the acceptance rates into the field's flagship training pipelines. MATS reports selecting on the order of [4–7% of applicants](https://www.matsprogram.org/getting-into-mats); reviewers describe [single-digit MATS acceptance, around 1.5% for the Anthropic Fellows program, and roughly 15% for SPAR projects](https://georglange.com/post/ai-safety-application-guide/). If the vast majority of motivated applicants across programs are turned away for lack of capacity, the constraint is how many seats and roles are funded, not the supply of people who could do the work. The ecosystem is also simply larger than the pessimistic count implies: BlueDot's community numbers on the order of ten thousand members, the OWASP initiative on securing agentic applications convenes on the order of a hundred AI-and-security collaborators for some of its guides, and gatherings like the [AI Security Forum](https://www.emiledelcourt.com/blog/2026/2/8/ai-security-forum-6-months-later) draw several hundred attendees.

There may well be two or more orders of magnitude between the number of safety-and-security specialists and the number of organizations deploying AI, in which case training more would be imperative. But (a) full employment of AI safety talent would be a good problem to have, and (b) the relevant denominator is not "all organizations"; it is the organizations whose products or services touch hundreds of thousands of people each year, where an unmanaged failure is consequential at scale. That population is far smaller and very much staffable now from the talent that already exists.

I certainly hope we do not steer that talent away from those high-impact roles. The risk is not that we lack the people. The risk is that those high-impact roles go unstaffed because AI safety is misperceived as a lab responsibility, even by AI safety insiders, leaving consequential deployments under-mitigated while qualified people are told there is no seat for them.

Why have so many organizations, especially smaller and more peripheral ones, not yet named anyone accountable for an AI safety-and-security practice? Four ordinary, non-mysterious mechanisms account for most of it:

What would proportionality entail? Maybe the inference vs. pre-release testing ratio I [proposed](https://www.emiledelcourt.com/blog/2025/1/16/trust-pressure-ratio) a couple of years ago is too idiosyncratic, but we could draw on the SR 11-7 example to sketch a simpler ladder that avoids burdening small, low-impact organizations:

Cybersecurity outcomes at 1–1.5 security FTEs per hundred employees were poor, so those ratios are not enough. Furthermore, AI safety and security cannot reasonably be assumed to need less, and likely need more, because the technology has accelerated the pace of change but not the pace of risk management, while failure modes appear in business logic that is opaque to the security team.
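As a back-of-the-envelope check, the cybersecurity ratios cited earlier translate into headcounts like this. The 2,000-employee organization is a hypothetical example, and treating the cybersecurity ratio as a floor for AI safety-and-security staffing is this argument's assumption, not an established benchmark.

```python
def staffing_floor(employees: int, per_hundred_low: float = 1.0,
                   per_hundred_high: float = 1.5) -> tuple[float, float]:
    """FTE range implied by a per-hundred-employees staffing ratio.

    Defaults mirror the 1 to 1.5 security FTEs per hundred employees cited in the text;
    using them for AI safety and security is an assumption, and a floor at best.
    """
    return employees * per_hundred_low / 100, employees * per_hundred_high / 100


def defended_per_defender(per_hundred: float) -> float:
    """People each defender covers at a given per-hundred ratio (100 / ratio)."""
    return 100 / per_hundred


# Hypothetical 2,000-person deployer.
low, high = staffing_floor(2_000)
print(f"{low:.0f} to {high:.0f} FTEs")  # 20 to 30 FTEs
print(f"{defended_per_defender(1.5):.0f} to {defended_per_defender(1.0):.0f} people per defender")
```

On these assumptions, even the cybersecurity floor that produced poor outcomes implies tens of dedicated people at a mid-sized deployer, each covering roughly 67 to 100 colleagues; the argument above is that AI risk likely needs more, not less.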

**C1.** General-purpose AI training has to strike a compromise between benefits and friction across an endless variety of use cases, and therefore cannot be sufficient for any single one of them.

**C2.** The constraint on AI safety competency is the number of funded seats within each deployer organization, not the supply of trained people.

**C3.** Major internal training on safety pitfalls and mitigations is needed inside deployer organizations, not only at the labs.

**C4.** A small share of safety and security work has to happen close to the model, and it is easily misconstrued as a shortcut to safety; most of the work has to happen close to the deployment, where the workload actually runs.

Regulation stays out of this section deliberately, because the argument here is the axiom that should drive what we regulate, not a consequence of it.

How do we know these axioms are true?

**Supporting C1: safety is a control property of a system in operation, not a component property of an artifact.** This is Nancy Leveson's central result, developed across her 2004 [Safety Science accident model](https://doi.org/10.1016/S0925-7535(03)00047-X) and the 2011 book [Engineering a Safer World](https://direct.mit.edu/books/oa-monograph/2908/Engineering-a-Safer-WorldSystems-Thinking-Applied). Accidents in complex systems are not mainly chains of broken components but failures of control over the interactions between components, and those interactions only fully exist when the system is running in its real environment. Verification can only occur where the system is deployed and operated. Sourbut [highlighted](https://www.lesswrong.com/posts/mL5asdegoa56CkqgJ/engineering-a-safer-world-risk-modelling-and-safety) what follows: responsibility for safety has to be distributed throughout the sociotechnical system, because that is the only place the relevant control loops are. While my own study was of the [French EBIOS](https://en.wikipedia.org/wiki/EBIOS) method in the late 2000s, the [STAMP/STPA](https://functionalsafetyengineer.com/introduction-to-stamp/) approach may be more effective to apply directly to AI systems, and it addresses its guidance to the people responsible for *operating* them. I’ve bookmarked the survey of [STPA for learning-enabled systems](https://arxiv.org/abs/2302.10588) (Qi et al., 2023), the [PHASE adaptation](https://arxiv.org/abs/2410.22526) (Rismani et al., 2024), and subsequent work on [systematic hazard analysis for frontier AI](https://arxiv.org/abs/2506.01782).
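For readers new to STPA: its hazard-analysis step asks, for every control action in the control structure, whether that action can cause a hazard in one of four ways (the unsafe control action categories in Leveson and Thomas's STPA Handbook). Below is a minimal sketch of generating that worksheet for a deployed AI agent; the controller and control actions are hypothetical examples, and the output is a list of questions for analysts, not a hazard verdict.

```python
from itertools import product

# The four unsafe control action (UCA) categories used in STPA.
UCA_TYPES = (
    "not provided when needed",
    "provided when it causes a hazard",
    "provided too early, too late, or out of order",
    "stopped too soon or applied too long",
)


def uca_worksheet(controller: str, control_actions: list[str]) -> list[str]:
    """One prompt per (control action, UCA category) pair for analysts to assess in context."""
    return [
        f"{controller}: '{action}' {uca}"
        for action, uca in product(control_actions, UCA_TYPES)
    ]


# Hypothetical controller and control actions for a deployed triage agent.
for row in uca_worksheet("triage agent", ["escalate case to clinician", "close case"]):
    print(row)
```

The enumeration is mechanical; the value comes from the deployer's staff deciding which rows describe plausible hazards in their own setting, which is exactly the operating-level competency the argument calls for.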

**Supporting C3: under competitive pressure, operating organizations drift toward the unsafe boundary, and only local competency can sense the drift.** This is Jens Rasmussen's migration model, from his 1997 [Safety Science paper](https://doi.org/10.1016/S0925-7535(97)00052-0). Safety is not a static state; a real organization under pressure to reduce costs and human effort continuously migrates its working practices toward the edge of the safety envelope, usually without anyone consciously deciding to. It follows that the control needed to detect and arrest that migration has to exist at the operating level, because that is where the migration happens and where practices can be fixed. The lab behind an upstream model can see the prompts, but the drift in practices and its impact are mostly opaque to the inference provider. Provider interference with a deployer’s practices is also immediately perceived as overreach, even for [issues that draw broad objections](https://www.anthropic.com/news/statement-department-of-war). Real, rational competitive and risk-appetite pressures push every AI deployer toward "ship it, it is probably fine", and these pressures are not going away. Someone needs to see the drift and name it, and an ivory tower does not make a robust security culture. Training and awareness beget thoughtful decisions.

**Supporting C4: risk propagates through interconnected deployments and cannot be managed only by model developers.** Now that AI is deployed widely, deployers are not independent; their operations are highly correlated, forming a network that shares models, vendors, data pipelines, and failure modes. When Claude or ChatGPT is down, many organizations are impaired at once, as though their workers had all joined the same picket line. Acemoglu, Ozdaglar, and Tahbaz-Salehi's 2015 [analysis of financial networks](https://doi.org/10.1257/aer.20130456) describes "robust-yet-fragile" networks: dense interconnection absorbs small shocks well but transmits large ones catastrophically. The same connectivity can either protect or amplify, depending on the size of the shock. What this implies is that systemic AI risk is a property of the deployment network's topology, not of the source model. And a property of a network cannot be managed at only one node, however important that node is.
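To make the robust-yet-fragile pattern concrete, here is a deliberately simplified toy cascade. It is not the model from Acemoglu, Ozdaglar, and Tahbaz-Salehi: the uniform capital buffer, the rule that a failed node splits its excess loss equally among surviving neighbors, and the two topologies are illustrative assumptions of mine. The sketch compares twenty fully interconnected nodes with the same twenty nodes split into four isolated clusters, an extreme version of a firebreak.

```python
from collections import deque


def complete_graph(nodes):
    """Link every node in `nodes` to every other node in `nodes`."""
    return {i: [j for j in nodes if j != i] for i in nodes}


def clustered_graph(n_clusters, size):
    """Disjoint complete clusters with no links between them (toy firebreaks)."""
    adj = {}
    for k in range(n_clusters):
        adj.update(complete_graph(range(k * size, (k + 1) * size)))
    return adj


def cascade(adj, capital, shock, origin=0):
    """Return how many nodes fail after `shock` hits `origin`.

    Assumed toy rule: a node fails once its accumulated loss exceeds `capital`,
    then splits its excess loss equally among neighbors that are still standing.
    Excess with no surviving neighbor to receive it leaves the system.
    """
    loss = {i: 0.0 for i in adj}
    failed = set()
    loss[origin] += shock
    queue = deque([origin])
    while queue:
        i = queue.popleft()
        if i in failed or loss[i] <= capital:
            continue  # already processed, or still solvent
        failed.add(i)
        survivors = [j for j in adj[i] if j not in failed]
        if not survivors:
            continue
        share = (loss[i] - capital) / len(survivors)
        for j in survivors:
            loss[j] += share
            if loss[j] > capital:
                queue.append(j)
    return len(failed)


if __name__ == "__main__":
    dense = complete_graph(range(20))   # every node linked to every other
    clustered = clustered_graph(4, 5)   # four isolated clusters of five
    for shock in (6.0, 30.0):
        print(f"shock={shock}: dense={cascade(dense, 1.0, shock)} failed, "
              f"clustered={cascade(clustered, 1.0, shock)} failed")
```

With a capital of 1.0 per node, a shock of 6.0 fails 1 node in the dense network and 5 in the clustered one, while a shock of 30.0 fails all 20 dense nodes and still only 5 clustered ones. Density absorbed the small shock and transmitted the large one, which is the qualitative point; the exact thresholds are artifacts of the toy parameters.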

**Supporting C2 and C4: a model's risk materializes at the point of use.** As SR 11-7 illustrates, the same model, validated identically upstream, generates different risks at different nodes of a credit system. The same applies in a triage system or a hiring system, because the risk depends on the motivations for use, the context, the way the model is integrated, and the humans in the loop. It follows that validation competency has to sit where the use is, scaled to the deployer's exposure rather than fixed at a single ratio.

Putting my argument in a nutshell:

Since safety is a control property that exists only in operation (Leveson), and operating organizations drift toward the unsafe boundary under pressure (Rasmussen), and risk propagates through the deployment network rather than staying at the source ([Acemoglu et al.](https://doi.org/10.1257/aer.20130456)), and a model's risk is realized at its point of use (SR 11-7), **then the competency to sense and control that risk must be resident at each operating organization, scaled to its exposure**. **We must not concentrate it upstream at the labs**, or the deployed impact of AI will cause significant harms and potentially catastrophic outcomes.

In the short term, failures are mundane and already happening: deployers without competency may misconfigure systems and miss agent goal-hijacking and tool-misuse failures (just this week I found an exploit of both in Gemini). They may let unvalidated automated decisions run, producing small, distributed harms that are individually survivable but harmful to society in aggregate.

In the medium term, as deployers become more tightly coupled, the [Acemoglu fragility paradigm](https://doi.org/10.1257/aer.20130456) predicts that occasional large shocks will propagate where small ones used to be absorbed. The impact to expect is infrastructure brittleness and correlated failures across institutions that share a model or a vendor.

In the long term, the systemic stories the AI-risk community has been telling (gradual disempowerment, power concentration) are stories about ordinary institutions losing the competency to compete against the front runners, and losing the agency to resist drift. Rasmussen's migration could have a massive impact on society, although exactly how is far from certain.

Across all three timeframes, harms are cheaper to prevent with resident competency than to clean up without it.

The argument needs to survive the obvious pushback. Here are the objections I think carry the most weight, with my responses.

**Objection 1: "Fine, but ordinary organizations already have risk functions. Why does this need more than the existing GRC team?"**

Yes, the Governance, Risk, and Compliance (GRC) team is a fantastic first stop for this competency in every organization. That supports the motion; it only shows that there is no need to prescribe how each organization should structure itself for internal AI safety competency to be available and applied effectively. Most organizations have an implicit or explicit GRC function running through structures that align with the rest of the org, and that team is a fine initial owner. For organizations that are new or going through structural transformation, there are supportive examples: the [Three Lines model](https://www.theiia.org/en/content/position-papers/2020/the-iias-three-lines-model-an-update-of-the-three-lines-of-defense/) (IIA 2013, updated 2020), enterprise risk management under [COSO ERM](https://www.coso.org/guidance-erm), and [ISO 31000](https://www.iso.org/iso-31000-risk-management.html). SR 11-7 extended that machinery to financial models. The theory is that **independent risk management functions can challenge the people who want to ship the thing, limiting the organization's risk exposure**. My point is not to create a new parallel priesthood but to *support* the great GRC teams out there: concentrating AI safety knowledge is bad, and AI safety and security competency must be distributed within organizations too. An organization whose functions are adopting AI needs a second line that can challenge *how it gets done* (C2), but it is much better for the proposals to be reasonable in the first place. Furthermore, even an organization adopting no AI itself increasingly operates in an ecosystem where its vendors, counterparties, and adversaries all have, so its third-party-risk and threat models are now AI-shaped whether it likes it or not. Still, the areas of deep technical skill I've named have to be acquired deliberately.

The GRC team that cannot reason about AI is, within a few years, a GRC team that cannot do its job.

**Objection 2: "Safety research labs and regulators can set standards. Once the organization has best practices, policies, and procedures, the decision-makers for product and operations teams just need to follow them. Why does competency have to be resident at all?"**

This is the most tempting objection, because it sounds responsible, and it is wrong in a way the public record now documents in detail.

Best practices do not enforce themselves.

In July 2025 an AI coding agent on Replit [deleted a live production database during an explicit code-and-action freeze](https://www.tomshardware.com/tech-industry/artificial-intelligence/ai-coding-platform-goes-rogue-during-code-freeze-and-deletes-entire-company-database-replit-ceo-apologizes-after-ai-engine-says-it-made-a-catastrophic-error-in-judgment-and-destroyed-all-production-data), destroying records for more than 1,200 executives and roughly 1,200 companies, despite direct instructions that nothing was to change without permission. It then misreported that a rollback was impossible, and it also fabricated data. The user had instructed the agent to follow best practices but did not realize that natural-language instructions are not enforceable controls. The competency to scope the agent's authority and to separate development from production was insufficient both at the citizen-coder level and at the platform provider. [Replit's CEO afterward conceded such an outcome should never have been possible](https://www.eweek.com/news/replit-ai-coding-assistant-failure/) and rushed to add dev/prod separation. In a comparable [Gemini CLI case, the agent wiped user files after misreading a command sequence](https://incidentdatabase.ai/cite/1152/).
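The lesson of the Replit incident is that a freeze stated in the prompt is a request, not a control. Here is a minimal sketch of what enforcing it outside the model can look like: the agent harness checks every requested tool call against a policy before executing it. `ToolCall`, `Policy`, `authorize`, and the action names are hypothetical, not Replit's or any other platform's API. In a real deployment the dev/prod split would also be enforced by not giving the agent production credentials at all; this check is one layer, not the whole answer.

```python
from dataclasses import dataclass

# Hypothetical action names an agent harness might expose; not a real platform API.
DESTRUCTIVE_ACTIONS = frozenset({"drop_table", "delete_rows", "run_migration", "delete_file"})


@dataclass(frozen=True)
class ToolCall:
    action: str
    environment: str  # e.g. "dev" or "prod"


@dataclass(frozen=True)
class Policy:
    change_freeze: bool = False
    # Environments this agent may touch at all; production is excluded by default.
    allowed_environments: frozenset = frozenset({"dev"})
    approved_by_human: bool = False


def authorize(call: ToolCall, policy: Policy) -> tuple[bool, str]:
    """Decide, outside the model, whether a requested tool call may run.

    The harness calls this before executing any tool, so nothing the agent
    writes in its reasoning or output can change the result.
    """
    if call.environment not in policy.allowed_environments:
        return False, f"agent has no authority over {call.environment}"
    if call.action in DESTRUCTIVE_ACTIONS:
        if policy.change_freeze:
            return False, "change freeze in effect"
        if not policy.approved_by_human:
            return False, "destructive action needs human approval"
    return True, "ok"


if __name__ == "__main__":
    freeze = Policy(change_freeze=True)
    print(authorize(ToolCall("drop_table", "prod"), freeze))  # blocked: no prod authority
    print(authorize(ToolCall("drop_table", "dev"), freeze))   # blocked: freeze in effect
    print(authorize(ToolCall("read_rows", "dev"), freeze))    # allowed
```

The design choice is the point: the decision lives in code the deployer owns and reviews, so scoping the agent's authority becomes an engineering competency rather than a hope about how the model will interpret instructions.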

Where someone competent is in the room, incidents get documented. Sinch's 2026 [survey of 2,527 enterprise decision-makers](https://sinch.com/blog/ai-customer-communications-research/) found that 74% of organizations running AI customer-communication agents in production had already been forced to shut them down or roll them back. Importantly, the figure rises to **81% among organizations with fully mature governance instrumentation**. Although the number sounds bad at first glance, I believe it shows the opposite: organizations with mature instrumentation can see failures that less mature programs miss entirely, and they have the authority to act on what they see. The *organizations reporting no rollbacks are not the benchmark; they are the ones with the least visibility* into what is happening in their own deployments. A rollback is evidence of governance with working feedback loops.

**Objection 3: "The deployer is just calling an API. Why should they duplicate work already executed by the provider’s safety team?"**

This is the factory-gate fallacy in its most reasonable form. As I mentioned, cloud security ran this exact experiment with software, concluded that "the provider secures it" does not work, and formally adopted the shared-responsibility model. Accountability for a risk cannot be outsourced just because server maintenance and procurement are. Finance equally needed SR 11-7 because institutions wrongly treated vendor-validated models as inherently safe, forcing regulators to flag that model risk is realized at the point of use (and to enforce its management). From C1 and C4, I also showed that the lab is structurally located where most of the relevant hazards do not yet exist. The lab cannot see the deployment context, the drift toward unsafe practices, the users, or the adversaries; this is not complacency. It simply lacks visibility and is not involved in the relevant decisions.

**Objection 4: "Regulation will handle this. The EU AI Act, sectoral regulators — we do not need to win the argument, we need to wait for the rules."**

We’ve been in AI transformation for half a decade. Some regulations have materialized and [require action today](https://artificialintelligenceact.eu/implementation-timeline/#:~:text=Date-,2%20August%202026,-Application%3A%20The). There are plenty of enacted [AI safety bills in G20 countries](https://ourworldindata.org/grapher/cumulative-number-artificial-intelligence-bills-passed) that specify outcomes and obligations but depend on operators to establish local competency. This matches aviation's SMS mandate, OSHA's process-safety standard, and finance’s SR 11-7, because the regulator knows it can demand a safe result but cannot itself be in the room when the system runs. Rasmussen's migration and Perrow's *Normal Accidents* make the same point from the theory side: rules at the top of a control structure cannot, by themselves, arrest drift at the operating level; only the people at that level can, provided they are equipped with the mandate and the competency. Regulation is a forcing function for distributing competency, not a substitute for it. Organizations that wait for more rules or fines, and then hand AI risk to existing staff as a side responsibility without genuine capacity, may come to believe they comply on paper; in practice, that is a decision to drift toward the unsafe boundary.

**Objection 5: "If timelines are short and the decisive events happen at a few labs and governments, why scatter talent instead of channeling it to crucial orgs?"**

Short timelines strengthen the case for distribution rather than weakening it, because under time pressure organizations delegate more control to AI based on forward-looking views but do not build the capacity to govern it as quickly (C3). Long timelines do not reverse the conclusion; they relax it by giving organizations more runway. Granted, a modest number of deep specialists in genuinely frontier-bound competencies, mechanistic interpretability foremost among them, do have higher leverage close to the model (C4). Hiring statistics are not showing a dearth of candidates. A “yes, and” approach applies: we need some talent for the labs, and a lot of talent for all the organizations deploying their technology.

**Objection 6: "Doesn't C1 still leave catastrophic universal risks (pandemic uplift, mass-casualty cyber, the genocide tier) that require centralized intervention?"**

There are definitely universal risks: AI models now have capabilities that materially uplift mass-casualty attacks, biological and chemical weapons, and infrastructure-disabling cyber operations. Their magnitude makes the standard "dual-use, balance the tradeoffs" framing look penny-wise and pound-foolish to me. No diffuse benefit can outweigh a catastrophic floor (including but not limited to X-Risk). For these risks, training-time refusals, capability evaluations, and pre-deployment red-teaming at the labs do load-bearing work that no distribution of deployer competency can replicate. C1 still holds, in that training cannot be sufficient for the infinite ordinary cases. But I concede that for the subset of cases where the floor is catastrophic, we need models that do not bring those capabilities, because no deployer-side mitigation can recover from the event. This is the one place where the concentration argument is not just defensible but mandatory, and [labs are unlikely to solve it, no matter how much talent they acquire](https://www.thecompendium.ai/ai-safety#current-technical-efforts-are-not-on-track-to-solve-alignment), except by ending the “free” contributions made to accelerate the frontier. The rest of the motion is unchanged.

The starting intuition feels like common sense: frontier AI labs build a dangerous technology, so the safety people belong at those labs or at the closest safety research organizations. It is the factory-gate fallacy, and nine mature safety-critical industries (including cybersecurity, aviation, automotive, pharma and medical devices, nuclear, and finance) have already discovered it is false, written the correction into law, and taken action that made us all safer. There is consistent agreement that safety is a control property that exists only in operation (Leveson), and that a model's risk materializes at its point of use (SR 11-7). Operators drift toward unsafe practices under pressure (Rasmussen), and risk propagates through the network of operators and deployers rather than staying at the source (Acemoglu et al.). It follows that the competency to manage AI risk has to be resident at every operating organization, scaled to its exposure and AI adoption (similar to the SR 11-7 model) rather than fixed at a single headcount ratio. Cybersecurity disasters have established cautionary precedents, and insufficient staffing of CISO functions has been an important factor. For AI safety, this distribution is already underway, weakly, and nowhere near adequate; the real bottleneck is unstaffed high-impact roles rather than an absent supply of people.

Reserving AI safety and security competency to the frontier labs is incompatible with a safe transition, with high confidence: **safety does not ship from labs**.

Many deployer organizations will overstate the safety of their AI products in vague terms: the incentive is to ship more features, and many users have less and less time to verify the safety details. Distributing the competency puts someone in the room who can see and address the issues before they cause material harm (or at least [bring fixes that actually work when there are gaps](https://medium.com/@its.lagus_66214/your-ai-guardrails-are-not-security-controls-heres-the-proof-cc3ebde13577)). A modest number of the deepest specialists in frontier-bound competencies do have higher leverage near the frontier. Still, competitive dynamics drive rapid diffusion of capable models across the whole economy, including as open-weights models, so the deployment surface that needs resident competency is much wider than any plausible concentration of talent.

A question worth answering in a future post is how to grow this acumen in existing staff, or fund the seats, fast enough to equip key teams with the competency to manage the AI risk component of their daily decisions. Embedded roles can have a real impact, far beyond a mere compliance ornament.

If you are reading this with the seniority to act on it somewhere outside the labs, you are already a de facto champion I am counting on. The work is not to wait for the CISO, the regulator, or the lab safety team to tell you what to do or how. They are likely waiting for someone to move first. AI transformation has probably already taken deep hold in your department’s goals, and a major incident could set those goals back significantly. Find the others in your organization who can also see the cliff: the engineer worried about an agent breaching the dev/prod boundary, the risk officer who was not in the loop on a sensitive change to data sharing, the project lead who was told to deploy with fixed resources and deadlines and had to cut scope. Work alongside them as ambassadors of safety, and build resident competency in your organization before the incident you could have prevented arrives.

The factory-gate fallacy is, at its core, a coordination failure. The people who break it will be the ones who recognize themselves as peers of the leaders they have been waiting for.

**Peer-reviewed and seminal**

**Standards and regulatory primary sources**

**Deployer-side governance evidence (Sections 3, 4, 6)**

**Documentation on incidents**

**Talent-pipeline competitiveness (Section 4)**

**Dialectical layer (LessWrong and adjacent, 2023–2026 — surfaced as debate, not relied on as evidence)**
