Can the Safety Tax Be Highly Concentrated?


Source: lesswrong.com. Published: 2026-06-15T18:48Z. Topic: ai-safety.

An AI safety researcher argues that expensive safety measures can be applied selectively to the <1% of tasks carrying catastrophic risk, making the alignment tax economically viable. The blended overhead of a 100× tax on 0.1% of actions would be roughly 10%, allowing labs to deploy costly but highly safe systems without being undercut by competitors.

TLDR: We may capture much or most of the available AI safety benefit by reserving expensive, specialized agents for the <1% of tasks that carry catastrophic risk. This would mean that AI safety work on high-cost but highly safe systems could be very useful.

The standard objection to compute-heavy AI safety measures is competitive: any lab paying a large alignment tax gets undercut by one that doesn't, so expensive safety doesn't survive the market.

This objection typically assumes the tax is paid uniformly - levied on every action regardless of what that action is doing. Drop that assumption and the objection loses most of its force. If the expensive treatment can be applied selectively, to the small fraction of actions where catastrophic consequences live, the blended overhead is small even when the per-action multiplier is enormous. A 100× tax on 0.1% of actions is roughly a 10% tax on the system.
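The arithmetic behind that last sentence can be checked with a minimal sketch. It assumes every action has the same baseline cost, which the post does not state, and the function names `blended_overhead` and `max_taxed_fraction` are illustrative only.

```python
def blended_overhead(fraction_taxed: float, multiplier: float) -> float:
    """Extra system-wide cost, as a fraction of the untaxed baseline.

    Assumption of this sketch: every action costs 1 unit at baseline,
    and taxed actions cost `multiplier` units instead.
    """
    if not 0.0 <= fraction_taxed <= 1.0:
        raise ValueError("fraction_taxed must be in [0, 1]")
    if multiplier < 1.0:
        raise ValueError("multiplier must be at least 1")
    blended_cost = (1.0 - fraction_taxed) + fraction_taxed * multiplier
    return blended_cost - 1.0


def max_taxed_fraction(overhead_budget: float, multiplier: float) -> float:
    """Largest share of actions that can be taxed within an overhead budget."""
    if multiplier <= 1.0:
        raise ValueError("multiplier must be greater than 1")
    return overhead_budget / (multiplier - 1.0)


# The post's example: a 100x tax on 0.1% of actions.
print(f"{blended_overhead(0.001, 100):.1%}")  # 9.9%

# The uniform case the objection assumes: every action pays the 100x tax.
print(f"{blended_overhead(1.0, 100):.0%}")  # 9900%
```

Under the same assumption, `max_taxed_fraction(0.10, 100)` gives the share of actions that could receive the 100× treatment while keeping the blended overhead at 10%, which comes out just above 0.1%.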

How would one spend 100× on additional safety? The easiest answer is inference-time safety solutions. Redwood’s control research has studied several. Certain situations could also call for different LLMs, perhaps ones optimized for transparency, robustness, neutrality, or that have simply been more heavily vetted.

Could we identify and isolate the top n% of tasks? There are two clear solutions. The first is to get good at understanding which AI tasks are most critical. I'd assume there are some fairly obvious moves to start with. Frontier LLM development and deployment tasks seem critical to get right; a major corporate database migration is clearly more critical than a local one. If you simply charged 100× for more robust LLM agents today, I'd expect them to get used for some of the most important cases by default.
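As a rough illustration of this first approach, the sketch below routes tasks by a criticality score. Everything here is hypothetical: the `Task` type, the score, the 0.9 threshold, and the cost units are stand-ins, and how such a score would actually be produced is the open problem the paragraph describes.

```python
from dataclasses import dataclass

CHEAP_COST = 1.0        # baseline cost per task, arbitrary units (assumed)
EXPENSIVE_COST = 100.0  # the 100x multiplier used in the post


@dataclass(frozen=True)
class Task:
    name: str
    criticality: float  # hypothetical score in [0, 1]


def route(task: Task, threshold: float = 0.9) -> str:
    """Send a task to the expensive agent only if it scores above the threshold."""
    return "expensive" if task.criticality >= threshold else "cheap"


def total_cost(tasks: list[Task], threshold: float = 0.9) -> float:
    """Sum the per-task cost implied by the routing decision."""
    return sum(
        EXPENSIVE_COST if route(t, threshold) == "expensive" else CHEAP_COST
        for t in tasks
    )


# One critical task among 999 routine ones (illustrative data, not measurements).
tasks = [Task("frontier model deployment", 0.99)]
tasks += [Task(f"routine-{i}", 0.1) for i in range(999)]

overhead = total_cost(tasks) / (len(tasks) * CHEAP_COST) - 1.0
print(f"{overhead:.1%}")  # 9.9%
```

The threshold is the main design lever in this sketch: lowering it buys more coverage at the price of a larger blended overhead.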

The second is to actively design processes so the critical work is contained to a narrow amount of computation. Companies already do this in many settings: it's risky to hand direct bank-account access to every employee, so access is restricted to a few trusted ones and everyone else goes through a request process. The same shape applies to high-consequence AI actions.
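The request-process pattern might look something like the following sketch. The `SensitiveActionGate` class and its methods are hypothetical names invented for illustration, not an existing system: only agents on a vetted list can execute a sensitive action directly, and everyone else's request waits for a vetted reviewer.

```python
from dataclasses import dataclass, field


@dataclass
class SensitiveActionGate:
    """Hypothetical gate: vetted agents execute, all others must file a request."""

    vetted_agents: frozenset
    pending: list = field(default_factory=list)
    executed: list = field(default_factory=list)

    def submit(self, agent_id: str, action: str) -> str:
        """Execute immediately for vetted agents, otherwise queue for review."""
        if agent_id in self.vetted_agents:
            self.executed.append((agent_id, action))
            return "executed"
        self.pending.append((agent_id, action))
        return "queued"

    def review_next(self, reviewer_id: str, approve: bool) -> str:
        """Let a vetted reviewer approve or reject the oldest pending request."""
        if reviewer_id not in self.vetted_agents:
            raise PermissionError("only vetted agents may review requests")
        if not self.pending:
            raise LookupError("no pending requests")
        _requester, action = self.pending.pop(0)
        if approve:
            # The vetted reviewer, not the requester, carries out the action.
            self.executed.append((reviewer_id, action))
            return "approved"
        return "rejected"


gate = SensitiveActionGate(vetted_agents=frozenset({"vetted-agent"}))
print(gate.submit("cheap-agent", "transfer funds"))  # queued
print(gate.review_next("vetted-agent", approve=True))  # approved
```

In this shape the expensive agent is only invoked on the queued requests, so its cost scales with the number of sensitive actions rather than with total activity.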

Of course, a different solution is to use the ~100× costly agents as a source of training signal in post-training. This probably calls for agents with very different properties from ones optimized for general direct consumer use, but the big-picture economic justifications might be similar.

In principle, the use of highly costly but reliable systems could be formalized. Certain AI development decisions might be deemed sensitive enough that they can only be carried out by a specific set of expensive, vetted (perhaps government-approved) AI agents. There are ways this goes poorly, but also versions that look like a reasonable extension of current practice.

Given all of this, I think that:

Objections

*This post was improved with Claude Opus. Opus provided high-level feedback, helped find the links, and made a bunch of wording adjustments.*


Read original on lesswrong.com → www.lesswrong.com/posts/ZYe5qNMGqaBHKEW2H/can-th…
