Black Boxes for Low-Stakes, Interpretable AI for High-Stakes


Source: lesswrong.com · Published: 2026-05-28T15:34Z · Topic: ai-safety


Interpretable AI models, even if they were 10 times less efficient than black-box systems, could create a multi-billion dollar industry for high-stakes applications like medical diagnostics. Task-specific models are already much smaller than general large language models, which keeps the efficiency trade-off modest. Once interpretable models are proven viable, financial incentives and safety regulations could drive their adoption, similar to how the Mammography Quality Standards Act mandated accreditation after voluntary standards fell short.


If we had models that were 10x less efficient but completely interpretable [1], this would be a multi-billion dollar industry. If you just needed to train your bio-model 10x longer to reverse-engineer human bio-markers for dementia, you could then sell your product. Task-specific models are much, much smaller than general SOTA LLMs, so the hit isn't even that big given increasing compute (medical imaging CNNs are ~10-150M params).
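
To make the size argument concrete, here is a rough back-of-the-envelope sketch. It assumes training cost scales linearly with parameter count at fixed data, which is a simplification, and it uses a hypothetical 100B-parameter figure for a general LLM that is not from this post. Only the 10x penalty and the 10-150M parameter range come from the text above; the function names are made up for illustration.

```python
# Back-of-the-envelope comparison. All inputs are illustrative assumptions
# except the 10x penalty and the 10-150M parameter range quoted in the text.

# Parameter range quoted in the text for medical imaging CNNs.
CNN_PARAMS_LOW = 10e6
CNN_PARAMS_HIGH = 150e6

# The 10x interpretability penalty from the text.
INTERP_OVERHEAD = 10.0

# Hypothetical size for a general-purpose LLM; NOT a figure from the article.
HYPOTHETICAL_LLM_PARAMS = 100e9


def relative_training_cost(params: float, overhead: float = 1.0) -> float:
    """Crude cost proxy: parameter count times an overhead factor.

    Assumes cost is linear in parameter count at fixed data (a simplification).
    """
    return params * overhead


def headroom(task_model_params: float) -> float:
    """How many times cheaper the 10x-penalized task model is than the hypothetical LLM."""
    interp_cost = relative_training_cost(task_model_params, INTERP_OVERHEAD)
    llm_cost = relative_training_cost(HYPOTHETICAL_LLM_PARAMS)
    return llm_cost / interp_cost


if __name__ == "__main__":
    for params in (CNN_PARAMS_LOW, CNN_PARAMS_HIGH):
        print(
            f"{params / 1e6:.0f}M params at 10x overhead: "
            f"{headroom(params):,.0f}x below the hypothetical LLM"
        )
```

Under these assumptions, even the largest model in the quoted range stays well below the hypothetical LLM on this proxy after paying the 10x penalty. Real costs also depend on data, architecture, and hardware, so treat the ratio as an order-of-magnitude illustration only.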

There is no law forcing us to make black boxes take over every job. We can have black boxes for low-stakes tasks and interpretable AI for high-stakes ones.

Once you show a better option exists, you create demand for that better option. When the better option improves safety, you have a target for regulation. There are two ways this goes. The first:

Voluntary Better Standards --> Codified in Law

In 1987, the American College of Radiology offered voluntary accreditation for mammography imaging quality. Only half applied and only half of those passed; Congress passed the Mammography Quality Standards Act in 1992 requiring accreditation.

Currently, there are lots of AI products in radiology, but all their interpretability techniques are post hoc saliency maps. Once it's shown you *can* have a high level of understanding/robustness, it can then be a target for regulation (as well as actually offering a better product that's robust to, e.g., changing hospitals).

Financial incentives alone might enforce this. Whatever company first correctly applies this can lobby for stronger interpretability guarantees (i.e. regulatory capture, but like good?).

However, there is another way laws get passed.

In 1982, 7 people died from cyanide-laced Tylenol. Within 2 months, the FDA issued tamper-evident packaging regulations. This could pass quickly because it was already possible to make the new packaging.

In the case that financial and political incentives don't align with forcing robust, interpretable AI (and the warning shot doesn't kill us all), a warning shot could make a bill pass if the technology to prevent a repeat is already available.

Taking a step back, it would be amazing if the financial incentives pivoted to building robust, narrow models over general black-boxes. The big labs could still be the big players if the most efficient method is to first do large pretraining to search for programs and then distill specific tasks into interpretable models.

Solving mechanistic interpretability could make that vision come true.

. . .

Saaadly we haven't solved interpretability, but the research direction I'm giving a 40% chance of actually solving the full problem is:

I work on tensor networks as a more interpretable architecture, and the first thing people ask is "How good is it?" So I showed tensor-transformers are surprisingly performant, with my current best estimate being "15% worse wall-clock time, in the worst case" [2].

Even with all the useful properties of tensor-networks, however, I still haven't solved mech interp. This has two takeaways:

Although tensor-transformers are performant, adding further interpretability constraints (which we can now define in a principled way with tensor networks) could decrease the efficiency. Even so, they would be more competitive than SOTA.

Well, more competitive in the race for robust AI in high-stakes settings.

[1] "Interpretable" as in: you can cleanly debug problems and make strong robustness claims.

[2] Even after seeing those numbers, many folks still suggest a project to scale up even more, but that's not in my top-10 most useful projects. Let's do stronger interp first!


Read original on lesswrong.com → www.lesswrong.com/posts/AThxSscje9W9ieGfB/black-…
