{"slug": "black-boxes-for-low-stakes-interpretable-ai-for-high-stakes", "title": "Black Boxes for Low-Stakes, Interpretable AI for High-Stakes", "summary": "Interpretable AI models, even if 10 times less efficient than black-box systems, could create a multi-billion dollar industry for high-stakes applications like medical diagnostics. Task-specific models are already much smaller than general large language models, making the efficiency trade-off minimal. Once interpretable models are proven viable, financial incentives and safety regulations could drive their adoption, similar to how the Mammography Quality Standards Act mandated accreditation after voluntary standards failed.", "body_md": "If we have models that are 10x less efficient but completely interpretable [1], this would be a multi-billion dollar industry. If you just needed to train your bio-model 10x longer to reverse-engineer human biomarkers for dementia, you could then sell your product.\n\nTask-specific models are much, much smaller than general SOTA LLMs (medical imaging CNNs are ~10-150M params), so the hit isn't even that large given increasing compute.\n\nThere is no law forcing us to make black boxes take over every job. We can have:\n\nOnce you show a better option exists, you create demand for that better option. When the better option improves safety, you have a target for regulation. There are two ways this goes. The first:\n\nVoluntary Better Standards --> Codified in Law\n\nIn 1987, the American College of Radiology offered voluntary accreditation for mammography imaging quality. Only half applied, and only half of those passed. In 1992, Congress passed the [Mammography Quality Standards Act](https://en.wikipedia.org/wiki/Mammography_Quality_Standards_Act), requiring accreditation.\n\nCurrently, there are lots of AI products in radiology, but all their interpretability techniques are post hoc saliency maps. 
Once it's shown you *can* have a high level of understanding and robustness, that level can become a target for regulation (as well as actually offering a better product that's robust to, e.g., changing hospitals).\n\nFinancial incentives alone might enforce this. Whichever company first applies this correctly can lobby for stronger interpretability guarantees (i.e. regulatory capture, but like, good?).\n\nHowever, there is another way laws get passed.\n\nIn 1982, 7 people died from cyanide-laced Tylenol. Within 2 months, the FDA issued tamper-evident packaging regulations. This could pass quickly because *it was already possible* to make the new packaging.\n\nIf financial and political incentives don't align toward requiring robust, interpretable AI, a warning shot (one that doesn't kill us all) could make a bill pass *if the technology to prevent it is already available*.\n\nTaking a step back, it would be amazing if the financial incentives pivoted to building robust, narrow models over general black boxes. The big labs could still be the big players if the most efficient method is to *first* do large pretraining to search for programs and *then* distill specific tasks into interpretable models.\n\nSolving mechanistic interpretability could make that vision come true.\n\n. . .\n\nSaaadly, we haven't solved interpretability, but the research direction I'm giving a 40% chance of *actually solving the full problem* is:\n\nI work on tensor networks as a more interpretable architecture, and the first thing people ask is \"How good is it?\". So I showed that [tensor-transformers are surprisingly performant](https://www.lesswrong.com/posts/hp9bvkiN3RzHgP9cq/tensor-transformer-variants-are-surprisingly-performant), with my current best estimate being \"15% worse wall-clock time, in the worst case\" [2].\n\nEven with all the useful properties of tensor networks, however, I still haven't solved mech interp. 
This has two takeaways:\n\nAlthough tensor-transformers are performant, adding further interpretability constraints (*which we can now define in a principled way with tensor networks*) could decrease their efficiency. Even so, they would be more competitive than SOTA.\n\nWell, more competitive in the competition of **robust AI for high-stakes settings.**\n\n[1] \"Interpretable\" as in you can cleanly debug problems and make strong robustness claims about it.\n\n[2] Even after seeing those numbers, many folks still suggest a project to scale up even more, but that's not in my top-10 most useful projects. Let's do stronger interp first!", "url": "https://wpnews.pro/news/black-boxes-for-low-stakes-interpretable-ai-for-high-stakes", "canonical_source": "https://www.lesswrong.com/posts/AThxSscje9W9ieGfB/black-boxes-for-low-stakes-interpretable-ai-for-high-stakes", "published_at": "2026-05-28 15:34:29+00:00", "updated_at": "2026-05-28 16:38:04.465181+00:00", "lang": "en", "topics": ["ai-safety", "ai-policy", "ai-ethics", "ai-products", "computer-vision"], "entities": ["American College of Radiology", "Mammography Quality Standards Act"], "alternates": {"html": "https://wpnews.pro/news/black-boxes-for-low-stakes-interpretable-ai-for-high-stakes", "markdown": "https://wpnews.pro/news/black-boxes-for-low-stakes-interpretable-ai-for-high-stakes.md", "text": "https://wpnews.pro/news/black-boxes-for-low-stakes-interpretable-ai-for-high-stakes.txt", "jsonld": "https://wpnews.pro/news/black-boxes-for-low-stakes-interpretable-ai-for-high-stakes.jsonld"}}