If we had models that were 10x less efficient but completely interpretable [1], this would be a multi-billion dollar industry. If you just needed to train your bio-model 10x longer to [reverse-engineer human bio-markers for dementia], you could now sell your product. Task-specific models are much, much smaller than general SOTA LLMs, so the hit isn't even that big given increasing compute (medical imaging CNNs are ~10-150M params).
There is no law forcing us to make black boxes take over every job. We can have:
Once you show a better option exists, you create demand for that better option. When the better option improves safety, you have a target for regulation. There are two ways this goes. The first:
Voluntary Better Standards --> Codified in Law

In 1987, the American College of Radiology offered voluntary accreditation for mammography imaging quality. Only half applied and only half of those passed; Congress passed the Mammography Quality Standards Act in 1992 requiring accreditation.
Currently, there are lots of AI products in radiology, but all their interpretability techniques are [post hoc saliency maps]. Once it's shown you *can* have a high level of understanding/robustness, it can then be a target for regulation (as well as actually offering a better product that's robust to e.g. changing hospitals).
Financial incentives alone might enforce this. Whichever company first applies this correctly can lobby for stronger interpretability guarantees (i.e. regulatory capture, but like good?).
However, there is another way laws get passed.
In 1982, 7 people died from cyanide-laced Tylenol. Within 2 months, the FDA issued tamper-evident packaging regulations. This could pass quickly because it was already possible to make the new packaging.
If financial and political incentives don't align with forcing robust, interpretable AI (and the warning shot doesn't kill us all), a warning shot could still get a bill passed, provided the technology to prevent a repeat is already available.
Taking a step back, it would be amazing if the financial incentives pivoted to building robust, narrow models over general black-boxes. The big labs could still be the big players if the most efficient method is to first do large pretraining to search for programs and then distill specific tasks into interpretable models.
Solving mechanistic interpretability could make that vision come true.
. . .
Saaadly we haven't solved interpretability, but the research direction I'm giving a 40% chance of actually solving the full problem is:
I work on tensor networks as a more interpretable architecture, and the first thing people ask is "How good is it?". So I showed tensor-transformers are surprisingly performant with my current best estimate of "15% worse wall-clock time, in the worst case". [2]
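To make "more interpretable architecture" a bit more concrete, here is a minimal sketch. It assumes a bilinear layer, `h = (W x) * (V x)` with no activation function, as a stand-in for a tensor-network layer. That choice is an assumption for illustration and not necessarily the architecture behind [2]. The names `W`, `V`, `B`, `bilinear_layer`, and `bilinear_as_tensor` are all hypothetical. The point is that such a layer is exactly one tensor contraction, so its behavior can be read off the weights.

```python
import random

random.seed(0)
D_IN, D_HIDDEN = 4, 3

# Hypothetical weights for a bilinear layer h = (W x) * (V x); no activation function.
W = [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(D_HIDDEN)]
V = [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(D_HIDDEN)]


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def bilinear_layer(x):
    """Elementwise product of two linear maps applied to the same input."""
    return [dot(W[k], x) * dot(V[k], x) for k in range(D_HIDDEN)]


# The same layer as a single third-order tensor, symmetrised over the two
# input indices: B[k][i][j] = (W[k][i] * V[k][j] + W[k][j] * V[k][i]) / 2.
B = [[[0.5 * (W[k][i] * V[k][j] + W[k][j] * V[k][i])
       for j in range(D_IN)]
      for i in range(D_IN)]
     for k in range(D_HIDDEN)]


def bilinear_as_tensor(x):
    """Contract B with the input twice: h[k] = sum_ij B[k][i][j] * x[i] * x[j]."""
    return [sum(B[k][i][j] * x[i] * x[j]
                for i in range(D_IN) for j in range(D_IN))
            for k in range(D_HIDDEN)]


x = [random.gauss(0, 1) for _ in range(D_IN)]
direct = bilinear_layer(x)
contracted = bilinear_as_tensor(x)

# Both forms compute the same function, up to floating-point error.
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, contracted))
```

Under this assumption, `B[k][i][j]` is the strength with which input features `i` and `j` interact in hidden unit `k`, and it is available from the weights alone, without running the model on data.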
Even with all the useful properties of tensor-networks, however, I still haven't solved mech interp. This has two takeaways:
Although tensor-transformers are performant, adding additional interpretability constraints (which we can now define in a principled way with tensor networks) could reduce efficiency. Even so, they would be more competitive than SOTA.
Well, more competitive in the competition for robust AI in high-stakes settings.
"Interpretable" as in you can cleanly debug problems and make strong robustness claims about the model.
Even after seeing those numbers, many folks still suggest a project to scale up even more, but that's not in my top-10 most useful projects. Let's do stronger interp first!