Generative AI delivers results that no one can follow anymore. AlphaGo showed this pattern in 2016. When is reliability enough, and when is understanding needed?
- Golo Roden
Perhaps you know this from working with generative AI: Something works, and you don't know why. A generated piece of code runs, a suggested solution helps, a diagnosis is correct. The result is in front of you, verifiable and usable; you just couldn't reconstruct the path to it, or even roughly retell it.
For a long time, this seemed like a minor issue to me, just the price of convenience. Now, I consider it the truly interesting question. Because what's shifting here isn't just the speed at which solutions emerge, but the relationship between a result and the understanding we used to acquire with it. It's worth asking whether this loss is a problem – and if so, where exactly.
The working result without the path to it
Traditionally, when you solve a problem yourself, you get two things at once: the solution and the understanding of how it came about. Of these two, understanding is the more durable part. The specific solution applies only to that particular case; the understanding carries over to the next, slightly different case.
With generative AI, these two things fall apart. I get the solution without immediately gaining the understanding. A tricky regular expression, a build configuration that suddenly works, an error that disappears with an accepted suggestion: The result is there, the path remains in the dark.
This is not the same as using a library or trusting a compiler. In those cases, someone has understood what's happening, and the abstraction is documented, stable, and readable. With a generated result, the derivation itself is essentially opaque, and it varies slightly with each request.
The crucial difference lies in transferability. A result is a single piece; understanding is reusable. Where I lack understanding, I lack the ability to solve the next case on my own. I remain dependent on the tool, not because it's more convenient, but because I could no longer do it myself.
The insidious part is that adopting a result feels like learning. After all, I've accomplished something I couldn't do before. But the impression is deceptive: What has increased is the number of completed tasks, not the depth of what I master myself. Borrowed competence feels like one's own, as long as the tool is at hand.
For a long time, the effort of working through a problem was considered a tedious detour on the way to the solution. In reality, it was the path by which understanding was created in the first place. Anyone who has struggled through an error knows not only the solution afterward but also the terrain around it: the dead ends, the false leads, the points where it could still go wrong. Exactly this side knowledge is lost when the finished solution is delivered. So, this friction was not a flaw but part of the yield. Eliminating it sounds like progress, but it costs something that doesn't initially appear on the bill.
A move no one can explain
In March 2016, AlphaGo defeated Go grandmaster Lee Sedol 4:1 in a match. What became famous wasn't the result, but a single move: the 37th move in the second game, known in Go history as “Move 37.” No human would have played it; AlphaGo itself estimated the probability of a human professional choosing it at about one in ten thousand. It was alien, and it was brilliant.
Lee Sedol himself provided the counterpoint. In the fourth game, he played the 78th move, celebrated in Korea as the “divine move,” equally improbable and the key to his only victory against the machine. The difference lies not in the brilliance but in what stands behind it: Behind Move 78 lies an idea that a human can explain, a clever tactical move that Go players call a tesuji. Behind Move 37 lies a probability distribution over millions of self-played games.
Professional players studied AlphaGo and adopted individual moves; the early invasion at the 3-3 point, for instance, became popular after the machine favored it. AlphaGo Zero, in 2017, learned entirely without human games, solely from playing against itself, and significantly surpassed the original version.
AlphaGo is based on neural networks, and everything it learned during training is embedded in their weights, the many numerical values that make up such a network. That's where the knowledge that makes the machine superior lies, and nowhere else. It's not in any textbook, any commentary, any mind. You can watch the machine play and marvel at the results, but the underlying strategy remains locked in a form that cannot be translated into human terms.
Significantly, not even the experts behind AlphaGo could explain the move in the usual sense. They could describe how the system was trained and show that it assigned a high value to Move 37. But they couldn't deduce why that particular move was the right one. The explanation ended with the mechanics of learning, not the meaning of the result.
However, adopting a move doesn't mean understanding it. You imitate what proves strong without being able to reconstruct the why. The machine plays superhuman Go and leaves behind no teachable theory of it. This is precisely the structure I encounter again with generative AI: a high-level result that can be copied but not inherited as insight.
Can the black box really be opened?
The obvious hope is that the models simply need to be made explainable. Explainable AI is a serious field of research. But the common methods explain in retrospect: They show which inputs were particularly important or which parts of the network were activated. This yields plausible stories and approximations, but not accurate information about what the weights actually calculate.
Worse still: A plausible explanation can be misleading. It sounds convincing, aligns with our expectations, and suggests an understanding that doesn't actually hold. An explanation that doesn't faithfully represent what's happening in the model is, if anything, more dangerous than none because it creates a false sense of security.
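A small sketch can make this concrete. It uses occlusion, one simple after-the-fact attribution technique: replace one input with a baseline value and measure how much the output drops. Both toy models and all names here are my own assumptions for illustration, not taken from any real system. The point is only that two models that calculate different things can receive exactly the same explanation.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[int]], int]


def product_model(x: Sequence[int]) -> int:
    """Hypothetical black box A: multiplies its two inputs."""
    return x[0] * x[1]


def threshold_model(x: Sequence[int]) -> int:
    """Hypothetical black box B: outputs 6 if both inputs are positive, else 0."""
    return 6 if x[0] > 0 and x[1] > 0 else 0


def occlusion_attribution(model: Model, x: Sequence[int], baseline: int = 0) -> List[int]:
    """Score each input by how much the output drops when it is set to the baseline."""
    full_output = model(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline
        scores.append(full_output - model(occluded))
    return scores


point = [2, 3]
attr_product = occlusion_attribution(product_model, point)
attr_threshold = occlusion_attribution(threshold_model, point)

# Same "explanation" at this point, yet the models disagree elsewhere.
print(attr_product, attr_threshold)
print(product_model([4, 3]), threshold_model([4, 3]))
```

At the point examined, both models get identical importance scores, so the explanation cannot tell them apart. It describes which inputs mattered here, not what is being calculated.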
For daily work, this offers little consolation. Highlighting which lines an assistant considered relevant says nothing about whether the generated code is correct. The explanation, at best, describes what the model paid attention to, not whether the result is correct. Confusing the two is the easiest and, at the same time, most dangerous mistake when using these tools. A newer branch, mechanistic interpretability, attempts to literally reverse engineer a model's internal calculations. This is promising but still in its infancy. With models containing billions of parameters, a complete explanation is still far off.
Only simple models are understandable from the ground up. A decision tree is a readable sequence of if-then branches that can be followed step by step. However, as soon as you want to increase accuracy and move to ensembles, to random forests or gradient boosting, the transparency is gone again. Computer scientist Cynthia Rudin concludes from this that consequential decisions should use interpretable models from the outset rather than black boxes that are explained afterward.
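The contrast can be sketched in a few lines. Everything in the following example is invented for illustration: the loan scenario, the feature names, the thresholds. The "ensemble" is a toy majority vote over randomly generated rules, not a trained random forest; it only shows how the single readable path disappears once many small rules vote together.

```python
import random
from typing import List, Tuple


def explain_tree(income: float, debt_ratio: float) -> Tuple[str, List[str]]:
    """A single decision tree: the branches taken are the complete reason."""
    if income < 30_000:
        return "reject", ["income < 30000"]
    if debt_ratio > 0.4:
        return "reject", ["income >= 30000", "debt_ratio > 0.4"]
    return "approve", ["income >= 30000", "debt_ratio <= 0.4"]


def make_ensemble(n_rules: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Toy ensemble: each rule has its own randomly drawn pair of thresholds."""
    rng = random.Random(seed)
    return [(rng.uniform(20_000, 40_000), rng.uniform(0.3, 0.5)) for _ in range(n_rules)]


def ensemble_decision(ensemble: List[Tuple[float, float]], income: float, debt_ratio: float) -> str:
    """Majority vote: there is no longer one path to read, only a tally."""
    votes = sum(
        1
        for income_min, debt_max in ensemble
        if income >= income_min and debt_ratio <= debt_max
    )
    return "approve" if votes * 2 > len(ensemble) else "reject"


decision, path = explain_tree(50_000, 0.5)
print(decision, path)

ensemble = make_ensemble(101)
print(ensemble_decision(ensemble, 100_000, 0.0))
```

For the tree, the returned path is the explanation. For the ensemble, the honest answer to "why?" is a count of 101 votes, and real ensembles use far deeper rules than these.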
Behind this lies an uncomfortable connection: the most powerful models are simultaneously the least transparent. Explainability and performance pull in opposite directions. This opacity is of a different kind than the fundamental limits I described in a previous post: there, it's about what language models fundamentally cannot do; here, it's about how little we understand how they do what they do.
Understanding that may no longer exist anywhere
Now, one might object that we already rely on many things we don't understand. I trust the compiler, the TLS library, my car's brakes without understanding their internals. That's true, and it's a good thing. No one can understand everything themselves.
But there's a difference. In these examples, the understanding exists somewhere. It resides with the developers of the compiler, in the specification of the protocol, in the design drawings of the brake. In principle, I can follow the chain until I reach someone who can explain why the thing works.
It's the difference between knowing that something works and knowing why it works. The former is sufficient to use a result. The latter is necessary to change it, transfer it to new cases, or stand by it in a dispute. For a long time, both were often closely linked; now they can be decoupled.
Science and technology have functioned for centuries because knowledge took a transferable form: theorems, proofs, construction plans that others could check, teach, and develop further. The knowledge of a large model, on the other hand, lies in billions of numbers that, in themselves, explain nothing. It is knowledge that works without being able to communicate itself. Thus, it lacks precisely the property that makes human knowledge connectable.
With the output of a large language model, the chain can lead to a dead end. No one designed Move 37; no one can point to the reasoning behind it because there is none. The understanding doesn't lie elsewhere; it may not exist in anyone at all. This is new, and it's more than an academic difference.
This becomes visible wherever someone has to vouch for a result. A doctor following a diagnosis, a developer delivering a generated component, a team operating a system: they all take responsibility for something whose genesis they don't fully grasp. As long as everything goes well, this isn't noticeable. It becomes noticeable as soon as something goes wrong, and the question of why can no longer be postponed.
For software, this difference becomes concrete. Software is not a one-time result that you accept and set aside. It must be maintained, extended, understood in case of errors, and accounted for over years. I have explained elsewhere that the real bottleneck in development was never writing code but understanding the problem. Against this background, a result that no one understands is not a saving but a liability for any future change.
Reliability does not replace understanding
So, is the loss of understanding an issue? The honest answer is: It depends, and the dividing line deserves to be drawn precisely. There are cases where reliability is the completely correct standard and understanding adds nothing.
If a result is a one-off case, cheap to verify, and without major consequences, then I don't need to understand its derivation. I can test a regular expression against my examples, measure a small throwaway script by its result. Insisting on understanding here would be romanticizing; what counts is that the result is demonstrably correct. However, as soon as I am responsible for a system and must develop it further, the standard is reversed. Then, a result I don't understand is not a saving but a deferred bill. Every subsequent change, every error, every adaptation demands precisely the understanding that I skipped the first time. Reliability alone is no longer sufficient where I need not just a result but a foundation upon which I can build further.
Furthermore, reliability itself is a shaky standard as long as understanding is lacking. I can only test what I have thought of. Which cases need to be tested at all, which boundary conditions become tricky, where a result might fail: no test reveals this; only an understanding of the matter does. Those who don't understand will, when in doubt, test the wrong thing and still consider the result safe.
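A minimal sketch of how this plays out, assuming a hypothetical generated regular expression for dates and three examples I happened to think of. The pattern, the function names, and the examples are all invented for illustration.

```python
import re
from datetime import date

# Hypothetical "generated" pattern, accepted because it passed my examples.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def looks_like_date(text: str) -> bool:
    """What the pattern actually checks: four digits, two digits, two digits."""
    return ISO_DATE.fullmatch(text) is not None


def is_real_date(text: str) -> bool:
    """What I actually wanted: the shape is right AND the calendar date exists."""
    if not looks_like_date(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


# The cases I thought of: all of them pass.
my_examples = {"2024-01-31": True, "31.01.2024": False, "2024-1-31": False}
all_pass = all(looks_like_date(text) == expected for text, expected in my_examples.items())
print(all_pass)

# The case I did not think of: month 13, day 45.
print(looks_like_date("2024-13-45"), is_real_date("2024-13-45"))
```

All three of my tests are green, and the pattern still accepts a date that does not exist. Only knowing that the shape of a date and its validity are two different things tells me which test was missing.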
In practice, this becomes apparent at the latest when a generated module breaks in production months later, and no one on the team can say what assumptions it rests on. Then the work begins that was supposedly saved the first time: reading the software, working through it, understanding it. Only now it happens under time pressure and without the context that was still at hand when the code was created.
This bill cannot be canceled, only shifted. Either I work out the understanding at the beginning, when the context is fresh and the pressure is low, or I catch up later, more expensively and under worse conditions. It never disappears.
Therefore, generative AI does not eliminate the value of understanding. It merely shifts the time when we pay for it. It relocates the problem; it doesn't solve it. The ability that becomes more important instead of less important is judgment: to recognize which of the two cases I am currently in and to decide whether a result deserves trust without being understood.
AlphaGo showed us a decade ago what that feels like: brilliance we admire and copy but cannot inherit as insight. Whether we ultimately just operate what we have built or continue to understand it is not decided by the tools. It is decided by where we continue to insist on understanding.
(mro)