The Concerning, Unchecked Rise of E2E AI in Physical Applications

Tesla's full self-driving software, the first large-scale deployment of end-to-end AI in a physical system, has demonstrated that the probabilistic, data-driven engineering approach carries inherent risks of catastrophic failure that could kill people. The paradigm, which maps raw sensor inputs directly to control actions without human-coded rules, is now being exported to humanoid robots and surgical automation systems despite mounting evidence of its dangers. This unchecked rise of end-to-end AI in physical applications threatens to bypass the deterministic safety standards that governed previous engineering achievements like NASA's Artemis moon mission.

The recent Artemis II moon mission was a monumental engineering achievement, perhaps humankind’s greatest ever, on many levels: It travelled farther from Earth than any other manned spacecraft, it used the most powerful rocket ever built, and it provided a shot in the arm for NASA’s ambitious goal of landing humans on Mars.

But it pales in comparison to an even more ambitious engineering goal here on the ground: the frenzied push to develop fully autonomous land vehicles as the first step on the road to a robot-populated future.

Artemis represents the apotheosis of the tried-and-true approach of “deterministic,” or rules-driven, engineering. The environment is extreme but known, failure modes are well-defined, and orbital mechanics, thermal loads, and rocket thrust are governed by equations originally developed hundreds of years ago by the likes of Newton and Kepler.

In contrast, the emerging approach is born of recent developments in AI. It could be called “probabilistic” engineering, or end-to-end, data-driven engineering.

Probabilistic, data-driven engineering learns from examples. Feed a neural network enough data, and it discovers patterns no human engineer would have thought to encode. This is a genuinely powerful and significant development. For perception-heavy tasks in open environments, such as navigating autonomous vehicles on city streets, it is often the only tractable approach. An autonomous system can’t just calculate; it must adapt and generalize in a way that rule sets cannot. You cannot write a rule for every conceivable situation: A child chasing a ball, garbage blown into the road, a faded lane marking in rain, an irate driver behaving irrationally. Such scenarios are effectively infinite in variety.

Probabilistic systems are powerful but inherently risky. They produce outputs by selecting from an internal distribution of possible output states.
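
The following is a minimal sketch of what “selecting from an internal distribution” can mean. It assumes a hypothetical policy head that scores four discretized steering actions and converts those scores to probabilities with a softmax. The action names, the logit values, and the helper functions are illustrative assumptions, not taken from any production system; real driving stacks may instead regress continuous commands. The last helper shows why rarity offers little comfort at fleet scale. Assuming independent decisions, an outcome with per-decision probability p appears at least once in n decisions with probability 1 - (1 - p)^n.

```python
import math
import random

# Hypothetical discretized steering actions and the raw scores (logits) an
# imagined policy network might emit for one camera frame. Illustrative only.
ACTIONS = ["straight", "gentle_left", "gentle_right", "hard_left"]
LOGITS = [6.0, 2.0, 2.0, -6.0]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Select one action index at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def prob_at_least_once(p, n):
    """Probability that an event with per-decision probability p occurs at least
    once in n independent decisions: 1 - (1 - p)**n, computed without precision loss."""
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    probs = softmax(LOGITS)
    for name, p in zip(ACTIONS, probs):
        print(f"{name:>12}: {p:.2e}")
    rng = random.Random(0)
    print("sampled action:", ACTIONS[sample_action(probs, rng)])
    p_rare = probs[ACTIONS.index("hard_left")]
    n = 100_000
    print(f"P(hard_left at least once in {n:,} decisions) = {prob_at_least_once(p_rare, n):.3f}")
```

In this toy setup, "hard_left" has a probability of roughly six in a million per decision. Even so, it becomes close to a coin flip over 100,000 decisions and a near certainty over ten million. The independence assumption is a simplification: correlated conditions, such as the same rain or the same faded marking, can make rare outputs cluster rather than spread out.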

But distributions have edges, and at those edges, however rarely they are encountered, the output of a neural network can be catastrophically wrong. In software, a catastrophic failure corrupts data. In a physical system connected to actuators, it kills people.

How we got here: the end-to-end revolution

The pursuit of end-to-end AI-based vehicle autonomy was likely catalyzed by a seminal 2016 Nvidia paper, “End-to-End Learning for Self-Driving Cars” (https://arxiv.org/pdf/1604.07316). The idea was radical in its simplicity: Instead of decomposing driving into separate modular activities (perception, prediction, planning) with carefully hand-crafted interfaces between them, simply train a single neural network to map raw sensor inputs directly to steering commands. Let the network figure out everything in between.

The authors’ conclusion: “A small amount of training data from less than a hundred hours of driving was sufficient to train the car to operate in diverse conditions, on highways, local, and residential roads in sunny, cloudy, and rainy conditions.”

Tesla, already the world’s largest collector of real-world driving data, applied this paradigm at scale and has refined it through successive generations of its full self-driving (FSD) software. The appeal is irresistible: An end-to-end system trained on millions of hours of human driving data can, in theory, handle situations that no finite rule set could ever anticipate.

Today, this direct mapping from raw sensory input to low-level control actions, known as pixel-to-torque, is a distinct paradigm that has gained significant momentum (https://arxiv.org/pdf/2509.20841) from the success of large language and vision-language models. It is now being exported to humanoid robots and surgical automation systems. The logic is the same: Collect enough data, train a large enough model, and the network learns to operate the body or the scalpel.

But there is a problem.

The evidence is in: Tesla’s FSD, the first end-to-end AI system to be deployed at scale in uncontrolled physical environments, is not working as advertised. It is nowhere near achieving the level of autonomy needed for widespread acceptance and, ultimately, for profits. There is no clearer proof of that than Tesla’s use of remote human drivers who can take charge of Tesla robotaxis in certain cases.

“As a redundancy measure in rare cases … remote assistance operators are authorized to temporarily assume direct vehicle control as the final escalation maneuver after all other available intervention actions have been exhausted,” said Karen Steakley (https://www.wired.com/story/tesla-says-its-robotaxis-are-sometimes-driven-by-humans/), Tesla’s director of public policy and business development.

That is not autonomy. It is a tacit admission that the neural network’s generalization has limits that can’t yet be trained away.

The incident reports pile up. FSD vehicles failing to yield to school buses. Robotaxis driving into flooded streets. Near-misses with emergency vehicles. Former Tesla AI trainers have recently spoken on record to Reuters (https://www.reuters.com/investigations/why-teslas-ai-trainers-dont-trust-its-self-driving-tech-or-its-safety-stats-2026-05-28/) about the internal pressure to ship software that engineers privately regarded as unsafe. The regulatory and legal machinery is struggling to keep up with a deployment pace driven not by safety milestones but by investor expectations.

Each one of these failures is a natural experiment in what happens when a probabilistic core is connected directly to physical actuators without an adequate deterministic backstop. The experiment is being run on public roads, with real people, without their informed consent.

The deterministic shell: What it is and why it matters

The solution is not to abandon machine learning. It is to stop deploying it naked.

A deterministic shell is a rules-based safety layer that wraps the probabilistic core. It can take several forms.

The simplest is an output filter: a certified module that receives the neural network’s proposed command (steer hard left, accelerate, extend the surgical tool) and checks it against a formal specification of safe behavior before passing it to the actuator. If the proposed command falls outside the safe envelope, the filter blocks it and substitutes a safe default. The filter does not need to understand why the network made a bad decision. It only needs to know which outputs are permissible.

A more robust approach could add an independent parallel monitoring system with its own physically separated sensor stack. Those sensors would share no hardware with the ones feeding the neural network, so a single sensor failure could not corrupt both systems simultaneously. This watchdog would run a formally verified model of safe behavior and could override the primary system if it detects an impending violation.

A third layer is graceful degradation: When the monitoring system detects a failure, it does not merely sound an alarm. It executes a safe-stop maneuver autonomously, buying time for human intervention.

Yes, this adds cost. It adds complexity, and it is harder to build than a bare end-to-end network. That is precisely why it has not been built. The obstacle is not impossibility; the shell is expensive and time-consuming, and the market is not yet demanding it loudly enough. The “bodies” are the market signal that has been suppressed.

Harder problems have already been solved

Lest anyone argue that wrapping probabilistic AI in deterministic constraints is technically intractable, consider what engineers have already accomplished.

The field-effect transistor is a quantum mechanical device. At the level of individual electrons, its behavior is irreducibly probabilistic.
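
A toy simulation makes the point concrete under heavily simplified assumptions. Every stage adds Gaussian noise, a stand-in for all device-level randomness, to a signal that nominally sits at 0 V or 1 V. A “digital” stage is modeled as an ideal comparator that snaps the signal back to a full-swing level. The noise level, threshold, and stage count are hypothetical values chosen for illustration, not device data. In the unrestored chain, noise accumulates like a random walk. In the restored chain, it is discarded at every stage.

```python
import random

# Hypothetical toy parameters, chosen for illustration rather than taken from
# any real process technology.
SIGMA = 0.05      # standard deviation of the noise each stage adds, in volts
THRESHOLD = 0.5   # switching point of an idealized comparator, in volts
STAGES = 1_000    # number of cascaded stages


def analog_chain(volts, stages, rng):
    """Pass a voltage through stages that add noise but never restore it."""
    for _ in range(stages):
        volts += rng.gauss(0.0, SIGMA)
    return volts


def digital_chain(bit, stages, rng):
    """Pass a bit through stages that add noise, then regenerate a clean level."""
    volts = float(bit)
    for _ in range(stages):
        noisy = volts + rng.gauss(0.0, SIGMA)
        # Level restoration: anything above the threshold becomes a full 1.0 V,
        # anything below becomes 0.0 V, so noise does not carry to the next stage.
        volts = 1.0 if noisy > THRESHOLD else 0.0
    return int(volts)


if __name__ == "__main__":
    rng = random.Random(42)
    analog = [analog_chain(1.0, STAGES, rng) for _ in range(200)]
    digital = [digital_chain(1, STAGES, rng) for _ in range(200)]
    print(f"analog chain: final voltages span {min(analog):.2f} V to {max(analog):.2f} V")
    print(f"digital chain: {sum(digital)} of {len(digital)} runs still read 1")
```

With these numbers, a single restored stage errs only when its noise exceeds ten standard deviations, so the restored chain effectively never flips. Meanwhile, the unrestored chain’s spread grows as SIGMA times the square root of the stage count, about 1.6 V after 1,000 stages. Real logic gates achieve restoration through gain and noise margins rather than an ideal comparator.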

Yet the digital logic built on top of billions of transistors is, for practical purposes, perfectly deterministic, because circuit architects designed the system so that quantum-level noise does not propagate to logic-level errors. Deterministic behavior was engineered in. It did not emerge spontaneously.

Claude Shannon proved that, as long as the transmission rate stays below a channel’s capacity, you can transmit information with arbitrarily low error probability over a noisy, probabilistic channel by adding carefully chosen redundant bits. The noise is not eliminated. It is contained by a deterministic structure wrapped around a probabilistic medium.

The intellectual tools to build hybrid architectures, in which probabilistic learning cores are constrained by formally verified safety layers, are already in the engineering canon. The challenge is application, not invention.

Elon Musk and his ilk would dismiss such guardrails as unnecessary. Musk’s position is essentially this: more data + bigger model = safety problem solved. The deterministic shell, in his worldview, would be a temporary crutch until the core is capable enough not to need it.

But a recent Anthropic blog post (https://alignment.anthropic.com/2026/hot-mess-of-ai/) disagrees. Anthropic’s research into large-scale AI deployment has flagged what it describes as the “hot mess” problem: models that perform coherently within their training distribution but degrade unpredictably when inputs shift. Across all tasks and models, the longer the models spend reasoning and taking actions, the more incoherent their errors become. The harder the problem, the post concludes, the more the model reaches for extended reasoning, and the more incoherent it becomes. That finding directly contradicts Musk’s “bigger is better” theory, and the case for a deterministic shell grows stronger.

A direct message to engineers

This essay ends where it should: with the people who build these systems.

Engineers are, in a very real sense, the last technical checkpoint between an untested probabilistic system and the human beings who will interact with it. Those people often do not understand what the system is, and they often have no meaningful ability to opt out. The passenger in the robotaxi did not review the training data. The patient on the surgical table did not audit the inference pipeline. They trusted the system because an engineer signed off on it.

The pressure to ship is always intense. The competitive dynamics are brutal. The investors are impatient, and the deadlines are immovable. It is easy to rationalize: The edge cases are rare, the next training run will fix it, the system is already better than a human driver.

But “better on average than a human driver” is not a safety standard. It is a marketing claim. A safety standard is a provable bound on worst-case behavior. Engineers should note the difference and build to the standard, not the claim.

The tools exist. The precedents exist. The knowledge exists. The transistor engineers did not say “quantum tunneling is probably fine.” Shannon did not say “the channel is usually clean enough.” They built the shell. They did the hard work. They gave us reliable digital systems out of irreducibly noisy physics.

What also exists right now, on roads and in operating rooms and in robot warehouses, is a mass experiment running on unconsenting human subjects. And the error rates are measured in lives.