Memo: The Model You Validated No Longer Exists


Cloud AI platforms now swap the model underneath your running workload – automatically, by design, and in some cases without your code ever noticing. The EU AI Act says a change the original conformity assessment did not assess makes your system legally a new system. Both of these things are true at once, and the collision has no owner. Enterprises are building validation files around named models while the platforms beneath them are engineered to make those names stop meaning anything.

On 15 May 2026, xAI retired eight Grok models. What makes the retirement notice worth reading is not the list – it is the mechanism. The deprecated model slugs continue to resolve. Requests to the old names are silently redirected to grok-4.3 and billed at grok-4.3's prices. Your code does not break. No error fires. Nothing forces you to notice that the model answering your requests is not the model you tested, tuned, and documented. The continuity is the product feature; the substitution is invisible by design.
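A redirect of this kind can only be caught by code that checks who actually answered. The following is a minimal sketch, not any vendor's API: it assumes the parsed response body names the serving model under a top-level "model" key, which the retirement notice discussed here does not confirm for redirected requests. `ValidatedModel`, `verify_served_model`, and the slug `example-old-slug` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


class ModelSubstitutionError(RuntimeError):
    """The model that answered is not the model in the validation file."""


@dataclass(frozen=True)
class ValidatedModel:
    requested_slug: str   # the name your code sends in each request
    validated_model: str  # the model identity recorded in the validation file


def verify_served_model(validated: ValidatedModel, response: dict) -> str:
    """Return the serving model's name, or raise if it cannot be confirmed.

    Assumption: `response` is the parsed JSON body and the platform
    reports the serving model under a top-level "model" key.
    """
    served = response.get("model")
    if served is None:
        # Fail closed: no evidence is not the same as no substitution.
        raise ModelSubstitutionError(
            f"response to {validated.requested_slug!r} does not name a serving model"
        )
    if served != validated.validated_model:
        raise ModelSubstitutionError(
            f"requested {validated.requested_slug!r}, validated against "
            f"{validated.validated_model!r}, but served by {served!r}"
        )
    return served


if __name__ == "__main__":
    pinned = ValidatedModel("example-old-slug", "example-old-slug")
    try:
        verify_served_model(pinned, {"model": "grok-4.3"})
    except ModelSubstitutionError as err:
        print(f"substitution detected: {err}")
```

The requested slug and the validated identity are kept as separate fields because some platforms may report a more specific version name than the one requested. If the platform echoes the requested name rather than the serving model, this check proves nothing, and the only remaining evidence is the invoice.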

xAI is not the outlier – it is the pattern. When GPT-4o's 2024 versions retired on 31 March 2026, Azure auto-upgraded Standard deployments to GPT-5.1, a model that previously had no presence on that SKU at all. AWS Bedrock formalizes the same lifecycle in stages: once a model enters Legacy state, you cannot create new fine-tuning jobs or new provisioned endpoints on it, and the clock to end-of-life is running. Anthropic retired Claude 3 Haiku in April 2026, roughly two years after its March 2024 release. The industry has converged on a roughly twenty-four-month model lifespan, and on machinery that makes the handover from one model to the next as frictionless – which is to say, as unnoticeable – as possible.

For a developer, this is a migration chore. For anyone assembling a regulatory validation file, it is something else entirely: the artifact you are certifying references a model that is engineered to stop existing, and to be replaced without your participation.

What the law calls a new system

The EU AI Act anticipated that AI systems change after deployment, and drew a line. Article 3(23) defines a substantial modification as a change made after the system is placed on the market that is not foreseen or planned in the initial conformity assessment and that affects the system's compliance or its intended purpose. Recital 128 explains the consequence with an example that should make any platform architect sit up: whenever a change occurs that may affect compliance – the recital's own illustration is a change of operating system or software architecture – the system should be considered a new AI system, which must undergo a new conformity assessment.

If swapping the operating system under a high-risk AI system makes it legally new, it is difficult to argue that swapping the model – the component that produces every one of the system's outputs – does not. The model is not infrastructure under the decision; it is the decision.

The Act does provide an escape hatch, and the escape hatch is where the entire problem now lives. Recital 128 exempts changes that continue after deployment – provided those changes were pre-determined by the provider and assessed at the moment of the conformity assessment. The drafters were thinking of continuous-learning systems: a model that adapts within boundaries you defined and tested up front. They were not thinking of what the platforms actually built.

Foreseen, but never assessed

Here is the paradox, stated plainly. A platform's auto-upgrade policy is foreseen in the ordinary sense of the word – it is published in the lifecycle documentation, agreed to in the terms of service, executed on a printed schedule. A vendor will argue, with a straight face, that the substitution was planned from day one.

But the exemption does not turn on whether the mechanism was foreseen. It turns on whether the change was assessed at the moment of the original conformity assessment. And you cannot have assessed GPT-5.1's behavior inside your system at a moment when GPT-5.1 did not exist. The replacement model was not merely unexamined at validation time – it was unexaminable. Its capabilities, failure modes, refusal behavior, and output distribution were unknowable to the people signing the conformity file, because the artifact had not been built.

Foreseen-in-principle and assessed-in-fact have come apart, and the platforms' upgrade machinery operates entirely inside that gap. Call it what it is: silent substitution – a change that is contractually planned, technically seamless, and regulatorily unassessed, all at once. The platform's documentation says nothing happened that you didn't sign up for. The recital's logic says your system became a new system, and your validation file became a historical document.

The question has no settled answer, and the people closest to the Act admit it. Practitioner analyses of the modification provisions note that it remains genuinely unclear when a modifier becomes a provider in practice, and that the lack of vendor transparency makes thorough conformity assessment difficult to conduct at all. The Commission's GPAI guidelines attempt a bright line for model modifications – a threshold of one-third of the original training compute – which is both hard for downstream actors to measure and beside the point here, because substitution is not modification of a model; it is replacement of one model by another.

Three questions with no owner

Strip the situation to what a regulator, an auditor, or opposing counsel would actually ask, and three questions emerge that currently have no owner.

Was the change foreseen – and by whom? The platform foresaw the schedule. The deployer foresaw, at most, that some upgrade would eventually occur. Neither party foresaw the specific replacement model's behavior, because neither could. A conformity assessment is supposed to be an assessment of something; a clause that says "the assessed thing will be replaced by an unassessed thing on a date of our choosing" is a disclaimer wearing an assessment's clothes.

Was the new configuration ever assessed? Almost certainly not by the deployer, whose evaluation harness – if one exists – was built against the old model, and whose re-validation budget assumed the system would hold still. Not by the platform, which evaluates its models in general but has never seen your system, your data, or your use case. The assessment gap is not a corner case of the upgrade path. It is the upgrade path.

Who is the provider now? Article 25 makes an actor who substantially modifies a high-risk system into its provider, inheriting the obligations. The provision imagines a deliberate downstream actor – someone who chose to repurpose or rework the system. Silent substitution inverts the scenario: the change was made to the deployer, not by them. If the swap is a substantial modification, the Act's own logic suggests the party who performed it – the platform – may have just become the provider of ten thousand customers' high-risk systems simultaneously, without any of the parties intending it. No platform's terms of service contemplate that outcome, which is precisely why the terms are written to make the substitution look like routine maintenance.

What an operator should do while the question is unowned

The compliance deadline that would have forced this issue has just moved – the provisional Digital Omnibus agreement of 7 May 2026 defers Annex III high-risk obligations from 2 August 2026 to 2 December 2027, pending formal adoption expected in mid-2026; until the text is published, 2 August 2026 remains the operative date. That is sixteen months of grace, and the worst available reading of it is "sixteen months before we need to think about this." The deferral happened largely because the standards and assessment tooling were not ready – the EU effectively conceded that the assurance infrastructure does not yet exist. The model-substitution machinery, meanwhile, keeps running on its twenty-four-month cycle. By December 2027, the model in any validation file assembled today will be at or past end-of-life. The first conformity files of the new regime will be born already describing systems that no longer exist – unless deployers change how the files are built. Three moves follow.

First, pin what can be pinned. Most platforms distinguish between auto-upgrading SKUs and provisioned or versioned deployments that are not auto-upgraded – Azure auto-upgrades Standard but not Provisioned; most providers offer dated model snapshots. A high-risk workload belongs on the pinned path, with the cost difference understood as a compliance expense, not an infrastructure preference.
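Pinning is easier to enforce when a deployment inventory can be linted for it. The sketch below is a heuristic under stated assumptions, not any platform's rule: it treats a model identifier as pinned only if it ends in a date stamp (`YYYY-MM-DD` or `YYYYMMDD`), and the inventory format, workload names, and `example-model-*` identifiers are all hypothetical. Naming conventions differ between providers, and on some platforms the pin lives in the deployment type or upgrade policy rather than in the name, so the pattern would need adapting.

```python
import re

# Assumption: a dated snapshot identifier ends in YYYY-MM-DD or YYYYMMDD.
_DATE_SUFFIX = re.compile(r"(\d{4}-\d{2}-\d{2}|\d{8})$")


def is_pinned(model_id: str) -> bool:
    """True if the identifier looks like a dated snapshot, not a moving alias."""
    return bool(_DATE_SUFFIX.search(model_id))


def unpinned_workloads(deployments: dict[str, str]) -> list[str]:
    """Return the workloads whose model identifier is not a dated snapshot.

    `deployments` maps a workload name to the model identifier it requests.
    """
    return sorted(name for name, model_id in deployments.items()
                  if not is_pinned(model_id))


if __name__ == "__main__":
    inventory = {
        "credit-scoring": "example-model-latest",       # moving alias
        "claims-triage": "example-model-2025-01-15",    # dated snapshot
    }
    for workload in unpinned_workloads(inventory):
        print(f"not pinned: {workload}")
```

Run as a pre-deployment check, a lint like this turns "belongs on the pinned path" from a policy statement into a build failure for any high-risk workload that points at an alias.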

Second, write the substitution into the contract instead of accepting it from the lifecycle docs: notice periods, the right to validate the replacement before cutover, and an explicit allocation of who bears the reassessment obligation when the model changes. The platforms' current terms are silent on the regulatory consequence of their own upgrade machinery, and silence favors whoever drafted the document.

Third, build the re-validation harness now, while it is cheap – a fixed evaluation suite that runs against any candidate model and produces a documented delta. The Act's exemption logic actually points the way: changes that are pre-determined and assessed within a defined boundary can be legitimate. A standing harness is how you convert an unassessable future model into an assessable one on the day it arrives – it is the difference between a substitution you absorb and one you certify.
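A standing harness can be small. The sketch below shows one possible shape under simplifying assumptions: a model is anything that maps a prompt string to an output string, each case carries its own pass/fail check, and the delta is a per-case comparison between the validated baseline and a candidate. `Case`, `run_suite`, and `documented_delta` are hypothetical names, and a real suite would need graded metrics, repeated sampling, and versioned storage of results rather than single boolean checks.

```python
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str], str]  # hypothetical adapter: prompt in, completion out


@dataclass(frozen=True)
class Case:
    case_id: str
    prompt: str
    passes: Callable[[str], bool]  # acceptance check for this case's output


def run_suite(model: Model, suite: list[Case]) -> dict[str, bool]:
    """Run every case against one model and record pass/fail per case."""
    if not suite:
        raise ValueError("evaluation suite is empty")
    return {case.case_id: case.passes(model(case.prompt)) for case in suite}


def documented_delta(baseline: dict[str, bool],
                     candidate: dict[str, bool]) -> dict:
    """Compare two runs of the same fixed suite, case by case."""
    if not baseline:
        raise ValueError("baseline run is empty")
    if baseline.keys() != candidate.keys():
        raise ValueError("runs cover different cases; the suite must stay fixed")
    return {
        "baseline_pass_rate": sum(baseline.values()) / len(baseline),
        "candidate_pass_rate": sum(candidate.values()) / len(candidate),
        # Cases the validated model passed and the candidate fails.
        "regressions": sorted(c for c in baseline
                              if baseline[c] and not candidate[c]),
        # Cases the validated model failed and the candidate passes.
        "fixes": sorted(c for c in baseline
                        if not baseline[c] and candidate[c]),
    }
```

The suite is deliberately required to be identical across runs: the value of the delta as evidence depends on the boundary having been defined before the replacement model existed. A candidate with any entry under "regressions" is a substitution to investigate before cutover, not one to absorb.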

Bottom line

Two engineering cultures built two incompatible artifacts. The platforms built machinery whose explicit purpose is to make model identity not matter – slugs that keep resolving, deployments that upgrade themselves, lifecycles compressed to two years. The regulators built a framework whose explicit premise is that model identity is the whole game – assess this system, certify this configuration, and if it changes in ways you did not assess, start over. Both are doing exactly what they were designed to do. They cannot both be satisfied by the same deployment, and almost no one operating between them has noticed.

The forward call: before the deferred high-risk regime takes effect in December 2027, the substitution question surfaces in one of two ways – either a notified body or market-surveillance authority issues guidance on whether platform-initiated model swaps constitute substantial modification, or the platforms move first, shipping "compliance-pinned" deployment tiers that freeze model versions for regulated workloads at a premium price. The second is more likely, because it converts the legal ambiguity into a product. When a hyperscaler starts selling model stability as a regulated-industries feature, that will be the market's way of admitting what the lifecycle documentation never says: the substitution was never as silent, or as harmless, as the resolving slug made it look.

Sources: xAI model-retirement notice for the 15 May 2026 retirements, slug redirection to grok-4.3, and billing at the replacement model's rates. Microsoft Foundry/Azure OpenAI lifecycle documentation for the GPT-4o retirement of 31 March 2026, the auto-upgrade of Standard-SKU deployments to GPT-5.1, and the exemption of Provisioned deployments from auto-upgrade. AWS Bedrock model-lifecycle documentation for Legacy-state restrictions on fine-tuning and provisioned endpoints. Anthropic deprecation notices for the Claude 3 Haiku retirement (March 2024 release, April 2026 retirement). EU AI Act Article 3(23) for the definition of substantial modification, Recital 128 for the new-system consequence and the pre-determined-changes exemption, and Article 25 for modifier-becomes-provider; practitioner analysis of modification and provider-status ambiguity from the Future of Life Institute's AI Act analysis ("Modifying AI Under the EU AI Act," 2026), including the GPAI guidelines' one-third-compute threshold. Digital Omnibus on AI: provisional agreement of 7 May 2026 deferring Annex III high-risk obligations to 2 December 2027 (Annex I to August 2028), per Covington, Gibson Dunn, and DLA Piper client analyses; formal adoption expected mid-2026, with 2 August 2026 remaining the active date until publication. Cross-references to prior Signal Memo coverage: defeat devices for benchmarks. The "silent substitution" framing, the foreseen-but-never-assessed reading of Recital 128, and the conformity-layer analysis of platform lifecycle machinery are original to this memo.

source & further reading

Original article: signal-memo.com
