My uncle once left me on a basketball court with a sheet of drills and walked off. Before he did, he told me I could lie and say I ran them. But I'd only be cheating myself.
I didn't have the words for it then. I do now. He was describing pre-registration. You commit to what you're going to do before anyone can see whether you actually did it, so there is no version of the result you get to fake afterward. Moving the goalposts once you've seen the score isn't beating the drill. It's losing to yourself quietly and calling it a win. Hold onto that. It comes back at the end, and it is the only reason any of this is worth reading.
I have spent about a year doing research for a series on how AI agents fail. For most of that year I thought I was building a list, separate failure modes, one claim at a time. I was wrong. I was describing the same failure over and over, from different sides, and it took me until now to name it.
Start with the agent. It has four layers. What it knows, its memory. What it is allowed to do, its authority. What it is for, its purpose. And what it actually does, its action.
The class of failure I keep finding is never a single bad step a filter could catch. It is two of those things drifting out of agreement while the agent keeps moving at full confidence. And the moment I forced the core claims in the series to say exactly which two, the list stopped being a list. Here it is, mapped:
| The claim | What fell out of phase |
|---|---|
| Relevance is not authority | Memory governed the action when only authority should have. Knowing overruled being allowed. |
| Permission is not purpose | Authority drifted from purpose. Allowed to do a thing that is not what the agent is for. |
| The clock said valid, the world said otherwise | Memory fell out of sync with the world it claimed to reflect. Recent, and already revoked out there. |
| Every step was allowed, the sequence was the attack | Action, read across the whole trajectory, drifted from the purpose every single step locally satisfied. |
| A carried total is not trustworthy just because the gate carries it | Memory fell out of agreement with itself. The total it carried no longer matched the operations it claimed to summarize. |
So let me widen the rule to be honest about what the table shows. A layer can fall out of agreement with another layer, with the world, or with its own earlier self. All three are the same disease: the agent's picture of what it knows, what it may do, what it is for, and what it is doing stops lining up, and nothing is watching the seam.
Different titles. The same sentence under all of them: memory you do not verify is memory that can betray you. The agent did not get hacked. Its layers stopped agreeing, and nothing was checking. I will keep calling that "the agent cheating itself," but be precise about what I mean: not a moral failure, a machine has none, but a structural one, the kind a perfectly honest audit would have caught if anyone had run it.
The property that prevents all of this has a name, and it is not a brand. It is cross-layer coherence. An agent has it when its layers stay in agreement, across each other, across time, and against the receipts. It belongs in the same lexicon as idempotency, exactly-once semantics, and monotonic aggregates, ordinary systems primitives, not a slogan. And like my uncle's drills, you do not get to claim coherence. You prove it.
Here is the part that decides whether this is engineering or hand-waving, so I will be blunt. The coherence check is not a second model that reads the transcript and decides whether things "look coherent." That solves nothing. It moves the hallucination and the drift into a second model and calls it a supervisor. A vibe check from a smarter prompt is still a vibe check.
The check is deterministic. It recomputes the state that matters from the logged operations and the rules frozen before the run, and compares. In CLAIM-31 the gate never asks a model whether a running total feels right. It recomputes the total and every window close from the operation log alone, with no model judgment anywhere in the verdict. The coherence layer is a hard logical and arithmetic gate over structured state, or it is nothing. If a model's opinion is load-bearing in the verdict, you have not built coherence. You have built a more confident guess.
Naming a failure is not the same as seeing it, so here is one concretely.
An agent runs a refund desk. Each refund is forty dollars. Each window caps at five hundred. The agent issues twelve refunds, four hundred eighty dollars, and stops one short of the cap. Then a window close is logged. Then it opens a new window and issues one more. Thirteen refunds, five hundred twenty dollars total, and not one window ever broke its bound.
Watch what misses it. A per-step gate sees thirteen individually authorized forty-dollar refunds and waves them all through, correctly, because each one is fine. A per-window gate sees two clean windows, four eighty and forty, both under five hundred, and waves them through too, correctly. The violation lives in no step and no window. It lives in the total across the close. The only thing that catches it is a check that carries a verified running total across a verified close, and refuses to trust either one just because it is the thing holding them.
Now the workflow that has to be allowed, or the whole thing is useless. A legitimate long refund job runs hundreds of small refunds across a busy afternoon. A window fills, the real close authority, not the agent, closes it, and work resumes in a fresh window. On the surface it is the same shape as the attack: refunds, a close, more refunds. The gate allows it, because the close was performed by the right authority and the total never laundered through a forged reset. A coherence check that cannot tell those two apart is just an outage with extra steps.
I will not skip this part, because skipping it is the lie.
Cross-layer coherence is not solved, and it breaks in the same place my last claim broke. Something has to do the checking, that checker has its own authority, and that authority sits inside the same system as the agent. By my own thesis, a carried total is not trustworthy just because the gate carries it. The same blade cuts back: a coherence check is not trustworthy just because the system ran it on itself, when the thing being kept honest can influence the thing keeping it honest. You need a root of trust the agent cannot reach. I have not built that. It is the next real fight, and anyone who tells you cross-layer coherence is airtight today, including me on a worse day, is selling.
And be clear about the evidence under all of this. It is a small internal toy world, fixed forty-dollar amounts, hand-built fixtures, a handful of rows. It is a consistency check on a world I control, not proof this generalizes. The things it has not faced are the ones that matter most: variable amounts sized to skim just under thresholds, concurrent windows, an adversary who can steer when the legitimate closes happen, and rows authored by independent teams instead of mine. None of that is tested here. The clean toy may not survive the messy version, and the messy version is the only one that ships.
One more honest line, because a reviewer should not have to drag it out of me. This is a synthesis, not a new result. It names the pattern. The evidence lives in the claim files and the recent public, pre-registered receipts: freeze commits made before rows existed, append-only evaluation logs, ablations that pull each check out one at a time to show it was load-bearing. If you want to test me, do not argue with this essay. Go check the freezes.
My uncle never checked whether I ran those drills. He didn't have to. The whole point was that I would know, and that the knowing would either build me or rot me. That is the discipline twice over. In how I test: freeze the rules before I look, or I cheat myself in the evaluation. And in what I build: force the agent's layers to stay provably in agreement, so a failure cannot hide.
Cross-layer coherence is that second one, built into a machine. A deterministic check that an agent's memory, authority, purpose, and action still line up, across each other, across time, and against the receipts. On a small internal world, using a lens I am honest enough to admit I did not invent, tested with a discipline I will defend, and standing on one trust assumption I have not earned yet.
The rule is holding. The boundary keeps moving up.
The next piece is the why. And that one is not technical.
Reproduce the claims: [https://github.com/keniel13-ui/ai-memory-judgment-demo-public](https://github.com/keniel13-ui/ai-memory-judgment-demo-public)
Start here: [https://dev.to/zep1997/start-here-my-ai-memory-research-so-far-2kp7](https://dev.to/zep1997/start-here-my-ai-memory-research-so-far-2kp7)