Every Step Was Allowed. The Sequence Was the Attack. (AI Memory Judgment, CLAIM-30)

A developer demonstrated a "compositional escape" vulnerability in AI agent safety systems, where individually permitted operations combine into forbidden outcomes. Testing an invoice reconciliation operator, all 75 fresh-authored steps passed per-step permission gates, but trajectory-level gates correctly refused three sequence compositions violating frozen mandate clauses. The experiment was firewalled by pre-registering evaluation criteria before the test sequences were authored, preventing the writer from tuning results to the evaluator's implementation.

Earlier this week I published CLAIM-29: permission is not purpose. An instruction can be fully authorized, fresh, and clean in shape, and still ask the agent to act outside what it exists to do. The purpose envelope gate refused those instructions by deriving the object domain structurally, ignoring whatever purpose the instruction claimed for itself. Within a day, the obvious next question was on the table: what happens when every single step is inside the mandate, and the violation only exists in the combination? That is CLAIM-30. Compositional escape. A trajectory that stays inside the purpose envelope one step at a time while the sequence composes into an outcome the mandate forbids. This week I ran it. Here is what held, what stayed open, and the one sequence that was allowed on purpose, because explaining that one honestly matters more than the refusals. A per-step gate answers one question: is this operation, right now, inside the agent's mandate? CLAIM-29 showed that question has teeth. But some violations do not live in any single operation. Three examples from the test world, an invoice reconciliation operator: A gate that reads operations one at a time cannot see any of this. Not because it is badly built. Because each operation, taken alone, genuinely is allowed. The violation is a property of the fold, not of any step. A per-step gate is a local function, and these are non-local properties. That is a structural blindness, not a tuning problem. The pre-registration went through five revisions and two cold reads, then froze and went public before any fixture, row, evaluator, or line of harness code existed. After the freeze, the four fixture artifacts were committed in a locked order. Then a fresh-author packet. Then the trajectory sequences, written by a fresh context that saw only the packet, never the fixture internals, the observer rules, or the expected verdicts. The authorship statement and file list are committed with the sequences. That firewall is not magic, and I do not want to oversell it. The packet was still authored by this project, and it was not hint-free. It prescribed the sequence shapes to author, including repetition sequences and matched twins, and it disclosed two threshold authoring facts, the per-refund amount and the per-window bound, because the frozen pre-registration explicitly sanctioned that disclosure so the accumulation class would be authored deliberately instead of landing by luck. What the packet did not expose: the frozen composition clauses, the derivation map, the observer rules, or any expected verdict. The fresh author controlled the concrete construction, the orderings, targets, recipients, artifacts, and repetition counts. So the firewall protects one specific thing: the rows could not be tuned to the evaluator, because the author never saw how the gate decides. Even the hints were pre-registered before the packet existed. That makes the result firewalled from evaluator implementation details, not externally authored, and the packet is public if you want to check exactly what the author was given. The sequences were committed before the evaluator existed publicly. Anyone can check the timestamps. The fresh author could not have written to the evaluator's implementation, because that implementation was not public when the rows were authored. All 75 fresh-authored operations passed the imported, unchanged CLAIM-29 per-step purpose gate. Every step of every sequence, including the escape sequences, was individually in mandate. This matters: if any escape step had been refused per-step, the result would have collapsed back into CLAIM-29 by accident. It did not. The result is genuinely about composition. The trajectory gate then refused three sequence-level compositions, each on a named frozen clause: | Escape class | Triggering clause | Per-step verdicts | |---|---|---| | Forbidden combination payment-redirect kit join | forbidden combination clause | all steps allowed | | Staged delivery of derived protected data | derived-recipient clause | all steps allowed | | Threshold accumulation 13 refunds, one window | accumulation clause | all steps allowed | Each escape sequence has a matched benign twin with the same surface statistics: same step count, similar shape, public data or under-threshold volume instead of protected joins. All twins and controls were allowed. Zero overblocking in this run. One sequence ran the same thirteen refunds, split across two windows, with a close event between them. The gate allowed it. That is not a leak. The frozen rules define accumulation per composition window, and a window only ends when an explicit close event is logged by the policy boundary. The acting agent cannot close its own window; a self-close voids the whole row. In this sequence, the close was performed by the legitimate close authority. Think of a bar tab with a thirteen-drink limit: the bartender closed the first tab before a new one opened, so the count reset, and the gate's allowance was faithful to the frozen standard. The rule was always per tab, and the right person closed the tab. What that allowance exposes is the real open question, and it lives one level above the gate: who should hold window-close authority, when should the boundary close a window, and can an agent ever engineer a close? An agent might not need to close its own window if it can wait, pad work, or otherwise shape the timing until a legitimate boundary closes it. That is the time-sliced escape class. The fresh-authored packet did not produce a distinct escape of that class under the frozen rules, so this result claims nothing about it. The next design likely needs a macro-window rule above the local window: velocity across windows, rolling totals, or another policy that caps accumulation even when discrete windows close correctly. CLAIM-30 does not test that. It names why that layer is needed. Per the house protocol, the protections were removed one at a time and re-run over the same frozen fixtures and unchanged sequences: The selectivity is the result I care most about. The clauses and substrates are not redundant layers of the same defense. Each carries its own class. All five pre-registered ablations ran on the unchanged committed sequences and frozen fixtures. This is demonstrated internally, class-limited, and not externally validated. If your agent safety story is per-action checks, composition is your blind spot. Not because your checks are weak, but because locality cannot see folds. An agent can be a perfect employee on every individual action while the trajectory quietly assembles the thing your policy exists to prevent. The fix direction this result supports: keep the per-step gate, then read the whole trajectory against a composition envelope that knows about joins, lineage, and accumulation. Both layers were load-bearing here, in different ways, and the ablations show neither substitutes for the other. For deployed systems, that also means a hard-bounded execution environment where the gate sees real tool calls and state transitions, not a loose chat transcript. Permission is not purpose. And purpose, held one step at a time, is not purpose held across the journey. Every step can stay inside the mandate while the sequence walks out of it. Now there is a public, pre-registered, ablation-backed demonstration of exactly that, with its open class named in advance. The pre-registration, fixtures, sequences, evaluator, results, ablations, and append-only evaluation log are all public in the repo, committed in freeze order: 00fbf65 for the frozen pre-registration, ffbeff3 for the fresh-authored sequences, b4251f2 for the evaluator and V0 results, and 5914287 plus 6404429 for the ablations. If you want to check any of this rather than take my word for it, that is the standing invitation behind all thirty claims.