Confidence is enough to decide. It's not enough to do.

A developer building an email triage system argues that classifier confidence alone is insufficient for irreversible actions like sending or permanently deleting emails. The system uses a deterministic payload hash to verify that the approved bytes match the executed bytes, preventing agent-vs-ABI mismatches where natural language descriptions drift from actual actions.

A classifier confidence of 0.99 is enough to decide a tier. It is not enough to send an email you can't unsend. Those are two different bars, and most "autonomous" systems use the first one to clear the second. That's the bug.

This is the third post in a series that started as a cheap-model brag and turned into an architecture argument. Post one (https://dev.to/k08200/i-let-gpt-4o-and-a-cheaper-model-fight-over-my-inbox-gpt-4o-lost-fkj): a cheap model beat GPT-4o on email triage. Post two (https://dev.to/k08200/i-dont-trust-the-llm-to-classify-my-email-so-i-dont-let-it-55d9): the model only scores four features, and a deterministic rule picks the tier.

A commenter, @hannune (https://dev.to/hannune), pointed at one of those four features:

"Your reversibility signal is something I have not seen named explicitly before but it is exactly the right axis for anything that touches irreversible state."

He's right, and it's the cleanest way into the last piece of the design. So: what reversibility actually routes.

Most of what a mail agent does is reversible. Archive, un-archive. Trash, restore. Apply a label, remove it. Mark read, mark unread. Re-tier. Snooze. Every one of those is a single click away from undone, so every one of those rides on exactly what post two described: classifier confidence plus a hash of the input bytes that drove the decision. If the model's confident and the inputs are pinned, ship it.

Three actions are not like the others:

```ts
export const FLOOR_ACTIONS = ["send_email", "delete_permanent", "forward_external"] as const;
```

- Send (Gmail's undo-send window is at most 30 seconds, then it's gone).
- Permanent delete (skips Trash, no recovery path).
- Forward to an external party (same network effect as send: it's out).

For these, reversibility scores near zero, and near-zero reversibility is the signal that says: confidence is necessary but no longer sufficient. You need something the probabilistic layer can't give you.
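The routing rule described above could be sketched roughly like this. It is an illustration, not the project's code: `isFloorAction`, `mayExecute`, the `Decision` shape, and the 0.9 threshold are all assumptions made up for the example. Only `FLOOR_ACTIONS` and its three values come from the post.

```typescript
// Sketch only: isFloorAction, mayExecute, Decision, and CONFIDENCE_THRESHOLD
// are hypothetical names. Only FLOOR_ACTIONS comes from the post.
const FLOOR_ACTIONS = ["send_email", "delete_permanent", "forward_external"] as const;
type FloorAction = (typeof FLOOR_ACTIONS)[number];

// Assumed cutoff for the probabilistic layer; the real value is not stated.
const CONFIDENCE_THRESHOLD = 0.9;

// Type guard: true only for the three irreversible actions.
function isFloorAction(action: string): action is FloorAction {
  return FLOOR_ACTIONS.some((a) => a === action);
}

interface Decision {
  action: string;
  confidence: number;
  hasVerifiedReceipt: boolean;
}

// Reversible actions pass on confidence alone. Floor actions additionally
// need a verified receipt, so a missing receipt fails closed.
function mayExecute(d: Decision): boolean {
  const confident = d.confidence >= CONFIDENCE_THRESHOLD;
  if (!isFloorAction(d.action)) return confident;
  return confident && d.hasVerifiedReceipt;
}

console.log(mayExecute({ action: "archive", confidence: 0.99, hasVerifiedReceipt: false })); // true
console.log(mayExecute({ action: "send_email", confidence: 0.99, hasVerifiedReceipt: false })); // false
console.log(mayExecute({ action: "send_email", confidence: 0.99, hasVerifiedReceipt: true })); // true
```

The point of the shape is that confidence never appears alone on the floor branch: a 0.99 with no receipt is still a refusal.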
The failure mode here has a name I borrowed from people doing this in crypto: agent-vs-ABI mismatch. The agent narrates a high-level intent ("I sent a polite follow-up to Alice") and the thing that actually executed did something the narration glossed over: wrong recipient, an edited body, a different attachment. The agent isn't lying. Natural language is lossy by definition; the description and the bytes are allowed to drift.

The cure isn't to verify the narration harder. It's to stop signing on the narration and sign on the deterministic artifact: the actual bytes that will travel to Gmail.

```ts
import * as crypto from "node:crypto";

export function sendEmailPayloadHash(input: { to: string; subject: string; body: string }): string {
  // Canonical form: fixed key order, schema version included, strings NFC-normalized.
  const canonical = {
    v: RECEIPT_SCHEMA_VERSION,
    action: "send_email" as const,
    to: input.to.normalize("NFC").trim().toLowerCase(),
    subject: input.subject.normalize("NFC"),
    body: input.body.normalize("NFC"),
  };
  return crypto.createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
}
```

When you approve a send, the system mints an `ActionReceipt` that pins this hash: the bytes you actually approved, normalized so a cosmetic edit (`Alice@Example.com` vs `alice@example.com`) doesn't false-alarm, and NFC-normalized so composed and decomposed Unicode hash identically (this matters the moment a body has Korean in it). At execute time it recomputes the hash from the about-to-send bytes and checks:

```ts
export function verifyReceipt(
  receipt: ActionReceipt,
  expected: { action: FloorAction; currentPayloadHash: string },
): void {
  // Receipts minted under a different schema version are never honored.
  if (receipt.v !== RECEIPT_SCHEMA_VERSION) throw new ActionReceiptSchemaError(receipt);
  // A receipt only authorizes the action it was minted for.
  if (receipt.action !== expected.action)
    throw new ActionReceiptMismatchError(receipt, expected.currentPayloadHash);
  // The bytes about to execute must hash to the bytes that were approved.
  if (receipt.payloadHash !== expected.currentPayloadHash)
    throw new ActionReceiptMismatchError(receipt, expected.currentPayloadHash);
}
```

Any drift between approve and execute throws and the action is refused.
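To make the normalization concrete, here is a self-contained sketch of the approve-then-execute round trip. It restates the hash function so it runs on its own. The `ActionReceipt` shape, `mintReceipt`, `assertUnchanged`, the version value `1`, and the plain `Error` thrown on drift are assumptions for illustration, not the repo's definitions.

```typescript
import { createHash } from "node:crypto";

// Assumed value; the post does not state the real schema version.
const RECEIPT_SCHEMA_VERSION = 1;

interface SendInput { to: string; subject: string; body: string }

// Hypothetical receipt shape with only the fields the post's checks read.
interface ActionReceipt { v: number; action: "send_email"; payloadHash: string }

function sendEmailPayloadHash(input: SendInput): string {
  const canonical = {
    v: RECEIPT_SCHEMA_VERSION,
    action: "send_email" as const,
    to: input.to.normalize("NFC").trim().toLowerCase(),
    subject: input.subject.normalize("NFC"),
    body: input.body.normalize("NFC"),
  };
  return createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
}

// Hypothetical helper: pin the approved bytes at approval time.
function mintReceipt(approved: SendInput): ActionReceipt {
  return { v: RECEIPT_SCHEMA_VERSION, action: "send_email", payloadHash: sendEmailPayloadHash(approved) };
}

// Simplified check: recompute from the about-to-send bytes, throw on any drift.
function assertUnchanged(receipt: ActionReceipt, aboutToSend: SendInput): void {
  if (receipt.v !== RECEIPT_SCHEMA_VERSION) throw new Error("stale receipt schema");
  if (receipt.payloadHash !== sendEmailPayloadHash(aboutToSend)) {
    throw new Error("payload drifted since approval");
  }
}

// Approved with a mixed-case address and a precomposed Hangul syllable.
const approved: SendInput = { to: "Alice@Example.com", subject: "Follow-up", body: "\uD55C" };
const receipt = mintReceipt(approved);

// Cosmetic differences only: address case, padding, and the same syllable
// written as three decomposed jamo. NFC folds these to the same bytes.
const cosmetic: SendInput = { to: " alice@example.com ", subject: "Follow-up", body: "\u1112\u1161\u11AB" };
assertUnchanged(receipt, cosmetic); // does not throw

// Real drift: a different recipient must be refused.
const tampered: SendInput = { ...approved, to: "mallory@example.com" };
let refused = false;
try {
  assertUnchanged(receipt, tampered);
} catch {
  refused = true;
}
console.log(refused); // true
```

The cosmetic variant passes because both the address and the body collapse to one canonical form before hashing; the changed recipient produces a different digest, so the check throws.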
Reusing a `send_email` receipt to authorize a `delete_permanent` throws on the action check. Bumping the schema version deliberately invalidates every pending receipt and forces a re-approve under the new shape. The autonomous path fails closed: no valid receipt, no irreversible action.

That's the whole point of naming reversibility as a first-class feature instead of folding it into "risk." It's not decoration on the tier decision; it's the axis that decides which trust model an action even gets. Two layers, and the feature score picks which one applies. The probabilistic layer is allowed to stay probabilistic precisely because the floor catches the cases where probability isn't enough.

Fair question: is this actually wired, or just a module with a TODO? It's wired. The receipt is minted at `/approve` from the exact bytes you clicked on, `executeToolCall` refuses any floor action that arrives without a verified receipt (`FloorReceiptRequiredError`), and `send_email` re-checks the payload hash in its own path before anything leaves.

The honest edges, because there always are some: of the three floor actions, only `send_email` is a callable tool today. `delete_permanent` and `forward_external` aren't wired as tool cases yet, but the central guard already fails them closed, so a future case physically can't ship a receipt-less side effect. And the autonomous agent runs in SUGGEST mode by default: read-only tools plus propose-only, no mutating power until you opt into AUTO, and even then the floor stands in front of the irreversible three. The brake went in before the autonomous engine gets switched on, which is the only order that isn't reckless. I'd rather show you the guard and its TODOs than claim more than the code does.

Separate "confident enough to decide" from "verified enough to do."
For anything your system can't undo with one user click, don't trust the model's description of what it's about to do: hash the deterministic artifact at approval, verify it at execution, and fail closed on any drift. Confidence is a fine reason to decide. It is never, by itself, a reason to do something you can't take back.

The whole floor is ~210 readable lines in the open, AGPLv3: github.com/k08200/klorn (packages/api/src/attention-floor.ts).

Three posts, one idea: keep the model in the perception layer, and put everything you actually stand behind in code you can read.