{"slug": "confidence-is-enough-to-decide-it-s-not-enough-to-do", "title": "Confidence is enough to decide. It's not enough to do.", "summary": "A developer building an email triage system argues that classifier confidence alone is insufficient for irreversible actions like sending or permanently deleting emails. The system uses a deterministic payload hash to verify that the approved bytes match the executed bytes, preventing agent-vs-ABI mismatches where natural language descriptions drift from actual actions.", "body_md": "A classifier confidence of 0.99 is enough to decide a tier. It is not enough to send an email you can't unsend.\n\nThose are two different bars, and most \"autonomous\" systems use the first one to clear the second. That's the bug.\n\nThis is the third post in a series that started as a cheap-model brag and turned into an architecture argument. [Post one](https://dev.to/k08200/i-let-gpt-4o-and-a-cheaper-model-fight-over-my-inbox-gpt-4o-lost-fkj): a cheap model beat GPT-4o on email triage. [Post two](https://dev.to/k08200/i-dont-trust-the-llm-to-classify-my-email-so-i-dont-let-it-55d9): the model only scores four features, and a deterministic rule picks the tier. A commenter, [@hannune](https://dev.to/hannune), pointed at one of those four features:\n\n> Your reversibility signal is something I have not seen named explicitly before but it is exactly the right axis for anything that touches irreversible state.\n\nHe's right, and it's the cleanest way into the last piece of the design. So: what `reversibility` actually routes.\n\nMost of what a mail agent does is reversible. Archive, un-archive. Trash, restore. Apply a label, remove it. Mark read, mark unread. Re-tier. Snooze. Every one of those is a single click away from being undone, so every one of those rides on exactly what post two described: classifier confidence plus a hash of the input bytes that drove the decision.
If the model's confident and the inputs are pinned, ship it.\n\nThree actions are not like the others:\n\n```ts\nexport const FLOOR_ACTIONS = [\"send_email\", \"delete_permanent\", \"forward_external\"] as const;\n```\n\nSend (Gmail's undo-send window is at most 30 seconds, then it's gone). Permanent delete (skips Trash, no recovery path). Forward to an external party (same network effect as send: it's out). For these, `reversibility` scores near zero, and near-zero reversibility is the signal that says: confidence is necessary but no longer sufficient. You need something the probabilistic layer can't give you.\n\nThe failure mode here has a name I borrowed from people doing this in crypto: **agent-vs-ABI mismatch**. The agent narrates a high-level intent (\"I sent a polite follow-up to Alice\"), and the thing that actually executed did something the narration glossed over: wrong recipient, an edited body, a different attachment. The agent isn't lying. Natural language is lossy by definition; the description and the bytes are allowed to drift.\n\nThe cure isn't to verify the narration harder.
It's to stop signing on the narration and sign on the deterministic artifact: the actual bytes that will travel to Gmail.\n\n```ts\n// Excerpt: crypto is assumed to be Node's built-in crypto module, and\n// RECEIPT_SCHEMA_VERSION a constant defined elsewhere in the file.\nexport function sendEmailPayloadHash(input: { to: string; subject: string; body: string }): string {\n  // Fixed key order keeps the JSON.stringify output deterministic.\n  const canonical = {\n    v: RECEIPT_SCHEMA_VERSION,\n    action: \"send_email\" as const,\n    // Recipient: NFC, trimmed, lowercased, so cosmetic edits hash the same.\n    to: input.to.normalize(\"NFC\").trim().toLowerCase(),\n    // Subject and body: NFC only, so any real edit changes the hash.\n    subject: input.subject.normalize(\"NFC\"),\n    body: input.body.normalize(\"NFC\"),\n  };\n  return crypto.createHash(\"sha256\").update(JSON.stringify(canonical)).digest(\"hex\");\n}\n```\n\nWhen you approve a send, the system mints an `ActionReceipt` that pins this hash: the bytes you actually approved, normalized so a cosmetic edit (`Alice@Example.com` vs `alice@example.com`) doesn't false-alarm, and NFC-normalized so composed/decomposed Unicode hashes identically (this matters the moment a body has Korean in it). At execute time it recomputes the hash from the about-to-send bytes and checks:\n\n```ts\n// Throws unless the receipt matches both the action and the about-to-execute payload.\nexport function verifyReceipt(receipt: ActionReceipt, expected: { action: FloorAction; currentPayloadHash: string }): void {\n  // A schema bump invalidates every pending receipt.\n  if (receipt.v !== RECEIPT_SCHEMA_VERSION) throw new ActionReceiptSchemaError(receipt);\n  // A receipt minted for one floor action can't authorize another.\n  if (receipt.action !== expected.action) throw new ActionReceiptMismatchError(receipt, expected.currentPayloadHash);\n  // Any drift in the payload between approve and execute is refused.\n  if (receipt.payloadHash !== expected.currentPayloadHash) throw new ActionReceiptMismatchError(receipt, expected.currentPayloadHash);\n}\n```\n\nAny drift between approve and execute throws, and the action is refused. Reusing a `send_email` receipt to authorize a `delete_permanent` throws on the action check. Bumping the schema version deliberately invalidates every pending receipt and forces a re-approve under the new shape.
The autonomous path fails closed: no valid receipt, no irreversible action.\n\nThat's the whole point of naming `reversibility` as a first-class feature instead of folding it into \"risk.\" It's not decoration on the tier decision; it's the axis that decides *which trust model an action even gets*.\n\nTwo layers, and the feature score picks which one applies. The probabilistic layer is allowed to stay probabilistic precisely because the floor catches the cases where probability isn't enough.\n\nFair question: is this actually wired, or just a module with a TODO? It's wired. The receipt is minted at `/approve` from the exact bytes you clicked on, `executeToolCall` refuses any floor action that arrives without a verified receipt (`FloorReceiptRequiredError`), and `send_email` re-checks the payload hash in its own path before anything leaves.\n\nThe honest edges, because there always are some: of the three floor actions, only `send_email` is a callable tool today. `delete_permanent` and `forward_external` aren't wired as tool cases yet, but the central guard already fails them closed, so a future case physically can't ship a receipt-less side effect. And the autonomous agent runs in SUGGEST mode by default: read-only tools plus propose-only, no mutating power until you opt into AUTO, and even then the floor stands in front of the irreversible three. The brake went in before the autonomous engine gets switched on, which is the only order that isn't reckless. I'd rather show you the guard and its TODOs than claim more than the code does.\n\nSeparate \"confident enough to decide\" from \"verified enough to do.\" For anything your system can't undo with one user click, don't trust the model's description of what it's about to do. Hash the deterministic artifact at approval, verify it at execution, and fail closed on any drift. Confidence is a fine reason to *decide*.
It is never, by itself, a reason to *do* something you can't take back.\n\nThe whole floor is ~210 readable lines in the open, AGPLv3: **github.com/k08200/klorn**, in `packages/api/src/attention-floor.ts`. Three posts, one idea: keep the model in the perception layer, and put everything you actually stand behind in code you can read.", "url": "https://wpnews.pro/news/confidence-is-enough-to-decide-it-s-not-enough-to-do", "canonical_source": "https://dev.to/k08200/confidence-is-enough-to-decide-its-not-enough-to-do-8ck", "published_at": "2026-06-25 10:18:46+00:00", "updated_at": "2026-06-25 10:43:31.502659+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "ai-safety", "developer-tools"], "entities": ["GPT-4o", "Gmail", "OpenAI"], "alternates": {"html": "https://wpnews.pro/news/confidence-is-enough-to-decide-it-s-not-enough-to-do", "markdown": "https://wpnews.pro/news/confidence-is-enough-to-decide-it-s-not-enough-to-do.md", "text": "https://wpnews.pro/news/confidence-is-enough-to-decide-it-s-not-enough-to-do.txt", "jsonld": "https://wpnews.pro/news/confidence-is-enough-to-decide-it-s-not-enough-to-do.jsonld"}}