Human-in-the-Loop Is a Delivery Guarantee, Not a UI Feature

Two open-source repositories, agent-governance-plane and claude-code-slack-channel, fixed a critical durability gap in human-in-the-loop interactions by implementing a four-move discipline for exactly-once message delivery. The fix treats human-in-the-loop as a durable message system rather than a UI feature, using idempotency keys and outbox patterns to prevent lost decisions or replies during crashes.

Two repos. One missing guarantee.

In `agent-governance-plane` (AGP), a human's approval was cryptographically signed with Ed25519 and written to a tamper-evident journal. Solid. Except the only `InteractionSource` wired into the system was an in-memory test stub. A human could see an Allow/Deny prompt, but the click had no way home. The signed approval was a letter with no mailbox.

In `claude-code-slack-channel` (CCSC), an agent's reply to a Slack thread was a synchronous tool call. If the process died between "decide to send" and "send," the reply vanished. No turn-terminal flush, no retry, no record that an obligation ever existed. The user just waited.

Different repos, different directions of travel: one inbound (receive a human's decision), one outbound (deliver an agent's reply). Same hole: the part where a human and an agent actually hand work to each other was the part nobody made durable.

On 2026-06-07 both repos closed that hole. They shipped the same four-move discipline. And one repo's spec was lifted, by name, from the other's pattern. That is the part worth your attention.

Human-in-the-loop gets filed under "product": a button, a modal, a confirmation dialog. That framing is why it breaks in production. The moment a decision has to survive a crash, an ack-loss, or a dropped socket, you are no longer building UI. You are building a durable message system, and the well-known failure modes apply.
The receiver and the reply path are both solving exactly-once delivery, message deduplication, and fail-closed defaults. Strip away Slack and the problem is identical to any outbox or any consumer that must not lose, must not duplicate, and must not silently double-act. Four moves fall out of that framing:

1. Record the obligation durably before attempting the action.
2. Key it with an idempotency key.
3. Reconcile: scan for a prior delivery before redelivering.
4. Never double-act; fail closed.

Hold those four moves. They recur on both sides.

CCSC has an inverted architecture, and the inversion is exactly why durability is hard: a reply is a synchronous tool call, not a turn-terminal event. There is no natural "end of turn" to flush at. The agent calls a reply tool mid-reasoning and expects delivery to just happen. That is what the outbox pattern is for: record the obligation durably before attempting the send, so a crash can't lose it. And because the agent only ever sees the tool, the durability machinery has to live behind the tool, invisible to the caller.

The reply-delivery contract (ADR-002 addendum, "Option A: a safety-net behind the reply tool") is deliberately narrow: single-message text replies only. One obligation equals one message, so the idempotency is exact. Chunked, file, and streaming replies stay best-effort and do not enqueue: zero double-send risk, with durability deferred to later beads. Scope discipline is part of the design, not a shortcut.

The machinery lives in `slack-delivery.ts`, a side-effect-free sibling module, deliberately not inline in `server.ts`, so it is testable without dragging in server module-load side effects. The center of it is `deliverReplyDurably`:

```typescript
// Record the obligation BEFORE the send (move 1).
// Crash here is safe: the poller will find the pending obligation and drain it.
async function deliverReplyDurably(deps: IdempotentSendDeps, reply: ReplyObligation) {
  const obligation = await deps.recordPending(reply); // UUID id == idempotency key
  try {
    const ts = await deps.send(obligation); // one inline attempt
    await deps.markDelivered(obligation.id, ts);
    return { status: "delivered", ts };
  } catch (err) {
    if (isTransient(err)) {
      // Queued: the poller redelivers idempotently. Tell the agent "queued"
      // so it does NOT retry and double-post (move 4: never double-act).
      return { status: "queued" };
    }
    await deps.markDead(obligation.id, err); // non-retryable → recorded dead
    throw err;
  }
}
```

The obligation `id` is a fresh UUID per reply call, and that id is the idempotency key. So when the poller later redelivers the same obligation, it dedups against itself. That is move 2, and it lives in the message metadata:

```typescript
// The single site that stamps delivery metadata onto the outbound message.
async function postReply(client: WebClient, obligation: ReplyObligation, key: string) {
  return client.chat.postMessage({
    channel: obligation.channel,
    thread_ts: obligation.thread,
    text: obligation.text,
    metadata: {
      event_type: DELIVERY_METADATA_EVENT_TYPE,
      event_payload: { idempotency_key: key },
    },
  });
}
```

Move 3 is the scan. Before sending, or when redelivering, `findDelivered` walks the destination thread via `conversations.replies` with `include_all_metadata: true`, looking for a message we posted carrying our delivery `event_type` and a matching `idempotency_key`. A hit returns the existing `ts`, so an ack-loss redelivery becomes a no-op instead of a duplicate.

```typescript
// A redelivery after ack-loss finds the prior post and returns its ts. No second message.
async function findDelivered(client: WebClient, channel: string, thread: string, key: string) {
  const res = await client.conversations.replies({
    channel,
    ts: thread,
    include_all_metadata: true,
  });
  const hit = res.messages?.find(
    (m) =>
      m.metadata?.event_type === DELIVERY_METADATA_EVENT_TYPE &&
      m.metadata?.event_payload?.idempotency_key === key,
  );
  return hit?.ts ?? null;
}
```

The poller itself (PR 228) is a `deliveryTimer` tick calling `supervisor.drainOutbox` on an interval (`SLACK_DELIVERY_POLL_MS`, default 15s), with the timer `unref`'d so it never holds the process open. A boot-time drain recovers crash-pending obligations on startup, and the timer is cleared on shutdown before the supervisor drain, mirroring the existing idle-reaper exactly.

Fail-closed also means degrading gracefully when the outbox itself is unavailable. `DurableUnavailableError` is thrown before any obligation is recorded: the outbox isn't activated, or there's no lease. The caller catches it and falls back to the prior direct send. Nothing is persisted, nothing needs redelivery, and there is zero regression versus the old path. Durability is additive; its absence degrades gracefully to what shipped before.

Why not just retry inline? Because inline retry only survives failures the process is alive to handle. The crash-before-send window (record nothing, send nothing, die) is exactly the window inline retry can't cover. The obligation has to exist on disk before the attempt, or there is nothing to retry from.

Why not best-effort fire-and-forget? Because "the reply usually arrives" is not a contract a human can build on. In a HITL loop the reply is the work product. Best-effort means the agent thinks it answered and the human is still waiting. That is the worst failure, because nobody knows it failed.

Why is fail-open the dangerous default for an approval gate? This is the load-bearing one. If a receiver times out and the system fails open, the gated action proceeds without a human decision.
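The fail-closed alternative fits in a few lines. What follows is a minimal TypeScript sketch, not code from either repository: `decideOrDeny`, `Decision`, and `GateResult` are hypothetical names, and `awaitDecision` stands in for whatever call waits on the human's click. The one behavior it is meant to show is that a timeout or a receiver error resolves to deny, never to proceed.

```typescript
// Hypothetical types for illustration; not taken from either repository.
type Decision = "approve" | "deny";

interface GateResult {
  decision: Decision;
  reason: "human" | "timeout" | "error";
}

// Race the human's decision against a deadline. Anything other than an
// explicit human answer resolves to "deny", so the gate fails closed.
async function decideOrDeny(
  awaitDecision: () => Promise<Decision>,
  timeoutMs: number,
): Promise<GateResult> {
  let cancelDeadline = (): void => {};
  const deadline = new Promise<GateResult>((resolve) => {
    const timer = setTimeout(
      () => resolve({ decision: "deny", reason: "timeout" }),
      timeoutMs,
    );
    cancelDeadline = () => clearTimeout(timer);
  });

  const human = awaitDecision().then(
    (decision): GateResult => ({ decision, reason: "human" }),
    // A failing receiver is treated like silence: deny.
    (): GateResult => ({ decision: "deny", reason: "error" }),
  );

  try {
    return await Promise.race([human, deadline]);
  } finally {
    cancelDeadline(); // never leave the timer holding the process open
  }
}
```

A caller would gate on `result.decision === "approve"`; every other outcome, including silence, leaves the action blocked.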
An approval gate has to act exactly once on a real human decision; failing open makes it act on no decision at all. The entire reason the gate exists is to stop unapproved actions; failing open deletes the gate precisely when it matters. For anything guarding an action, no-decision must mean deny, not proceed.

AGP comes at the same four moves from the inbound side. PR 66 builds the production receiver per spec 033-AT-SPEC, which is explicitly "lifted from the CCSC pattern, completes the HITL round-trip." This is the keystone. What transferred wasn't code: CCSC delivers outbound and AGP receives inbound, so the two share not a single line. What transferred was the discipline (record the obligation, key it, reconcile it, fail closed), restated as a spec for an inbound approval channel. The convergence is not two teams independently reinventing a wheel; one read the other's pattern and applied it to the mirror-image problem.

The transport is Socket Mode: an outbound WebSocket from the control plane to Slack. No public ingress, which honors AGP's "no public surface until defensible" P0 decision. You get durability and no inbound attack surface. That combination is the design point.

Parsing is a pure function, so it is trivially testable and impossible to make stateful by accident:

```typescript
// parseBlockAction: pure. block_actions payload → SlackInteraction, or null for noise.
function parseBlockAction(payload: SlackPayload): SlackInteraction | null {
  const action = payload.actions?.[0];
  if (!action || payload.type !== "block_actions") return null; // ack-and-ignore
  return {
    nonce: action.value,
    approved: action.action_id === "approve",
    userId: payload.user.id,
    isBot: Boolean(payload.user.is_bot),
  };
}
```

The receiver holds a pending-by-nonce promise `Map`, acks every envelope first (Slack drops you if you're slow), then resolves the awaiting `awaitInteraction(nonce)` on a matching click.
A `resolved` `Set` makes replay detection explicit: a click that arrives for an already-settled nonce is reported as a replay, never acted on a second time:

class SocketModeInteractionSource { private pending = new Map