Traces show what your agent did - a decision ledger shows what it was allowed to do

A developer introduces a decision ledger for agent observability that records not just what happened but what was authorized. The ledger uses hash-bound records and a chain structure to enable verifiers to prove that executed actions match authorized decisions, catching failures like dropped entries or swapped outcomes. The approach also enforces a bijection between execution traces and decision records to ensure no tool runs off-ledger.

Agent observability has gotten good at answering what happened: OpenTelemetry spans for each model call and tool execution, structured event logs, replayable traces. If a run misbehaves, you can reconstruct the sequence.

But for anything that has to stand up to an incident review or a compliance ask, "what happened" isn't the question. The question is what was authorized. Every tool call passes through a decision point in your agent runtime: a policy callback, a confirmation gate, a per-tool auth check. But traces describe execution; almost nothing writes down the authority. That's the gap a decision ledger fills.

Here's the part that took me a while to get right: a decision ledger that's just "more events" buys you nothing. To be auditable rather than merely verbose, it has to support a verifier that can prove executed == authorized without trusting the agent's own narration. That decomposes into three layers, and each catches a failure the others can't.

Layer 1 is per-entry conformance. Each decision and each outcome is a well-formed, canonicalized, hash-bound record. The load-bearing field is on the outcome: it must commit to the decision that authorized it.

    decision_event = { decision_id, action_ref, principal, auth_mode, policy_version, decision_state, args_digest, ts }
    outcome_event  = { action_ref, decision_digest = SHA256(JCS(decision_event)), result_digest, terminal_state, ts }

action_ref answers "are these two events about the same intended action?" Make it content-derived, e.g. SHA256(JCS({agent_id, action_type, scope, ts})), so any verifier can recompute it from the intent alone, with no shared runtime state. decision_digest answers a different question: "did this outcome commit to the exact decision that authorized it?" Keep the two separate: collapsing them loses your ability to catch a swapped outcome (a result re-attributed to the wrong decision).

Layer 1 can only reason about entries that exist.
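
To make the per-entry layer concrete, here is a minimal Python sketch of the two bindings, action_ref and decision_digest. It is an illustration under stated assumptions, not a reference implementation: `canonical` only approximates JCS (RFC 8785) with sorted-key compact JSON, which is adequate for the string fields used here but not for floats or unusual keys, and the helper names (`make_action_ref`, `make_outcome`, `check_pair`) and sample values are hypothetical.

```python
import hashlib
import json


def canonical(obj) -> bytes:
    # Stand-in for JCS (RFC 8785): sorted keys, no whitespace, UTF-8.
    # Assumption: fields are plain strings; use a real JCS library otherwise.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def digest(obj) -> str:
    return hashlib.sha256(canonical(obj)).hexdigest()


def make_action_ref(agent_id, action_type, scope, ts) -> str:
    # Content-derived, so a verifier can recompute it from the intent alone.
    return digest({"agent_id": agent_id, "action_type": action_type,
                   "scope": scope, "ts": ts})


def make_outcome(decision, result, terminal_state, ts) -> dict:
    return {
        "action_ref": decision["action_ref"],
        "decision_digest": digest(decision),  # commits to the exact decision
        "result_digest": digest(result),
        "terminal_state": terminal_state,
        "ts": ts,
    }


def check_pair(decision, outcome) -> list:
    # Two separate questions: same intended action? same authorizing decision?
    problems = []
    if outcome["action_ref"] != decision["action_ref"]:
        problems.append("action_ref mismatch")
    if outcome["decision_digest"] != digest(decision):
        problems.append("decision_digest mismatch")
    return problems


# Two decisions about the same intended action: first denied, later allowed.
ref = make_action_ref("agent-7", "send_email", "crm:contacts",
                      "2024-01-01T00:00:00Z")
base = {"action_ref": ref, "principal": "alice", "auth_mode": "policy",
        "policy_version": "v3",
        "args_digest": digest({"to": "bob@example.com"})}
denied = {**base, "decision_id": "d1", "decision_state": "denied",
          "ts": "2024-01-01T00:00:01Z"}
allowed = {**base, "decision_id": "d2", "decision_state": "allowed",
           "ts": "2024-01-01T00:00:05Z"}
outcome = make_outcome(allowed, {"status": "sent"}, "succeeded",
                       "2024-01-01T00:00:06Z")

print(check_pair(allowed, outcome))  # []
print(check_pair(denied, outcome))   # ['decision_digest mismatch']
```

The second check is the swapped-outcome case: the outcome and the denied decision share an action_ref, so only decision_digest reveals that the result is being attributed to the wrong decision.
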
Per-entry checks cannot see an entry that was never written, and that's the highest-stakes failure for incident response, because a tool call that bypassed the policy path, or a crash between authority-grant and ledger-write, looks like silence, not a malformed row.

Layer 2 closes that gap by chaining: each entry carries a prev_digest pointing at the prior ledger head, and each turn or session close records the current ledger head digest. Now the ledger is an append-only chain, and a dropped entry shows up as a broken chain, detectable without trusting the writer.

This catches two things Layer 1 can't. The first is the dropped entry described above. The second is the orphaned decision: a decision is recorded as "allowed", the handler then raises or times out, and no outcome is ever written. Left unchecked, that is indistinguishable from "allowed and silently succeeded".

⚠️ Concurrency gotcha. If your agent runs tool calls in parallel (most frameworks do), a naive prev_digest chain forks: two appends both chain to head H, and a fork becomes indistinguishable from a drop. There are two fixes: serialize the append (single writer per session: a lock or a monotonic sequence, even while the tools themselves run concurrently), or model the ledger as an explicit DAG where each entry records a parent set and the head is a Merkle root over the closed frontier. Pick one, and make sure the verifier knows which shape it's checking: a linear verifier must reject forks; a DAG verifier must accept shared parents.

The final layer ties the ledger back to the execution trace you already emit. Require a bijection at the action boundary: every executed tool span maps to exactly one allowed decision and exactly one terminal outcome, and vice versa. The trace proves execution happened; the ledger proves it was authorized; the bijection between them is the "no tool executes off-ledger" invariant. It's the omission detector that Layer 1's per-entry rules structurally cannot express, because it reasons across two independent systems.

Put together, the invariant a verifier can now assert is: nothing executed unauthorized, and nothing authorized vanished.
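
The chain and bijection layers can be sketched together as a small linear verifier. This is a hedged illustration, not a definitive design. It assumes the single-writer fix (a lock around the append), a hypothetical all-zero `GENESIS` sentinel for the first prev_digest, sorted-key JSON as a stand-in for JCS, trimmed-down entries, and a decision_digest taken over the stored entry. Span references are assumed to arrive as a plain list of action_ref values extracted from the trace.

```python
import hashlib
import json
import threading
from collections import Counter

GENESIS = "0" * 64                 # assumed sentinel for the first entry
_append_lock = threading.Lock()    # single writer per session


def digest(obj) -> str:
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append(ledger, body) -> str:
    """Bind the new entry to the current head and return the new head."""
    with _append_lock:
        prev = digest(ledger[-1]) if ledger else GENESIS
        entry = {**body, "prev_digest": prev}
        ledger.append(entry)
        return digest(entry)


def verify_chain(ledger, recorded_head) -> bool:
    """Linear verifier: rejects gaps, forks, and truncation."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev_digest"] != prev:
            return False
        prev = digest(entry)
    return prev == recorded_head


def verify_bijection(ledger, span_refs) -> list:
    """Return (action_ref, (spans, allowed, outcomes)) for every mismatch."""
    spans = Counter(span_refs)
    allowed = Counter(e["action_ref"] for e in ledger
                      if "decision_id" in e and e["decision_state"] == "allowed")
    outcomes = Counter(e["action_ref"] for e in ledger
                       if "decision_digest" in e)
    problems = []
    for ref in sorted(set(spans) | set(allowed) | set(outcomes)):
        counts = (spans[ref], allowed[ref], outcomes[ref])
        if counts != (1, 1, 1):
            problems.append((ref, counts))
    return problems


ledger = []
h1 = append(ledger, {"decision_id": "d1", "action_ref": "a1",
                     "decision_state": "allowed"})
append(ledger, {"action_ref": "a1", "decision_digest": h1,
                "terminal_state": "succeeded"})
# a2 is allowed, but its handler crashes before any outcome is written.
head = append(ledger, {"decision_id": "d2", "action_ref": "a2",
                       "decision_state": "allowed"})

print(verify_chain(ledger, head))                    # True
print(verify_chain(ledger[:1] + ledger[2:], head))   # False (dropped entry)
# The trace shows a1, a2, and a3; a3 never went through the ledger.
print(verify_bijection(ledger, ["a1", "a2", "a3"]))
# [('a2', (1, 1, 0)), ('a3', (1, 0, 0))]
```

In this sketch a fork fails the linear check for the same reason a drop does, because the second entry pointing at an already-extended head no longer matches the running prev value. A DAG-shaped ledger would need a different verifier.
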
That invariant is the actual compliance property, and you cannot get it from logging alone, no matter how thorough. Per-entry conformance proves each record is well-formed and bound; the chain proves the set is complete; the bijection proves the set matches reality.

The deeper principle is one I keep coming back to: a step that reasons can only ask you to trust it; a step that emits a re-checkable artifact (a content hash, a solver's optimality certificate, a recomputable digest) turns "we logged it" into "anyone can re-run it and get the same answer." Move the factual, state-changing parts of an agent through deterministic tools that leave certificates, and the audit stops being a leap of faith.

That re-checkable-certificate idea is what I've been building into OraClaw (https://github.com/Whatsonyourmind/oraclaw), a set of deterministic decision tools that return verifiable results, but the three-layer ledger above is framework-agnostic; it's worth wiring into whatever runtime you're on.

If you're building agents that will ever face an auditor, the cheapest time to add the ledger is before you need it.