# Traces show what your agent did - a decision ledger shows what it was allowed to do

> Source: <https://dev.to/whatsonyourmind/traces-show-what-your-agent-did-a-decision-ledger-shows-what-it-was-allowed-to-do-18b5>
> Published: 2026-06-25 12:11:20+00:00

Agent observability has gotten good at answering **what happened**: OpenTelemetry spans for each model call and tool execution, structured event logs, replayable traces. If a run misbehaves, you can reconstruct the sequence.

But for anything that has to stand up to an incident review or a compliance ask, "what happened" isn't the question. The question is **what was authorized**:

Every one of those passes through a decision point in your agent runtime — a policy callback, a confirmation gate, a per-tool auth check. But traces describe **execution**; almost nothing writes down the **authority**. That's the gap a decision ledger fills.

Here's the part that took me a while to get right: a decision ledger that's just "more events" buys you nothing. To be *auditable* rather than merely verbose, it has to support a verifier that can prove **executed == authorized** without trusting the agent's own narration. That decomposes into three layers, and each catches a failure the others can't.

Layer 1 is per-entry conformance: each decision and each outcome is a well-formed, canonicalized, hash-bound record. The load-bearing field is on the *outcome*: it must commit to the decision that authorized it.

```
decision_event = { decision_id, action_ref, principal, auth_mode,
                   policy_version, decision_state, args_digest, ts }

outcome_event  = { action_ref,
                   decision_digest = SHA256(JCS(decision_event)),
                   result_digest, terminal_state, ts }
```

`action_ref` answers *"are these two events about the same intended action?"* Make it content-derived (e.g. `SHA256(JCS({agent_id, action_type, scope, ts}))`) so any verifier can recompute it from the intent alone, with no shared runtime state.

`decision_digest` answers a *different* question: *"did this outcome commit to the exact decision that authorized it?"* Keep the two separate: collapsing them loses your ability to catch a **swapped outcome** (a result re-attributed to the wrong decision).
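
The two bindings can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `canonical` only approximates JCS (RFC 8785) with sorted-key JSON, and the helper names and all example values are hypothetical.

```python
import hashlib
import json


def canonical(obj: dict) -> bytes:
    # Approximation of JCS (RFC 8785): sorted keys, no extra whitespace, UTF-8.
    # Real JCS also fixes number formatting, so use a JCS library if records hold floats.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def digest(obj: dict) -> str:
    return hashlib.sha256(canonical(obj)).hexdigest()


def outcome_is_bound(decision: dict, outcome: dict) -> bool:
    # Two separate checks: same intended action, and commitment to this exact decision.
    same_action = outcome["action_ref"] == decision["action_ref"]
    committed = outcome["decision_digest"] == digest(decision)
    return same_action and committed


# Hypothetical example values.
action_ref = digest({"agent_id": "agent-7", "action_type": "send_email",
                     "scope": "crm", "ts": "2026-06-25T12:00:00Z"})
denied = {"decision_id": "d1", "action_ref": action_ref, "principal": "alice",
          "auth_mode": "policy", "policy_version": "v3",
          "decision_state": "denied", "args_digest": "ab12",
          "ts": "2026-06-25T12:00:01Z"}
allowed = {**denied, "decision_id": "d2", "decision_state": "allowed"}

outcome = {"action_ref": action_ref, "decision_digest": digest(allowed),
           "result_digest": "cd34", "terminal_state": "succeeded",
           "ts": "2026-06-25T12:00:02Z"}

print(outcome_is_bound(allowed, outcome))  # True: bound to the decision that authorized it
print(outcome_is_bound(denied, outcome))   # False: same action_ref, wrong decision
```

Both decisions share an `action_ref`, so a check on `action_ref` alone would accept the outcome against either one. Only `decision_digest` tells them apart.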

Layer 1 can only reason about entries that *exist*. It cannot see an entry that was **never written** — and that's the highest-stakes failure for incident response, because a tool call that bypassed the policy path (or a crash between authority-grant and ledger-write) looks like *silence*, not a malformed row.

Layer 2 closes that gap by chaining: each entry carries `prev_digest` pointing at the prior ledger head, and each turn/session close records the current `ledger_head_digest`. Now the ledger is an append-only chain, and a dropped entry shows up as a **broken chain**, detectable without trusting the writer.
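
A minimal sketch of such a chain, assuming a linear ledger held as a Python list and an all-zero `GENESIS` sentinel for the first entry (both are illustrative choices, as are the helper names and entry fields):

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed sentinel for "no previous entry"


def digest(obj: dict) -> str:
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append(ledger: list, body: dict) -> str:
    """Append an entry chained to the current head; return the new head digest."""
    prev = digest(ledger[-1]) if ledger else GENESIS
    ledger.append({**body, "prev_digest": prev})
    return digest(ledger[-1])


def verify_chain(ledger: list, recorded_head: str) -> bool:
    """Walk the chain from genesis and compare against the head recorded at close."""
    prev = GENESIS
    for entry in ledger:
        if entry.get("prev_digest") != prev:
            return False
        prev = digest(entry)
    return prev == recorded_head


ledger = []
append(ledger, {"kind": "decision", "decision_id": "d1", "decision_state": "allowed"})
append(ledger, {"kind": "outcome", "terminal_state": "succeeded"})
head = append(ledger, {"kind": "decision", "decision_id": "d2", "decision_state": "denied"})

print(verify_chain(ledger, head))                   # True
print(verify_chain(ledger[:1] + ledger[2:], head))  # False: middle entry dropped
print(verify_chain(ledger[:2], head))               # False: tail entry dropped
```

The recorded head has to live somewhere the ledger writer cannot silently rewrite (here it is just a variable). Otherwise a truncated tail would verify against a truncated head.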

This catches two things Layer 1 can't:

- **A dropped entry.** A record removed after the fact no longer matches the `prev_digest` of its successor or the recorded `ledger_head_digest`.
- **A dangling `allowed`.** The decision is `allowed`, the handler then raises or times out, and no outcome is ever written. Taken alone, that decision entry is indistinguishable from "allowed and silently succeeded"; a chain closed by a recorded head exposes the `allowed` decision that never reached an outcome.

⚠️ **Concurrency gotcha.** If your agent runs tool calls *in parallel* (most frameworks do), a naive `prev_digest` chain *forks*: two appends both chain to head `H`, and a fork becomes indistinguishable from a drop. Two fixes: **serialize the append** (single-writer per session: a lock or a monotonic sequence, even while the tools themselves run concurrently), or model the ledger as an explicit **DAG** where each entry records a parent *set* and the head is a Merkle root over the closed frontier. Pick one, and make sure the verifier knows which shape it's checking: a linear verifier must *reject* forks; a DAG verifier must *accept* shared parents.

The final layer ties the ledger back to the execution trace you already emit. Require a **bijection at the action boundary**:

> Every executed tool span maps to exactly one `allowed` decision and exactly one terminal outcome, and vice versa.
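
One way to check that invariant, sketched under the assumption that each tool span carries the same `action_ref` as its ledger entries (for instance as a span attribute; how it gets there depends on your tracing setup). The function name and the sample data are hypothetical.

```python
from collections import Counter


def check_bijection(span_refs: list, decisions: list, outcomes: list) -> list:
    """Return (action_ref, (spans, allowed_decisions, outcomes)) for every violation."""
    executed = Counter(span_refs)
    allowed = Counter(d["action_ref"] for d in decisions
                      if d["decision_state"] == "allowed")
    finished = Counter(o["action_ref"] for o in outcomes)

    violations = []
    for ref in sorted(set(executed) | set(allowed) | set(finished)):
        counts = (executed[ref], allowed[ref], finished[ref])
        if counts != (1, 1, 1):  # exactly one span, one allowed decision, one outcome
            violations.append((ref, counts))
    return violations


decisions = [
    {"action_ref": "a", "decision_state": "allowed"},
    {"action_ref": "b", "decision_state": "allowed"},  # authorized, then vanished
    {"action_ref": "d", "decision_state": "denied"},   # denied and never executed: fine
]
outcomes = [{"action_ref": "a", "terminal_state": "succeeded"}]
span_refs = ["a", "c"]  # "c" executed with no ledger entry at all

print(check_bijection(span_refs, decisions, outcomes))
# [('b', (0, 1, 0)), ('c', (1, 0, 0))]
```

Here `b` is an authorized action that never ran or never reported, and `c` is a tool execution that never passed through the decision point. Both are invisible to per-entry checks.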

The trace proves execution *happened*; the ledger proves it was *authorized*; the bijection between them is the "**no tool executes off-ledger**" invariant. It's the omission detector that Layer 1's per-entry rules structurally cannot express, because it reasons across two independent systems.

Put together, the invariant a verifier can now assert is:

Nothing executed unauthorized, and nothing authorized vanished.

That's the actual compliance property — and you cannot get it from logging alone, no matter how thorough. Per-entry conformance proves each record is well-formed and bound; the chain proves the *set* is complete; the bijection proves the set matches reality.

The deeper principle is one I keep coming back to: a step that *reasons* can only ask you to trust it; a step that emits a **re-checkable artifact** — a content hash, a solver's optimality certificate, a recomputable digest — turns "we logged it" into "anyone can re-run it and get the same answer." Move the factual, state-changing parts of an agent through deterministic tools that leave certificates, and the audit stops being a leap of faith.

(That re-checkable-certificate idea is what I've been building into [OraClaw](https://github.com/Whatsonyourmind/oraclaw) — deterministic decision tools that return verifiable results — but the three-layer ledger above is framework-agnostic; it's worth wiring into whatever runtime you're on.)

If you're building agents that will ever face an auditor, the cheapest time to add the ledger is before you need it.
