Traces show what your agent did - a decision ledger shows what it was allowed to do

wpnews.pro


Source: dev.to · Published 2026-06-25T12:11Z · Topic: ai-agents


A developer introduces a decision ledger for agent observability that records not just what happened but what was authorized. The ledger uses hash-bound records and a chain structure to enable verifiers to prove that executed actions match authorized decisions, catching failures like dropped entries or swapped outcomes. The approach also enforces a bijection between execution traces and decision records to ensure no tool runs off-ledger.


Agent observability has gotten good at answering what happened: OpenTelemetry spans for each model call and tool execution, structured event logs, replayable traces. If a run misbehaves, you can reconstruct the sequence.

But for anything that has to stand up to an incident review or a compliance request, "what happened" isn't the question. The question is what was authorized.

Every such authorization passes through a decision point in your agent runtime: a policy callback, a confirmation gate, a per-tool auth check. But traces describe execution; almost nothing writes down the authority. That's the gap a decision ledger fills.

Here's the part that took me a while to get right: a decision ledger that's just "more events" buys you nothing. To be auditable rather than merely verbose, it has to support a verifier that can prove executed == authorized without trusting the agent's own narration. That decomposes into three layers, and each catches a failure the others can't.

Layer 1: per-entry conformance.

Each decision and each outcome is a well-formed, canonicalized, hash-bound record. The load-bearing field is on the outcome: it must commit to the decision that authorized it.

decision_event = { decision_id, action_ref, principal, auth_mode,
                   policy_version, decision_state, args_digest, ts }

outcome_event  = { action_ref,
                   decision_digest = SHA256(JCS(decision_event)),
                   result_digest, terminal_state, ts }
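To make the two record shapes concrete, here is a minimal Python sketch. It assumes that json.dumps with sorted keys, compact separators and UTF-8 output is an acceptable stand-in for JCS (RFC 8785). That holds for ASCII keys, strings, integers, booleans and null, but not for floating-point numbers, so a production verifier would want a real JCS implementation. The helper names and all field values are hypothetical.

```python
import hashlib
import json


def jcs(obj) -> bytes:
    """Approximate RFC 8785 (JCS) canonical JSON.

    Assumption: sorted keys + compact separators + UTF-8 matches JCS for
    ASCII keys, strings, ints, bools and null. Floats are NOT handled per JCS.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_decision(decision_id, action_ref, principal, auth_mode,
                  policy_version, decision_state, args, ts):
    return {
        "decision_id": decision_id,
        "action_ref": action_ref,
        "principal": principal,
        "auth_mode": auth_mode,
        "policy_version": policy_version,
        "decision_state": decision_state,
        # Commit to the exact tool arguments without storing them in the ledger.
        "args_digest": sha256_hex(jcs(args)),
        "ts": ts,
    }


def make_outcome(decision, result, terminal_state, ts):
    return {
        "action_ref": decision["action_ref"],
        # The load-bearing field: binds this outcome to one specific decision.
        "decision_digest": sha256_hex(jcs(decision)),
        "result_digest": sha256_hex(jcs(result)),
        "terminal_state": terminal_state,
        "ts": ts,
    }


# Illustrative values only.
decision = make_decision("d-1", "ref-abc", "user:alice", "human_confirmed",
                         "policy-v7", "allowed", {"path": "/tmp/report.csv"},
                         "2026-06-25T12:00:00Z")
outcome = make_outcome(decision, {"bytes_written": 512}, "succeeded",
                       "2026-06-25T12:00:01Z")
```

The digest is taken over canonical bytes, so dict insertion order is irrelevant. Any verifier that re-serializes the stored decision the same way recomputes the same decision_digest. Changing any decision field, including the arguments behind args_digest, changes it.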

action_ref answers "are these two events about the same intended action?" Make it content-derived (e.g. SHA256(JCS({agent_id, action_type, scope, ts}))) so any verifier can recompute it from the intent alone, with no shared runtime state.
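Here is a sketch of a content-derived action_ref under the same simplified-JCS assumption. The action_ref helper and the sample intent values are illustrative. The runtime and an auditor call the same pure function on the same intent fields, and neither needs anything else.

```python
import hashlib
import json


def jcs(obj) -> bytes:
    # Simplified stand-in for RFC 8785 JCS (sorted keys, compact, UTF-8);
    # adequate for ASCII keys, strings and ints, not for floats.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def action_ref(agent_id: str, action_type: str, scope: str, ts: str) -> str:
    """Content-derived reference: depends only on the intent, not on runtime state."""
    intent = {"agent_id": agent_id, "action_type": action_type,
              "scope": scope, "ts": ts}
    return hashlib.sha256(jcs(intent)).hexdigest()


# The runtime computes the ref when it records the decision...
runtime_ref = action_ref("agent-7", "fs.write", "/tmp/report.csv",
                         "2026-06-25T12:00:00Z")
# ...and an auditor later recomputes it from the same intent fields alone.
auditor_ref = action_ref("agent-7", "fs.write", "/tmp/report.csv",
                         "2026-06-25T12:00:00Z")
```

Because ts is part of the intent, two otherwise identical actions issued at different times get distinct refs.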

decision_digest answers a different question: "did this outcome commit to the exact decision that authorized it?" Keep the two separate: collapsing them loses your ability to catch a swapped outcome (a result re-attributed to the wrong decision).
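The next sketch shows why the two fields stay separate. It is a hypothetical check_outcomes verifier over simplified records, again using sorted-key json.dumps as a JCS stand-in. The outcome's action_ref selects which decision to compare against. Its decision_digest must then match that decision's recomputed hash.

```python
import hashlib
import json


def digest(obj) -> str:
    # SHA-256 over a simplified JCS-style canonical form (assumption: sorted
    # keys and compact separators are an adequate stand-in for RFC 8785 here).
    canon = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                       ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(canon).hexdigest()


def check_outcomes(decisions, outcomes):
    """Return the action_refs whose outcome does not commit to the decision
    recorded under the same action_ref."""
    by_ref = {d["action_ref"]: d for d in decisions}
    bad = []
    for o in outcomes:
        d = by_ref.get(o["action_ref"])
        # Two independent questions: is there a decision for this intended
        # action, and is it the exact decision this outcome committed to?
        if d is None or o["decision_digest"] != digest(d):
            bad.append(o["action_ref"])
    return bad


dec_a = {"decision_id": "d-1", "action_ref": "ref-a", "decision_state": "allowed"}
dec_b = {"decision_id": "d-2", "action_ref": "ref-b", "decision_state": "allowed"}
out_a = {"action_ref": "ref-a", "decision_digest": digest(dec_a),
         "terminal_state": "succeeded"}
out_b = {"action_ref": "ref-b", "decision_digest": digest(dec_b),
         "terminal_state": "failed"}

honest = check_outcomes([dec_a, dec_b], [out_a, out_b])

# Swap: re-attribute A's result to B by rewriting only its action_ref.
swapped_a = dict(out_a, action_ref="ref-b")
tampered = check_outcomes([dec_a, dec_b], [swapped_a, out_b])
```

If the outcome carried only action_ref, the rewritten row would pass. The independent decision_digest still points at decision A, so the re-attribution to ref-b is flagged.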

Layer 2: chain completeness.

Layer 1 can only reason about entries that exist. It cannot see an entry that was never written, and that's the highest-stakes failure for incident response: a tool call that bypassed the policy path (or a crash between authority-grant and ledger-write) looks like silence, not a malformed row.

Close it by chaining: each entry carries prev_digest pointing at the prior ledger head, and each turn/session close records the current ledger_head_digest. Now the ledger is an append-only chain, and a dropped entry shows up as a broken chain, detectable without trusting the writer.
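Below is a minimal sketch of the chain in Python, assuming a single writer and the same simplified canonical JSON. A few choices are hypothetical and not part of any framework API:

- the Ledger class;
- the all-zeros GENESIS sentinel used as the first prev_digest;
- the kind field on records.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical sentinel for the first entry's prev_digest


def digest(obj) -> str:
    canon = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canon).hexdigest()


class Ledger:
    """Append-only hash chain: each entry commits to the previous head."""

    def __init__(self):
        self.entries = []
        self.head = GENESIS

    def append(self, record: dict) -> str:
        entry = dict(record, prev_digest=self.head)
        self.entries.append(entry)
        self.head = digest(entry)
        return self.head

    def close_turn(self) -> str:
        # Value to record as ledger_head_digest at turn/session close.
        return self.head


def verify_chain(entries, ledger_head_digest) -> bool:
    """Recompute the chain from genesis; a dropped, edited or reordered
    entry breaks a prev_digest link or changes the final head."""
    head = GENESIS
    for entry in entries:
        if entry["prev_digest"] != head:
            return False
        head = digest(entry)
    return head == ledger_head_digest


ledger = Ledger()
ledger.append({"kind": "decision", "action_ref": "ref-a", "decision_state": "allowed"})
ledger.append({"kind": "outcome", "action_ref": "ref-a", "terminal_state": "succeeded"})
ledger.append({"kind": "decision", "action_ref": "ref-b", "decision_state": "allowed"})
head = ledger.close_turn()

intact = verify_chain(ledger.entries, head)
dropped = verify_chain(ledger.entries[:1] + ledger.entries[2:], head)  # middle entry removed
truncated = verify_chain(ledger.entries[:-1], head)                    # last entry removed
```

Dropping the last entry leaves every remaining link valid. Only the ledger_head_digest recorded at turn close exposes the truncation, which is why the head is committed separately.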

This catches two things Layer 1 can't. The first is a dropped entry, which now breaks the chain instead of passing as silence. The second is an orphaned decision: it is recorded as allowed, the handler then raises or times out, and no outcome is ever written. Judged entry by entry, that is indistinguishable from "allowed and silently succeeded."

⚠️ Concurrency gotcha. If your agent runs tool calls in parallel (most frameworks do), a naive prev_digest chain forks: two appends both chain to head H, and a fork becomes indistinguishable from a drop. There are two fixes. You can serialize the append (single writer per session: a lock or a monotonic sequence, even while the tools themselves run concurrently). Or you can model the ledger as an explicit DAG where each entry records a parent set and the head is a Merkle root over the closed frontier. Pick one, and make sure the verifier knows which shape it's checking: a linear verifier must reject forks; a DAG verifier must accept shared parents.
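Here is a sketch of the first fix, serializing the append. It uses Python's threading.Lock and a ThreadPoolExecutor to simulate parallel tool calls. The SerializedLedger class, run_tool, and the seq field (standing in for the monotonic sequence) are illustrative assumptions. The DAG alternative is not shown.

```python
import hashlib
import json
import threading
from concurrent.futures import ThreadPoolExecutor

GENESIS = "0" * 64  # hypothetical sentinel for the first prev_digest


def digest(obj) -> str:
    canon = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canon).hexdigest()


class SerializedLedger:
    """Single writer per session: reading the head and appending happen under
    one lock, so concurrent tool calls cannot both chain to the same head."""

    def __init__(self):
        self._lock = threading.Lock()
        self.entries = []
        self.head = GENESIS

    def append(self, record: dict) -> str:
        with self._lock:
            entry = dict(record, seq=len(self.entries), prev_digest=self.head)
            self.entries.append(entry)
            self.head = digest(entry)
            return self.head


def run_tool(ledger: SerializedLedger, i: int) -> None:
    # The tool body runs concurrently; only the ledger write is serialized.
    result = sum(range(i * 1000))  # stand-in for real tool work
    ledger.append({"action_ref": f"ref-{i}", "terminal_state": "succeeded",
                   "result": result})


def is_linear(entries) -> bool:
    """Linear verifier: every entry must chain to the one before it, so a
    fork (two entries sharing a prev_digest) is rejected."""
    head = GENESIS
    for entry in entries:
        if entry["prev_digest"] != head:
            return False
        head = digest(entry)
    return True


ledger = SerializedLedger()
with ThreadPoolExecutor(max_workers=8) as pool:
    for i in range(50):
        pool.submit(run_tool, ledger, i)
# Leaving the with-block waits for every submitted tool call to finish.
```

Reading the head and appending happen inside one critical section. If they were split, two threads could read the same head H and produce exactly the fork described above, which is_linear would reject.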

Layer 3: trace bijection.

The final layer ties the ledger back to the execution trace you already emit. Require a bijection at the action boundary: every executed tool span maps to exactly one allowed decision and exactly one terminal outcome, and vice versa.

The trace proves execution happened; the ledger proves it was authorized; the bijection between them is the "no tool executes off-ledger" invariant. It's the omission detector that Layer 1's per-entry rules structurally cannot express, because it reasons across two independent systems.
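Here is a sketch of the bijection check. It rests on two assumptions:

- Each executed tool span carries the action_ref of the action it ran, for example as a span attribute. That join key is a hypothetical convention, not an OpenTelemetry standard.
- Denied decisions produce neither a span nor an outcome.

check_bijection counts spans, allowed decisions and outcomes per action_ref. It flags any ref whose counts are not exactly one of each.

```python
from collections import Counter


def check_bijection(spans, decisions, outcomes):
    """Check the 'no tool executes off-ledger' invariant at the action boundary.

    Returns {action_ref: (spans, allowed_decisions, outcomes)} for every ref
    that violates the one-to-one-to-one mapping.
    """
    span_refs = Counter(s["action_ref"] for s in spans)
    allowed_refs = Counter(d["action_ref"] for d in decisions
                           if d["decision_state"] == "allowed")
    outcome_refs = Counter(o["action_ref"] for o in outcomes)

    problems = {}
    for ref in set(span_refs) | set(allowed_refs) | set(outcome_refs):
        counts = (span_refs[ref], allowed_refs[ref], outcome_refs[ref])
        # Bijection: exactly one span, one allowed decision, one terminal outcome.
        if counts != (1, 1, 1):
            problems[ref] = counts
    return problems


decisions = [
    {"action_ref": "ref-a", "decision_state": "allowed"},
    {"action_ref": "ref-b", "decision_state": "allowed"},
    {"action_ref": "ref-c", "decision_state": "denied"},
]
outcomes = [{"action_ref": "ref-a", "terminal_state": "succeeded"}]
spans = [
    {"span_id": "s1", "action_ref": "ref-a"},
    {"span_id": "s2", "action_ref": "ref-x"},  # ran with no decision: off-ledger
]
problems = check_bijection(spans, decisions, outcomes)
```

Here ref-x is a tool that executed without authorization, and ref-b is an authorized action that vanished. These are the two halves of the invariant a verifier is after.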

Put together, the invariant a verifier can now assert is:

Nothing executed unauthorized, and nothing authorized vanished.

That's the actual compliance property — and you cannot get it from logging alone, no matter how thorough. Per-entry conformance proves each record is well-formed and bound; the chain proves the set is complete; the bijection proves the set matches reality.

The deeper principle is one I keep coming back to: a step that reasons can only ask you to trust it; a step that emits a re-checkable artifact — a content hash, a solver's optimality certificate, a recomputable digest — turns "we logged it" into "anyone can re-run it and get the same answer." Move the factual, state-changing parts of an agent through deterministic tools that leave certificates, and the audit stops being a leap of faith.

(That re-checkable-certificate idea is what I've been building into OraClaw — deterministic decision tools that return verifiable results — but the three-layer ledger above is framework-agnostic; it's worth wiring into whatever runtime you're on.)

If you're building agents that will ever face an auditor, the cheapest time to add the ledger is before you need it.

Read original on dev.to → dev.to/whatsonyourmind/traces-show-what-your-age…
