AATF – An open spec for recording why AI agents make decisions

The Agent Audit Trail Format (AATF) is an open specification for recording why AI agents make decisions, including alternatives considered, confidence scores, and rejected options. It provides a structured, tamper-evident format for accountability, distinct from logging or tracing tools. The project includes a reference SDK and aims to improve transparency in AI agent decision-making.

The open specification and reference SDK for recording AI Agent decision chains.

[SPEC](/wdh107/agent-audit-trail/blob/main/SPEC.md) · [Examples](/wdh107/agent-audit-trail/blob/main/examples)

AATF is not another logging library. It's an open specification for recording why an AI Agent made each decision: what alternatives it considered, how confident it was, and what it chose not to do.

Think of it as:

- OpenTelemetry → for observability
- AATF → for Agent decision accountability

```text
User asks: "Book a flight to Shanghai"

Step 1: human_input → User request received
Step 2: reasoning   → Intent: flight_booking (confidence: 0.95)
        Alt: hotel_booking → rejected (user said "flight")
        Alt: train_booking → rejected (user said "flight")
Step 3: tool_call   → flight_search_api (342ms) → 3 results
Step 4: reasoning   → Decision: CA1234 at ¥2580 (confidence: 0.88)
        Alt: MU5678 at ¥2890 → rejected (¥310 more)
        Alt: CZ9012 at ¥3200 → rejected (over budget)

→ SHA-256 hash chain: ✓ tamper-evident
→ PII redaction: ✓ (email, phone, card numbers)
→ Export: JSON / CSV / HTML
→ AATF-compliant
```

Quick Start:

```python
from agent_audit_trail import AuditSession, Decision, Alternative

with AuditSession(agent_id="my-agent") as session:
    session.add_reasoning_step(
        name="choose_tool",
        decision=Decision(
            input_summary="User wants weather info",
            decision="Use weather API",
            reasoning="Factual query requiring real-time data",
            confidence=0.95,
            alternatives_considered=[
                Alternative(description="Answer from memory", reason_rejected="Weather changes constantly"),
                Alternative(description="Ask for clarification", reason_rejected="Query is clear enough"),
            ],
        ),
    )
```

That's it. Every decision is now recorded with its reasoning, confidence score, and rejected alternatives, in AATF-compliant format.
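The "SHA-256 hash chain: ✓ tamper-evident" line above relies on a general property of hash chains: each step's hash covers the previous step's hash, so changing an earlier step invalidates every hash after it. Below is a minimal stdlib sketch of that general idea. The chaining rule (SHA-256 over the previous hash plus the step's canonical JSON), the all-zero genesis value, and the function names are assumptions for illustration; the construction AATF actually specifies is defined in SPEC.md.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link in the chain


def step_hash(prev_hash: str, step: dict) -> str:
    """SHA-256 over the previous hash plus the step's canonical JSON (assumed scheme)."""
    # sort_keys + fixed separators: the same step always serializes to the same bytes
    payload = json.dumps(step, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(steps: list[dict]) -> list[dict]:
    """Return copies of the steps, each with a step_hash linked to its predecessor."""
    chained, prev = [], GENESIS
    for step in steps:
        h = step_hash(prev, step)
        chained.append({**step, "step_hash": h})
        prev = h
    return chained


def verify_chain(chained: list[dict]) -> bool:
    """Recompute every link and report whether all stored hashes still match."""
    prev = GENESIS
    for record in chained:
        body = {k: v for k, v in record.items() if k != "step_hash"}
        if step_hash(prev, body) != record["step_hash"]:
            return False
        prev = record["step_hash"]
    return True


steps = [
    {"type": "human_input", "name": "user_request", "content": "Book a flight to Shanghai"},
    {"type": "reasoning", "name": "intent_classification", "confidence": 0.95},
]
chain = build_chain(steps)
print(verify_chain(chain))  # True
chain[1]["confidence"] = 0.99  # tamper with a recorded step
print(verify_chain(chain))  # False
```

A chain like this detects edited, reordered, or deleted middle steps, but not truncation of the final steps; catching that requires keeping the last hash somewhere the writer cannot change.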
The heart of AATF is the Decision record:

```json
{
  "type": "reasoning",
  "name": "intent_classification",
  "decision": {
    "input_summary": "User wants to book a flight to Shanghai",
    "decision": "Classified as flight-booking intent",
    "reasoning": "Explicit keywords: 'flight' + destination + budget",
    "confidence": 0.95,
    "confidence_basis": "All three slots explicitly stated by user",
    "alternatives_considered": [
      {
        "description": "Hotel booking intent",
        "reason_rejected": "User said 'flight', not 'hotel'",
        "score": 0.05
      },
      {
        "description": "Train booking intent",
        "reason_rejected": "User explicitly said 'flight'",
        "score": 0.02
      }
    ]
  },
  "step_hash": "458942bbf4162f4d9cca121d93b9423413ec..."
}
```

| Feature | What It Does | Why It Matters |
|---|---|---|
| `alternatives_considered` | Forces agents to list what they didn't choose | Gives evidence the agent weighed options rather than rationalizing a foregone conclusion |
| `confidence` + `confidence_basis` | Numeric confidence plus how it was determined | Lets auditors distinguish "95% sure because X" from "95% sure because vibes" |
| `confidence_trajectory` | Tracks confidence across the full decision chain | Reveals when an agent becomes more or less certain as it gathers information |

We respect the existing ecosystem. Here's where AATF fits:

| Tool | What It Does | What AATF Does Differently |
|---|---|---|
| Blockchain ledgers (Notary, Action Ledger) | Store agent actions on-chain for immutability | We're storage-agnostic: store records wherever you want. We focus on what to record, not where. |
| LangChain callbacks | Framework-specific tracing | We're framework-agnostic: works with CrewAI, AutoGen, raw Python, or anything else. |
| MCP audit tools | Audit tool calls made over MCP | We go deeper: not just which tool was called, but why it was chosen over alternatives. |
| General logging (structlog, etc.) | Key-value event logs | We're structured for decision reasoning, not generic events. |

TL;DR: Other tools audit *what* the agent did. AATF audits *why* the agent did it.
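To make the record shape concrete, here is a stdlib-only sketch that models the Decision record with dataclasses and derives a confidence trajectory from a list of records. It is not the SDK's `Decision`/`Alternative` API: the class names `DecisionRecord` and `RejectedAlternative`, the range check, `to_record`, `confidence_trajectory`, and the step name `flight_choice` are hypothetical, chosen to mirror the fields in the example (`input_summary`, `confidence_basis`, `alternatives_considered`, and so on).

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RejectedAlternative:
    description: str
    reason_rejected: str
    score: Optional[float] = None  # optional, as in the example record


@dataclass
class DecisionRecord:
    input_summary: str
    decision: str
    reasoning: str
    confidence: float
    confidence_basis: str = ""
    alternatives_considered: list[RejectedAlternative] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption of this sketch: enforce the 0.0-1.0 range the draft spec discusses.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be within [0.0, 1.0], got {self.confidence}")


def to_record(name: str, decision: DecisionRecord) -> dict:
    """Shape a decision like the example reasoning record (step_hash omitted)."""
    return {"type": "reasoning", "name": name, "decision": asdict(decision)}


def confidence_trajectory(records: list[dict]) -> list[float]:
    """Confidence of each reasoning step, in chain order."""
    return [r["decision"]["confidence"] for r in records if r.get("type") == "reasoning"]


intent = DecisionRecord(
    input_summary="User wants to book a flight to Shanghai",
    decision="Classified as flight-booking intent",
    reasoning="Explicit keywords: 'flight' + destination + budget",
    confidence=0.95,
    confidence_basis="All three slots explicitly stated by user",
    alternatives_considered=[
        RejectedAlternative("Hotel booking intent", "User said 'flight', not 'hotel'", 0.05),
        RejectedAlternative("Train booking intent", "User explicitly said 'flight'", 0.02),
    ],
)
choice = DecisionRecord(
    input_summary="3 flight search results",
    decision="CA1234 at ¥2580",
    reasoning="Cheapest of the three options",
    confidence=0.88,
)
records = [to_record("intent_classification", intent), to_record("flight_choice", choice)]
print(confidence_trajectory(records))  # [0.95, 0.88]
```

Deriving the trajectory from the per-step records, rather than storing it separately, means it can never disagree with the confidences it summarizes.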
Framework integrations:

```python
# LangChain
from agent_audit_trail.integrations.langchain import AATFCallbackHandler

# create_agent is a placeholder for however you construct your LangChain agent
agent = create_agent(callbacks=[AATFCallbackHandler()])

# OpenAI
from openai import OpenAI
from agent_audit_trail.integrations.openai import AATFOpenAIWrapper

client = AATFOpenAIWrapper(OpenAI())

# Generic decorator (any framework)
from agent_audit_trail import audit_traced

@audit_traced(agent_id="my-agent")
def my_agent_function(query):
    return "answer"
```

Installation:

```shell
pip install agent-audit-trail
```

Zero external dependencies. Python 3.10+. 700 lines of pure stdlib.

We used AATF to audit ourselves: an AI Agent reflecting on its own product's flaws. The result is a tamper-evident, 10 KB audit trail of every reasoning step, whose hash chain shows that no step was altered after it was recorded.

📄 [View the full audit trail JSON](/wdh107/agent-audit-trail/blob/main/docs/self_audit_example.json)

AATF is an open specification, not a product. The SDK is the reference implementation.

📋 [Read the full AATF v0.1.0 Specification](/wdh107/agent-audit-trail/blob/main/SPEC.md)

This is a draft spec, and we want your feedback. Open an issue if you disagree with any design decision, especially:

- Should `alternatives_considered` be mandatory or optional?
- Is `confidence` on a 0.0–1.0 scale the right abstraction, or should we use qualitative labels?
- What hash algorithm should be standard? (Currently SHA-256.)
- Should the format support streaming, i.e. traces that are still in progress?

| Role | What You Get |
|---|---|
| Agent Developer | Show how your agent reasons. Debug decision failures. Show stakeholders the full chain. |
| Compliance Officer | Machine-parseable audit trails that can support EU AI Act, GDPR, and SOC 2 requirements. |
| CISO | Tamper-evident hash chains. PII redaction built in. Export for auditors. |
| Researcher | Structured data on agent reasoning patterns. Confidence trajectories. Decision trees. |
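The CISO row lists PII redaction as built in. As a rough picture of what redacting a record involves, here is a regex-based stdlib sketch covering email addresses and phone numbers. The patterns, placeholder strings, and the functions `redact` and `redact_record` are assumptions for illustration, not the SDK's implementation.

```python
import re

# Deliberately simple patterns for illustration; real email and phone formats vary
# widely, and the phone pattern also catches other long digit runs such as dates.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses, then phone-like digit runs, with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)


def redact_record(value):
    """Recursively redact every string inside a JSON-like structure."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {key: redact_record(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_record(item) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


record = {
    "type": "human_input",
    "content": "Book a flight, contact me at alice@example.com or +86 138 0013 8000",
    "confidence": 0.95,
}
print(redact_record(record)["content"])
# Book a flight, contact me at [REDACTED_EMAIL] or [REDACTED_PHONE]
```

In a hash-chained trail, redaction has to happen before each step is hashed: redacting an already-chained record changes its content and breaks every link after it.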
Roadmap:

- ✅ AATF Specification v0.1.0
- ✅ Reference SDK (Python): 134 tests passing
- ✅ PII Redaction (email, phone)
- ✅ Hash Chain Integrity Verification
- ✅ LangChain / OpenAI / Generic Integrations
- ✅ JSON / CSV / HTML Export
- 🔲 PII Redaction expansion (credit card, SSN, API keys, IP)
- 🔲 TypeScript/JavaScript SDK
- 🔲 Community RFC process for spec changes
- 🔲 Published LangChain/CrewAI plugins

This project wants contributors. If you care about Agent accountability:

- Read the [SPEC](/wdh107/agent-audit-trail/blob/main/SPEC.md) to understand the format.
- Open an issue. Disagree with something? We want to hear it.
- Build an integration. Using another framework? Your plugin is welcome.
- Spread the word: star, tweet, blog post.

License: MIT. Use it, fork it, improve it. The spec belongs to everyone.

If your Agent can think, its thinking should be auditable.

```shell
pip install agent-audit-trail
```
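For the "Build an integration" invitation: a framework without a ready-made integration could start from a wrapper in the spirit of the generic `@audit_traced` decorator. The sketch below uses only the stdlib and is not the SDK's implementation; `audit_sketch`, `AUDIT_LOG`, and the recorded fields are hypothetical.

```python
import functools
import time
from typing import Any, Callable

AUDIT_LOG: list[dict[str, Any]] = []  # hypothetical in-memory sink for recorded calls


def audit_sketch(agent_id: str) -> Callable:
    """Hypothetical decorator factory: record name, input, outcome, and duration of each call."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: dict[str, Any] = {
                "agent_id": agent_id,
                "name": fn.__name__,
                "input": repr((args, kwargs)),
            }
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                record["output"] = repr(result)
                return result
            except Exception as exc:
                record["error"] = repr(exc)  # keep failed calls in the trail too
                raise
            finally:
                record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
                AUDIT_LOG.append(record)
        return wrapper
    return decorator


@audit_sketch(agent_id="my-agent")
def my_agent_function(query: str) -> str:
    return "answer"


my_agent_function("What's the weather in Shanghai?")
print(AUDIT_LOG[-1]["name"], AUDIT_LOG[-1]["output"])  # my_agent_function 'answer'
```

A wrapper like this can only capture *what* was called. The reasoning fields (alternatives considered, confidence and its basis) still have to be supplied by the agent itself, which is why the Quick Start records them explicitly through a `Decision`.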