{"slug": "aatf-an-open-spec-for-recording-why-ai-agents-make-decisions", "title": "AATF – An open spec for recording why AI agents make decisions", "summary": "The Agent Audit Trail Format (AATF) is an open specification for recording why AI agents make decisions, including alternatives considered, confidence scores, and rejected options. It provides a structured, tamper-evident format for accountability, distinct from logging or tracing tools. The project includes a reference SDK and aims to improve transparency in AI agent decision-making.", "body_md": "**The open specification and reference SDK for recording AI Agent decision chains.**\n\n[Quick Start](#quick-start-5-lines) · [The Format](#the-aatf-format) · [Why Not Existing Tools?](#why-not-existing-tools) · [SPEC](/wdh107/agent-audit-trail/blob/main/SPEC.md) · [Examples](/wdh107/agent-audit-trail/blob/main/examples)\n\nAATF is **not** another logging library. It's an **open specification** for recording *why* an AI Agent made each decision, including what alternatives it considered, how confident it was, and what it chose not to do.\n\nThink of it as:\n\n- **OpenTelemetry** → for observability\n- **AATF** → for Agent decision accountability\n\n```\nUser asks: \"Book a flight to Shanghai\"\n\nStep 1: [human_input]  → User request received\nStep 2: [reasoning]    → Intent: flight booking (confidence: 0.95)\n                          Alt: hotel booking → rejected (user said \"flight\")\n                          Alt: train booking → rejected (user said \"flight\")\nStep 3: [tool_call]    → flight_search_api (342ms) → 3 results\nStep 4: [reasoning]    → Decision: CA1234 at ¥2580 (confidence: 0.88)\n                          Alt: MU5678 at ¥2890 → rejected (¥310 more)\n                          Alt: CZ9012 at ¥3200 → rejected (over budget)\n\n→ SHA-256 hash chain: ✓ tamper-evident\n→ PII redaction: ✓ email, phone, card numbers\n→ Export: JSON / CSV / HTML (AATF-compliant)\n```\n\n``` python\nfrom agent_audit_trail import 
AuditSession, Decision, Alternative\n\nwith AuditSession(agent_id=\"my-agent\") as session:\n    session.add_reasoning_step(\n        name=\"choose_tool\",\n        decision=Decision(\n            input_summary=\"User wants weather info\",\n            decision=\"Use weather API\",\n            reasoning=\"Factual query requiring real-time data\",\n            confidence=0.95,\n            alternatives_considered=[\n                Alternative(description=\"Answer from memory\",\n                           reason_rejected=\"Weather changes constantly\"),\n                Alternative(description=\"Ask for clarification\",\n                           reason_rejected=\"Query is clear enough\"),\n            ]\n        )\n    )\n```\n\nThat's it. Every decision is now recorded with its reasoning, confidence score, and rejected alternatives, all in AATF-compliant format.\n\nThe heart of AATF is the **Decision record**:\n\n```\n{\n  \"type\": \"reasoning\",\n  \"name\": \"intent_classification\",\n  \"decision\": {\n    \"input_summary\": \"User wants to book a flight to Shanghai\",\n    \"decision\": \"Classified as flight-booking intent\",\n    \"reasoning\": \"Explicit keywords: 'flight' + destination + budget\",\n    \"confidence\": 0.95,\n    \"confidence_basis\": \"All three slots explicitly stated by user\",\n    \"alternatives_considered\": [\n      {\n        \"description\": \"Hotel booking intent\",\n        \"reason_rejected\": \"User said 'flight', not 'hotel'\",\n        \"score\": 0.05\n      },\n      {\n        \"description\": \"Train booking intent\",\n        \"reason_rejected\": \"User explicitly said 'flight'\",\n        \"score\": 0.02\n      }\n    ]\n  },\n  \"step_hash\": \"458942bbf4162f4d9cca121d93b9423413ec...\"\n}\n```\n\n| Feature | What It Does | Why It Matters |\n|---|---|---|\n| `alternatives_considered` | Forces agents to list what they didn't choose | Proves the agent didn't just rationalize a foregone conclusion |\n| `confidence` + 
`confidence_basis` | Numeric confidence + how it was determined | Lets auditors distinguish \"95% sure because X\" from \"95% sure because vibes\" |\n| `confidence_trajectory` | Tracks confidence across the full decision chain | Reveals when an agent becomes more or less certain as it gathers information |\n\nWe respect the existing ecosystem. Here's where AATF fits:\n\n| Tool | What It Does | What AATF Does Differently |\n|---|---|---|\n| Blockchain ledgers (Notary, Action Ledger) | Store agent actions on-chain for immutability | We're storage-agnostic. Store wherever you want. We focus on what to record, not where. |\n| LangChain callbacks | Framework-specific tracing | We're framework-agnostic. Works with CrewAI, AutoGen, raw Python, or anything else. |\n| MCP audit tools | Audit tool calls in the MCP protocol | We go deeper: not just what tool was called, but why it was chosen over alternatives. |\n| General logging (structlog, etc.) | Key-value event logs | We're structured for decision reasoning, not generic events. |\n\n**TL;DR:** Other tools audit *what the agent did*. AATF audits *why the agent did it*.\n\n``` python\n# LangChain\nfrom agent_audit_trail.integrations.langchain import AATFCallbackHandler\nagent = create_agent(callbacks=[AATFCallbackHandler()])\n\n# OpenAI\nfrom openai import OpenAI\nfrom agent_audit_trail.integrations.openai import AATFOpenAIWrapper\nclient = AATFOpenAIWrapper(OpenAI())\n\n# Generic decorator (any framework)\nfrom agent_audit_trail import audit_traced\n@audit_traced(agent_id=\"my-agent\")\ndef my_agent_function(query):\n    return \"answer\"\n```\n\n```\npip install agent-audit-trail\n```\n\nZero external dependencies. Python 3.10+. 700 lines of pure stdlib.\n\nWe used AATF to audit *ourselves*: an AI Agent reflecting on its own product's flaws. 
The result is a tamper-evident, 10 KB audit trail intended to show that every reasoning step was recorded as it happened rather than rationalized after the fact.\n\n📄 [View the full audit trail JSON](/wdh107/agent-audit-trail/blob/main/docs/self_audit_example.json)\n\nAATF is an open specification, not a product. The SDK is the reference implementation.\n\n📋 [Read the full AATF v0.1.0 Specification](/wdh107/agent-audit-trail/blob/main/SPEC.md)\n\n**This is a draft spec. We want your feedback.** Open an issue if you disagree with any design decision. Especially:\n\n- Should `alternatives_considered` be mandatory or optional?\n- Is `confidence` (0.0-1.0) the right abstraction, or should we use qualitative labels?\n- What hash algorithm should be standard? (Currently SHA-256)\n- Should the format support streaming/traces that are still in progress?\n\n| Role | What You Get |\n|---|---|\n| Agent Developer | Prove your agent reasons well. Debug decision failures. Show stakeholders the full chain. |\n| Compliance Officer | Machine-parseable audit trails that map to EU AI Act, GDPR, and SOC 2 requirements. |\n| CISO | Tamper-evident hash chains. Built-in PII redaction. Export for auditors. |\n| Researcher | Structured data on agent reasoning patterns. Confidence trajectories. Decision trees. |\n\n- ✅ AATF Specification v0.1.0\n- ✅ Reference SDK (Python): 134 tests passing\n- ✅ PII Redaction (email, phone)\n- ✅ Hash Chain Integrity Verification\n- ✅ LangChain / OpenAI / Generic Integrations\n- ✅ JSON / CSV / HTML Export\n- 🔲 PII Redaction expansion (credit card, SSN, API keys, IP)\n- 🔲 TypeScript/JavaScript SDK\n- 🔲 Community RFC process for spec changes\n- 🔲 LangChain/CrewAI published plugins\n\nThis project wants contributors. If you care about Agent accountability:\n\n- **Read the [SPEC](/wdh107/agent-audit-trail/blob/main/SPEC.md)**: understand the format\n- **Open an issue**: disagree with something? We want to hear it\n- **Build an integration**: your framework? 
Your plugin is welcome\n- **Spread the word**: star, tweet, blog post\n\nMIT licensed. Use it, fork it, improve it. The spec belongs to everyone.\n\n**If your Agent can think, its thinking should be auditable.**\n\n`pip install agent-audit-trail`", "url": "https://wpnews.pro/news/aatf-an-open-spec-for-recording-why-ai-agents-make-decisions", "canonical_source": "https://github.com/wdh107/agent-audit-trail", "published_at": "2026-06-16 04:10:27+00:00", "updated_at": "2026-06-16 04:21:53.232334+00:00", "lang": "en", "topics": ["ai-agents", "ai-safety", "ai-ethics", "developer-tools", "ai-infrastructure"], "entities": ["AATF", "OpenTelemetry", "LangChain", "CrewAI", "AutoGen"], "alternates": {"html": "https://wpnews.pro/news/aatf-an-open-spec-for-recording-why-ai-agents-make-decisions", "markdown": "https://wpnews.pro/news/aatf-an-open-spec-for-recording-why-ai-agents-make-decisions.md", "text": "https://wpnews.pro/news/aatf-an-open-spec-for-recording-why-ai-agents-make-decisions.txt", "jsonld": "https://wpnews.pro/news/aatf-an-open-spec-for-recording-why-ai-agents-make-decisions.jsonld"}}