# AATF – An open spec for recording why AI agents make decisions

> Source: <https://github.com/wdh107/agent-audit-trail>
> Published: 2026-06-16 04:10:27+00:00

**The open specification and reference SDK for recording AI Agent decision chains.**

[Quick Start](#quick-start-5-lines) · [The Format](#the-aatf-format) · [Why Not Existing Tools?](#why-not-existing-tools) · [SPEC](/wdh107/agent-audit-trail/blob/main/SPEC.md) · [Examples](/wdh107/agent-audit-trail/blob/main/examples)

AATF is **not** another logging library. It's an **open specification** for recording *why* an AI Agent made each decision — including what alternatives it considered, how confident it was, and what it chose not to do.

Think of it as:

- **OpenTelemetry** → for observability
- **AATF** → for Agent decision accountability

```
User asks: "Book a flight to Shanghai"

Step 1: [human_input]  → User request received
Step 2: [reasoning]    → Intent: flight booking (confidence: 0.95)
                          Alt: hotel booking → rejected (user said "flight")
                          Alt: train booking → rejected (user said "flight")
Step 3: [tool_call]    → flight_search_api (342ms) → 3 results
Step 4: [reasoning]    → Decision: CA1234 at ¥2580 (confidence: 0.88)
                          Alt: MU5678 at ¥2890 → rejected (¥310 more)
                          Alt: CZ9012 at ¥3200 → rejected (over budget)

→ SHA-256 hash chain: ✓ tamper-evident
→ PII redaction: ✓ email, phone
→ Export: JSON / CSV / HTML (AATF-compliant)
```

```python
from agent_audit_trail import AuditSession, Decision, Alternative

with AuditSession(agent_id="my-agent") as session:
    session.add_reasoning_step(
        name="choose_tool",
        decision=Decision(
            input_summary="User wants weather info",
            decision="Use weather API",
            reasoning="Factual query requiring real-time data",
            confidence=0.95,
            alternatives_considered=[
                Alternative(description="Answer from memory",
                           reason_rejected="Weather changes constantly"),
                Alternative(description="Ask for clarification",
                           reason_rejected="Query is clear enough"),
            ]
        )
    )
```

That's it. Every decision is now recorded with its reasoning, confidence score, and rejected alternatives — in AATF-compliant format.

The heart of AATF is the **Decision record**:

```
{
  "type": "reasoning",
  "name": "intent_classification",
  "decision": {
    "input_summary": "User wants to book a flight to Shanghai",
    "decision": "Classified as flight-booking intent",
    "reasoning": "Explicit keywords: 'flight' + destination + budget",
    "confidence": 0.95,
    "confidence_basis": "All three slots explicitly stated by user",
    "alternatives_considered": [
      {
        "description": "Hotel booking intent",
        "reason_rejected": "User said 'flight', not 'hotel'",
        "score": 0.05
      },
      {
        "description": "Train booking intent",
        "reason_rejected": "User explicitly said 'flight'",
        "score": 0.02
      }
    ]
  },
  "step_hash": "458942bbf4162f4d9cca121d93b9423413ec..."
}
```
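
The `step_hash` field is what makes a trail tamper-evident. The sketch below shows one way such a SHA-256 hash chain could work, using only the standard library. It is an illustration, not the reference SDK's implementation: the all-zero `GENESIS` value, the canonical JSON encoding, and the helper names `hash_step`, `seal_chain`, and `verify_chain` are all assumptions. The exact hashing rules are defined in SPEC.md.

```python
import hashlib
import json

# Assumption: the first step chains from an all-zero placeholder hash.
GENESIS = "0" * 64


def hash_step(step: dict, prev_hash: str) -> str:
    """Hash a step's content together with the previous step's hash."""
    # Leave out any existing step_hash so the digest covers only the content.
    body = {k: v for k, v in step.items() if k != "step_hash"}
    # Assumption: sorted keys and compact separators as the canonical form.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def seal_chain(steps: list[dict]) -> list[dict]:
    """Return copies of the steps, each with a step_hash linked to its predecessor."""
    prev = GENESIS
    sealed = []
    for step in steps:
        digest = hash_step(step, prev)
        sealed.append({**step, "step_hash": digest})
        prev = digest
    return sealed


def verify_chain(steps: list[dict]) -> bool:
    """Recompute every hash in order; any edited step breaks the chain."""
    prev = GENESIS
    for step in steps:
        if step.get("step_hash") != hash_step(step, prev):
            return False
        prev = step["step_hash"]
    return True


trail = seal_chain([
    {"type": "human_input", "name": "user_request"},
    {"type": "reasoning", "name": "intent_classification",
     "decision": {"confidence": 0.95}},
])
print(verify_chain(trail))  # True

# Editing a recorded confidence after the fact invalidates the chain.
tampered = [dict(step) for step in trail]
tampered[1]["decision"] = {"confidence": 0.99}
print(verify_chain(tampered))  # False
```

Because each hash includes the previous one, changing an early step also invalidates every step after it.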

| Feature | What It Does | Why It Matters |
|---|---|---|
| `alternatives_considered` | Forces agents to list what they didn't choose | Proves the agent didn't just rationalize a foregone conclusion |
| `confidence` + `confidence_basis` | Numeric confidence + how it was determined | Lets auditors distinguish "95% sure because X" from "95% sure because vibes" |
| `confidence_trajectory` | Tracks confidence across the full decision chain | Reveals when an agent becomes more or less certain as it gathers information |
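
As a rough illustration of the `confidence_trajectory` idea, the sketch below derives a trajectory from a list of step records shaped like the Decision record above. How the spec actually stores the trajectory is not shown here, so the function names, the `(name, confidence)` tuple layout, and the `flight_selection` step name are hypothetical. The confidence values come from the flight-booking example.

```python
def confidence_trajectory(steps: list[dict]) -> list[tuple[str, float]]:
    """Collect (step name, confidence) for every step that carries a decision."""
    trajectory = []
    for step in steps:
        decision = step.get("decision")
        if decision is None or "confidence" not in decision:
            continue  # e.g. human_input or tool_call steps
        confidence = decision["confidence"]
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range in step {step['name']!r}")
        trajectory.append((step["name"], confidence))
    return trajectory


def confidence_deltas(trajectory: list[tuple[str, float]]) -> list[float]:
    """Change in confidence between consecutive decisions (negative = less sure)."""
    return [round(later - earlier, 4)
            for (_, earlier), (_, later) in zip(trajectory, trajectory[1:])]


steps = [
    {"type": "human_input", "name": "user_request"},
    {"type": "reasoning", "name": "intent_classification",
     "decision": {"confidence": 0.95}},
    {"type": "tool_call", "name": "flight_search_api"},
    {"type": "reasoning", "name": "flight_selection",  # hypothetical step name
     "decision": {"confidence": 0.88}},
]

trajectory = confidence_trajectory(steps)
print(trajectory)                     # [('intent_classification', 0.95), ('flight_selection', 0.88)]
print(confidence_deltas(trajectory))  # [-0.07]
```

A drop like the `-0.07` here is the kind of signal an auditor could use to find the step where an agent became less certain.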

We respect the existing ecosystem. Here's where AATF fits:

| Tool | What It Does | What AATF Does Differently |
|---|---|---|
| Blockchain ledgers (Notary, Action Ledger) | Store agent actions on-chain for immutability | We're storage-agnostic. Store wherever you want. We focus on what to record, not where. |
| LangChain callbacks | Framework-specific tracing | We're framework-agnostic. Works with CrewAI, AutoGen, raw Python, or anything. |
| MCP audit tools | Audit tool calls in MCP protocol | We go deeper: not just what tool was called, but why it was chosen over alternatives. |
| General logging (structlog, etc.) | Key-value event logs | We're structured for decision reasoning, not generic events. |

**TL;DR:** Other tools audit *what the agent did*. AATF audits *why the agent did it*.

``` python
# LangChain
from agent_audit_trail.integrations.langchain import AATFCallbackHandler
agent = create_agent(callbacks=[AATFCallbackHandler()])

# OpenAI
from agent_audit_trail.integrations.openai import AATFOpenAIWrapper
client = AATFOpenAIWrapper(OpenAI())

# Generic decorator (any framework)
from agent_audit_trail import audit_traced
@audit_traced(agent_id="my-agent")
def my_agent_function(query):
    return "answer"
```

```
pip install agent-audit-trail
```

Zero external dependencies. Python 3.10+. 700 lines of pure stdlib.

We used AATF to audit *ourselves*: an AI Agent reflecting on its own product's flaws. The result is a tamper-evident, 10 KB audit trail whose hash chain shows that the recorded reasoning steps were not rewritten after the fact.

📄 [View the full audit trail JSON](/wdh107/agent-audit-trail/blob/main/docs/self_audit_example.json)

AATF is an open specification, not a product. The SDK is the reference implementation.

📋 [Read the full AATF v0.1.0 Specification](/wdh107/agent-audit-trail/blob/main/SPEC.md)

**This is a draft spec. We want your feedback.** Open an issue if you disagree with any design decision. Especially:

- Should `alternatives_considered` be mandatory or optional?
- Is `confidence` (0.0-1.0) the right abstraction, or should we use qualitative labels?
- What hash algorithm should be standard? (Currently SHA-256)
- Should the format support streaming/traces that are still in-progress?

| Role | What You Get |
|---|---|
| Agent Developer | Prove your agent reasons well. Debug decision failures. Show stakeholders the full chain. |
| Compliance Officer | Machine-parseable audit trails that map to EU AI Act, GDPR, SOC2 requirements. |
| CISO | Tamper-evident hash chains. PII redaction built-in. Export for auditors. |
| Researcher | Structured data on agent reasoning patterns. Confidence trajectories. Decision trees. |

- ✅ AATF Specification v0.1.0
- ✅ Reference SDK (Python) — 134 tests passing
- ✅ PII Redaction (email, phone)
- ✅ Hash Chain Integrity Verification
- ✅ LangChain / OpenAI / Generic Integrations
- ✅ JSON / CSV / HTML Export
- 🔲 PII Redaction expansion (credit card, SSN, API keys, IP)
- 🔲 TypeScript/JavaScript SDK
- 🔲 Community RFC process for spec changes
- 🔲 LangChain/CrewAI published plugins
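
For a sense of what email and phone redaction involves, here is a minimal regex-based sketch. It does not reproduce the SDK's redaction code: the patterns, the placeholder strings, and the helper names `redact` and `redact_value` are assumptions, and the patterns are deliberately simple rather than exhaustive.

```python
import re

# Assumption: deliberately simple patterns, not a complete PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)


def redact_value(value):
    """Apply redact() to every string inside a nested dict/list record."""
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, dict):
        return {key: redact_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_value(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


record = {
    "type": "reasoning",
    "decision": {
        "input_summary": "Contact alice@example.com or +86 138 0000 0000",
        "confidence": 0.95,
    },
}
print(redact_value(record)["decision"]["input_summary"])
# Contact [REDACTED_EMAIL] or [REDACTED_PHONE]
```

In a design like this, redaction would need to run before a step is hashed. Otherwise the stored hash would cover the unredacted text and would not match the redacted record.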

This project wants contributors. If you care about Agent accountability:

- **Read the [SPEC](/wdh107/agent-audit-trail/blob/main/SPEC.md)**: understand the format
- **Open an issue**: disagree with something? We want to hear it
- **Build an integration**: your framework? Your plugin is welcome
- **Spread the word**: star, tweet, blog post

MIT. Use it, fork it, improve it. The spec belongs to everyone.

**If your Agent can think, its thinking should be auditable.**

`pip install agent-audit-trail`