AATF – An open spec for recording why AI agents make decisions

wpnews.pro

The open specification and reference SDK for recording AI Agent decision chains.

Quick Start · The Format · Why Not Existing Tools? · SPEC · Examples

AATF is not another logging library. It's an open specification for recording why an AI Agent made each decision — including what alternatives it considered, how confident it was, and what it chose not to do.

Think of it as:

OpenTelemetry→ for observability** AATF**→ for Agent decision accountability

User asks: "Book a flight to Shanghai"

Step 1: [human_input]  → User request received
Step 2: [reasoning]    → Intent: flight booking (confidence: 0.95)
                          Alt: hotel booking → rejected (user said "flight")
                          Alt: train booking → rejected (user said "flight")
Step 3: [tool_call]    → flight_search_api (342ms) → 3 results
Step 4: [reasoning]    → Decision: CA1234 at ¥2580 (confidence: 0.88)
                          Alt: MU5678 at ¥2890 → rejected (¥310 more)
                          Alt: CZ9012 at ¥3200 → rejected (over budget)

→ SHA-256 hash chain: ✓ tamper-evident
→ PII redaction: ✓ email, phone, card numbers
→ Export: JSON / CSV / HTML (AATF-compliant)
python
from agent_audit_trail import AuditSession, Decision, Alternative

with AuditSession(agent_id="my-agent") as session:
    session.add_reasoning_step(
        name="choose_tool",
        decision=Decision(
            input_summary="User wants weather info",
            decision="Use weather API",
            reasoning="Factual query requiring real-time data",
            confidence=0.95,
            alternatives_considered=[
                Alternative(description="Answer from memory",
                           reason_rejected="Weather changes constantly"),
                Alternative(description="Ask for clarification",
                           reason_rejected="Query is clear enough"),
            ]
        )
    )

That's it. Every decision is now recorded with its reasoning, confidence score, and rejected alternatives — in AATF-compliant format.

The heart of AATF is the Decision record:

{
  "type": "reasoning",
  "name": "intent_classification",
  "decision": {
    "input_summary": "User wants to book a flight to Shanghai",
    "decision": "Classified as flight-booking intent",
    "reasoning": "Explicit keywords: 'flight' + destination + budget",
    "confidence": 0.95,
    "confidence_basis": "All three slots explicitly stated by user",
    "alternatives_considered": [
      {
        "description": "Hotel booking intent",
        "reason_rejected": "User said 'flight', not 'hotel'",
        "score": 0.05
      },
      {
        "description": "Train booking intent",
        "reason_rejected": "User explicitly said 'flight'",
        "score": 0.02
      }
    ]
  },
  "step_hash": "458942bbf4162f4d9cca121d93b9423413ec..."
}

Feature	What It Does	Why It Matters
`alternatives_considered`
Forces agents to list what they didn't choose
Proves the agent didn't just rationalize a foregone conclusion
`confidence` + `confidence_basis`
Numeric confidence + how it was determined
Lets auditors distinguish "95% sure because X" from "95% sure because vibes"
`confidence_trajectory`
Tracks confidence across the full decision chain	Reveals when an agent becomes more or less certain as it gathers information

We respect the existing ecosystem. Here's where AATF fits:

Tool	What It Does	What AATF Does Differently
Blockchain ledgers (Notary, Action Ledger)
Store agent actions on-chain for immutability	We're format-agnostic. Store wherever you want. We focus on what to record, not where.
LangChain callbacks
Framework-specific tracing	We're framework-agnostic. Works with CrewAI, AutoGen, raw Python, or anything.
MCP audit tools
Audit tool calls in MCP protocol	We go deeper: not just what tool was called, but why it was chosen over alternatives.
General logging (structlog, etc.)
Key-value event logs	We're structured for decision reasoning, not generic events.

TL;DR: Other tools audit what the agent did. AATF audits why the agent did it.

from agent_audit_trail.integrations.langchain import AATFCallbackHandler
agent = create_agent(callbacks=[AATFCallbackHandler()])

from agent_audit_trail.integrations.openai import AATFOpenAIWrapper
client = AATFOpenAIWrapper(OpenAI())

from agent_audit_trail import audit_traced
@audit_traced(agent_id="my-agent")
def my_agent_function(query):
    return "answer"
pip install agent-audit-trail

Zero external dependencies. Python 3.10+. 700 lines of pure stdlib.

We used AATF to audit ourselves — an AI Agent reflecting on its own product's flaws. The result is a tamper-evident, 10KB audit trail that proves every reasoning step was genuine and not post-hoc rationalized.

📄 View the full audit trail JSON

AATF is an open specification, not a product. The SDK is the reference implementation.

📋 Read the full AATF v0.1.0 Specification

This is a draft spec. We want your feedback. Open an issue if you disagree with any design decision. Especially:

Should alternatives_considered

be mandatory or optional? - Is confidence

(0.0-1.0) the right abstraction, or should we use qualitative labels? - What hash algorithm should be standard? (Currently SHA-256)

Should the format support streaming/traces that are still in-progress?

Role	What You Get
Agent Developer
Prove your agent reasons well. Debug decision failures. Show stakeholders the full chain.
Compliance Officer
Machine-parseable audit trails that map to EU AI Act, GDPR, SOC2 requirements.
CISO
Tamper-evident hash chains. PII redaction built-in. Export for auditors.
Researcher
Structured data on agent reasoning patterns. Confidence trajectories. Decision trees.

✅ AATF Specification v0.1.0
✅ Reference SDK (Python) — 134 tests passing
✅ PII Redaction (email, phone)
✅ Hash Chain Integrity Verification
✅ LangChain / OpenAI / Generic Integrations
✅ JSON / CSV / HTML Export
🔲 PII Redaction expansion (credit card, SSN, API keys, IP)
🔲 TypeScript/JavaScript SDK
🔲 Community RFC process for spec changes
🔲 LangChain/CrewAI published plugins

This project wants contributors. If you care about Agent accountability:

Read the— understand the formatSPEC** Open an issue**— disagree with something? We want to hear it** Build an integration**— your framework? Your plugin welcome** Spread the word**— star, tweet, blog post

MIT. Use it, fork it, improve it. The spec belongs to everyone.

If your Agent can think, its thinking should be auditable.

pip install agent-audit-trail

source & further reading

github.com — original article

AATF – An open spec for recording why AI agents make decisions

Run your AI side-project on zahid.host