{"slug": "agents-need-receipts-not-just-better-prompts", "title": "Agents Need Receipts, Not Just Better Prompts", "summary": "The article argues that AI agents should produce structured \"receipts\" rather than just conversational outputs or raw logs, as these receipts provide essential evidence for trust, debugging, and accountability. It outlines specific fields a receipt should contain, such as task scope, actions taken, checks run, and outcome, and emphasizes that this operational record should come from the runtime or control plane rather than the agent's own summary. The piece introduces Armorer, a local control plane designed to make agent runs, tools, approvals, and recovery inspectable on a user's own machine.", "body_md": "Most AI agent demos optimize for the first successful run.\n\nReal agent work gets interesting after the agent says \"done.\"\n\nFor a coding agent, browser agent, or MCP-connected workflow, the final chat answer is not enough. I want a receipt: a compact operational record that helps a human trust, debug, replay, roll back, or explain what happened.\n\nNot a giant transcript. Not a raw log dump. 
A receipt.\n\n## \"Done\" is not a state\n\nImagine an agent is asked to update a billing flow.\n\nIt reads docs, edits four files, calls a test command, skips one integration test, touches an env file, and says:\n\nDone.\n\nThat answer is almost useless by itself.\n\nThe operator still needs to know:\n\n- What task did the agent think it was doing?\n- What files, tools, systems, or data was it allowed to touch?\n- What context influenced the work?\n- Which tools or commands did it call?\n- Which actions were read-only versus write, destructive, external, or spend-affecting?\n- What changed?\n- Which checks passed, failed, or were skipped?\n- What required approval?\n- What should a human review?\n- How do I retry, replay, resume, or roll back?\n\nThat is the receipt.\n\n## What should be in an agent receipt?\n\nThe first version does not need to be fancy.\n\nA useful receipt should include:\n\n- `task`: what the agent believed it was doing\n- `scope`: files, systems, tools, or data it was allowed to touch\n- `context_used`: docs, files, memories, links, or prior runs that influenced the work\n- `actions`: tool calls, commands, API calls, file edits\n- `action_class`: read, write, destructive, external send, spend-affecting, permission-changing\n- `state_changes`: files changed, records created, messages sent, jobs started\n- `checks_run`: tests, linters, scans, dry runs, evals\n- `checks_skipped`: expected checks that were not run, with reason\n- `approvals`: who or what approved the action, scope, expiry, one-off versus policy\n- `outcome`: completed, partial, blocked, failed, reverted, needs review\n- `recovery`: how to retry, resume, inspect, or roll back\n\nHere is a small example:\n\n```\n{\n  \"receipt_version\": \"0.1\",\n  \"run_id\": \"run_2026_05_23_001\",\n  \"agent\": {\n    \"name\": \"local-coding-agent\",\n    \"provider\": \"anthropic\",\n    \"model\": \"claude-sonnet-4.5\",\n    \"runtime\": 
\"local\"\n  },\n  \"task\": {\n    \"summary\": \"Update the billing retry handler and add regression coverage\",\n    \"scope\": [\n      \"repo:apps/billing\",\n      \"tool:filesystem.read\",\n      \"tool:filesystem.write\",\n      \"tool:shell.test\"\n    ],\n    \"out_of_scope\": [\n      \"production database\",\n      \"deployment\",\n      \"customer email sending\"\n    ]\n  },\n  \"actions\": [\n    {\n      \"tool\": \"filesystem.write\",\n      \"action_class\": \"write\",\n      \"result\": \"success\",\n      \"decision_id\": \"decision_write_002\"\n    },\n    {\n      \"tool\": \"shell.test\",\n      \"action_class\": \"exec\",\n      \"result\": \"success\",\n      \"decision_id\": \"decision_exec_004\"\n    }\n  ],\n  \"checks\": {\n    \"run\": [\"npm test -- billing\"],\n    \"skipped\": [\n      {\n        \"check\": \"full integration suite\",\n        \"reason\": \"requires staging credentials\"\n      }\n    ]\n  },\n  \"outcome\": {\n    \"status\": \"completed\",\n    \"review_needed\": true,\n    \"recovery\": \"Revert the modified files or rerun npm test -- billing\"\n  }\n}\n```\n\n## The model should not own the receipt\n\nThe model can summarize intent.\n\nBut the hard evidence should come from the runtime, tool layer, or control plane:\n\n- commands\n- exit codes\n- tool calls\n- files touched\n- approvals\n- policy versions\n- state changes\n- artifacts created\n\nIf the agent writes its own audit trail, the audit trail is just another model output.\n\nThat is useful as a summary, but it is not enough as evidence.\n\n## Traces are not enough\n\nOpenTelemetry-style traces are useful. 
They explain latency, retries, errors, and service boundaries.\n\nBut an agent operator often needs a different object.\n\nA trace tells you which span was slow.\n\nA receipt tells you what the agent was allowed to do, what it actually did, why it was allowed, what changed, and what should be reviewed.\n\nTraces explain execution.\n\nReceipts explain responsibility.\n\nYou need both.\n\n## MCP makes receipts more important\n\nMCP is useful because it gives agents a common way to access tools and context.\n\nIt also makes the tool boundary much more important.\n\nOnce an agent can call multiple MCP servers, a single call can look harmless while the sequence is not:\n\n- Read customer data from server A.\n- Process it through server B.\n- Publish or send it through server C.\n\nThat is why receipts should capture not only individual calls, but also source, sink, data class, action class, policy version, and approval scope across the run.\n\n## Where we are taking this with Armorer\n\nThis is the direction we are building toward with Armorer.\n\nArmorer is a local control plane for AI agents. 
The goal is to make agent runs, tools, approvals, jobs, logs, and recovery inspectable on your own machine instead of treating every agent as an opaque chat window.\n\nArmorer Guard focuses on checks near the action boundary: what is the agent trying to do, what class of action is it, should it be allowed, blocked, or routed to approval, and what decision record should exist afterward?\n\nThe GitHub discussion for the receipt spec is here:\n\n[https://github.com/ArmorerLabs/Armorer/discussions/43](https://github.com/ArmorerLabs/Armorer/discussions/43)\n\nAnd the repo is here:\n\n[https://github.com/ArmorerLabs/Armorer](https://github.com/ArmorerLabs/Armorer)\n\nThe bet is simple:\n\nAs agents get more capable, the bottleneck moves from \"can it do the task?\" to \"can I understand, govern, and repair what it did?\"\n\nThat layer is still early.\n\nBut I think it is where practical agent engineering is heading.", "url": "https://wpnews.pro/news/agents-need-receipts-not-just-better-prompts", "canonical_source": "https://dev.to/armorer_labs/agents-need-receipts-not-just-better-prompts-1cg", "published_at": "2026-05-23 09:04:14+00:00", "updated_at": "2026-05-23 09:33:23.073159+00:00", "lang": "en", "topics": ["artificial-intelligence", "developer-tools", "enterprise-software"], "entities": ["MCP"], "alternates": {"html": "https://wpnews.pro/news/agents-need-receipts-not-just-better-prompts", "markdown": "https://wpnews.pro/news/agents-need-receipts-not-just-better-prompts.md", "text": "https://wpnews.pro/news/agents-need-receipts-not-just-better-prompts.txt", "jsonld": "https://wpnews.pro/news/agents-need-receipts-not-just-better-prompts.jsonld"}}