Expense Audit & Compliance Agent

A new AI agent, the Expense Audit & Compliance Agent, automates line-by-line expense report auditing against company policy, flagging violations and suspicious patterns while auto-approving clean reports. The agent is governed by an open-source AgentAz specification that specifies read-only access, cost and loop limits, and human handoff for violations, aiming to reduce manual review workload and improve compliance.

Overview

- Line-by-line audit against your actual policy: limits, categories, receipt rules, and per-diems. Each flag cites the rule it breaks.
- Catches what manual review misses: duplicate submissions, out-of-policy items, and suspicious patterns across reports.
- Decides within limits: clean reports auto-approve; specific items are held for review; rejections and fraud signals go to a human.
- Defensive: no auto-approval over the cap or with missing receipts, and no fraud accusation without cited evidence.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do, and why, and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Machine-readable contract: agentaz.json, validated against the open AgentAz™ JSON Schema, which is bundled for offline use and published at a permanent URL:

    {
      "$schema": "./agentaz.schema.json",
      "version": "2.0.0",
      "last reviewed": "2026-06-24",
      "agent id": "expense-report-audit-agent",
      "trust level": "A2",
      "dna pattern": "Evaluation",
      "worst case action": "Flags an expense incorrectly for human review. Cannot approve, reject, or reimburse.",
      "authority boundary": "Audits expenses against policy and flags issues; no approval or payment tools present.",
      "tags": ["finance", "expense-audit", "compliance", "read-only", "human-review"],
      "tool boundary": {
        "allowed tools": ["read expense", "check policy", "detect anomaly", "flag violation"],
        "execution tools absent": true
      },
      "output boundary": {
        "format": "structured json",
        "never emits": ["expense approve", "expense reject", "payment"]
      },
      "cost boundary": {
        "max usd per trace loop": 0.25,
        "alert threshold usd": 0.16
      },
      "loop boundary": {
        "max reasoning turns": 8
      },
      "human handoff": {
        "triggers": ["policy violation", "anomaly", "low confidence"],
        "destination": "finance review"
      },
      "audit": {
        "append only": true,
        "logs": ["flags", "policy refs"]
      }
    }

New to this? Read the AgentAz specification guide (/agentaz-specifications): Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 (https://www.apache.org/licenses/LICENSE-2.0), with the schema (frozen at v1.0.0) and source on GitHub (https://github.com/agent-kits/agentaz).

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship, not new functionality.

| Agent goal | Bounded by the authority spec above |
|---|---|
| Trust Level | A2 (Recommend) |
| Tool access | Least privilege: execution tools absent (read-only) |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on policy violation, anomaly, low confidence → finance review |
| Audit trail | Append-only log (flags, policy refs) |
| Cost & loop bounds | ≤ $0.25 per loop · ≤ 8 reasoning turns |
| Recovery / escalation | Escalates to finance review |

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity, not an official integration or certified compatibility.

| Agent | Primary reasoner: Recommend authority (A2) |
|---|---|
| Tools | read expense, check policy, detect anomaly, flag violation; execution tools absent (read-only) |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst case classified (A2); no execution tools; ≤ $0.25/loop · ≤ 8 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to finance review on policy violation, anomaly, low confidence |

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each: the boundaries that make it safe to run, stated plainly.

Misses a genuine policy violation (false negative).
- Detection: Every report is screened against the full policy set, not sampled.
- Mitigation: Positioned as full-coverage screening with a human deciding exceptions.
- Recovery: The missed rule is added post-audit and the report can be re-screened.

Flags a compliant expense as a violation (false positive).
- Detection: Each finding carries confidence and cites the policy clause.
- Mitigation: Findings are recommendations a human approves; it never auto-rejects.
- Recovery: The approver clears it and the rule is tuned.

A receipt is fabricated or altered.
- Detection: The agent flags anomalies but never asserts authenticity.
- Mitigation: A human verifies authenticity.
- Recovery: Suspicious items are escalated to finance.

Evaluation

Violation recall is what matters (missing a genuine policy breach is the failure), against a tolerable false-positive rate.

| Violation recall | Of genuine policy violations, the share it catches. |
|---|---|
| Precision | Of items flagged, the share that are real violations (noise resistance). |
| Policy coverage | Share of policy rules actually exercised by the screen. |
| Citation accuracy | Whether each flag cites the correct policy clause. |
| Latency | Time to audit a report. |

Recommended approach. Build a set of expense reports annotated against the full policy, with seeded violations and compliant edge cases; measure recall and precision and verify each flag cites the right clause. Include altered-receipt cases to confirm it flags rather than asserts authenticity.

When to use

Use it when
- Finance/AP reviews a high volume of expense reports and most of the work is policy-checking and receipt-matching.
- You have a written expense policy the agent can audit against and access to receipts/report data.
- You want consistent, documented audits with an approval trail for compliance.
- You want to auto-clear clean reports and surface only the genuine exceptions and fraud signals to humans.

Avoid it when
- You have no written, structured policy for the agent to audit against.
- You expect it to make final fraud or termination determinations; those are human decisions.
- You can't give it receipt/report access to actually verify line items.
- You are unwilling to keep approval gates on large amounts and rejections.

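If you decide the agent fits, the recall, precision, and citation-accuracy metrics from the Evaluation section could be computed over an annotated test set along these lines. This is a minimal sketch, not part of the AgentAz specification: the `LabeledItem` shape, the `score` function, and the clause identifiers such as `MEAL-LIMIT` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LabeledItem:
    """One expense line: ground truth plus the agent's output (hypothetical shape)."""
    true_clause: Optional[str]     # policy clause actually violated; None if compliant
    flagged_clause: Optional[str]  # clause the agent cited; None if it did not flag


def score(items: Sequence[LabeledItem]) -> dict:
    """Compute violation recall, precision, and citation accuracy."""
    true_pos = sum(1 for i in items if i.true_clause and i.flagged_clause)
    false_neg = sum(1 for i in items if i.true_clause and not i.flagged_clause)
    false_pos = sum(1 for i in items if not i.true_clause and i.flagged_clause)
    # A citation is correct only when the flagged clause matches the annotated one.
    correct_cites = sum(
        1 for i in items if i.true_clause and i.flagged_clause == i.true_clause
    )

    def ratio(num: int, den: int) -> Optional[float]:
        # None signals "undefined" when there is nothing to measure against.
        return num / den if den else None

    return {
        "recall": ratio(true_pos, true_pos + false_neg),
        "precision": ratio(true_pos, true_pos + false_pos),
        "citation_accuracy": ratio(correct_cites, true_pos),
    }


# Illustrative data only; clause IDs are made up for this sketch.
sample = [
    LabeledItem("MEAL-LIMIT", "MEAL-LIMIT"),   # caught, correct clause
    LabeledItem("RECEIPT-REQ", "MEAL-LIMIT"),  # caught, wrong clause cited
    LabeledItem("PER-DIEM", None),             # missed violation (false negative)
    LabeledItem(None, "MEAL-LIMIT"),           # compliant but flagged (false positive)
    LabeledItem(None, None),                   # compliant, not flagged
]
result = score(sample)
```

Because recall is the metric that matters most here, a test set would normally over-represent seeded violations and altered-receipt cases rather than mirror production proportions.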
System prompt

You are an Expense Audit Agent in a finance operation. You audit ONE expense report against the company's written policy and decide: approve, hold specific items, reject, or escalate. You are judged on catching real policy violations and fraud, on fairness and accuracy, and on never approving spend you shouldn't or accusing someone without evidence.

== CORE PRINCIPLES ==
1. Policy-grounded. Every flag must cite the specific policy rule it violates (limit, category, receipt requirement, per-diem). Do not invent rules or violations; if the policy is silent, it is not a violation.
2. Evidence over suspicion. Base duplicate/fraud flags on concrete evidence (matching receipt, overlapping dates, identical amounts). Never label an employee 'fraud' without cited evidence; flag patterns for human review instead.
3. Audit each line. Approve the compliant items and flag only the specific non-compliant ones; don't reject a whole report over one bad line.

== HARD RULES (NON-NEGOTIABLE) ==
- APPROVAL LIMITS: Auto-approve ONLY when every line is within policy, required receipts are present, and the total is at or below the configured auto-approval cap. Anything above the cap, or with a policy exception, requires human approval.
- RECEIPTS REQUIRED: Do not approve an item that policy requires a receipt for if the receipt is missing or unreadable; hold it.
- NO UNFOUNDED ACCUSATIONS: Suspected duplicates/fraud are flagged with the evidence and routed to a human; never assert intent or wrongdoing.
- PII/DATA: Treat employee and financial data as sensitive; keep it in scope; redact where not needed.
- FAIRNESS: Apply the same policy consistently to every report.

== METHOD ==
- Load the report and the applicable policy. For each line: check category, amount vs. limit, receipt presence/validity, and per-diem/date rules.
- Run duplicate detection (same amount+date+merchant, or the same receipt across reports) and basic anomaly checks (e.g. mileage + flight for the same leg, weekend/personal patterns).
- Decide per line: ok / flag (with rule cited) / hold (missing doc). Then decide the report outcome.

== DECISION POLICY (calibrated confidence 0.0-1.0) ==
- APPROVE: all lines compliant, receipts present, total <= cap, confidence >= 0.85.
- HOLD: specific items missing receipts or needing a minor fix; approve the rest, hold those.
- REJECT WITH REASONS: clear policy violations; cite each. This is a recommendation for a human to confirm.
- ESCALATE: total over cap, suspected duplicate/fraud, policy exception, or conflicting evidence.

== COST CONTROL ==
Check only what each line needs; reuse the policy already loaded. Cap tool calls; if exceeded, approve the clearly-clean lines and escalate the rest.

== OUTPUT FORMAT (return ONE JSON object) ==
{ "report id": "