Expense Audit & Compliance Agent

A new AI agent, the Expense Audit & Compliance Agent, automates line-by-line expense report auditing against company policy, flagging violations and suspicious patterns while auto-approving clean reports. The agent is described by an open-source AgentAz specification that documents its read-only tool access, cost and loop limits, and human handoff for violations; the spec itself does not enforce anything at runtime. The aim is to reduce manual review workload and improve compliance.

Overview

- Line-by-line audit against your actual policy: limits, categories, receipt rules, and per-diems. Each flag cites the rule it breaks.
- Catches what manual review misses: duplicate submissions, out-of-policy items, and suspicious patterns across reports.
- Decides within limits: clean reports auto-approve; specific items are held for review; rejections and fraud signals go to a human.
- Defensive: no auto-approval over the cap or with missing receipts, and no fraud accusation without cited evidence.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do, and why, and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Machine-readable contract

agentaz.json, validated against the open AgentAz™ JSON Schema (bundled for offline use and published at a permanent URL):

    {
      "$schema": "./agentaz.schema.json",
      "version": "2.0.0",
      "last_reviewed": "2026-06-24",
      "agent_id": "expense-report-audit-agent",
      "trust_level": "A2",
      "dna_pattern": "Evaluation",
      "worst_case_action": "Flags an expense incorrectly for human review. Cannot approve, reject, or reimburse.",
      "authority_boundary": "Audits expenses against policy and flags issues; no approval or payment tools present.",
      "tags": ["finance", "expense-audit", "compliance", "read-only", "human-review"],
      "tool_boundary": {
        "allowed_tools": ["read_expense", "check_policy", "detect_anomaly", "flag_violation"],
        "execution_tools_absent": true
      },
      "output_boundary": {
        "format": "structured_json",
        "never_emits": ["expense_approve", "expense_reject", "payment"]
      },
      "cost_boundary": { "max_usd_per_trace_loop": 0.25, "alert_threshold_usd": 0.16 },
      "loop_boundary": { "max_reasoning_turns": 8 },
      "human_handoff": {
        "triggers": ["policy_violation", "anomaly", "low_confidence"],
        "destination": "finance_review"
      },
      "audit": { "append_only": true, "logs": ["flags", "policy_refs"] }
    }

New to this?
Read the AgentAz specification guide (/agentaz-specifications) for Trust Levels, DNA patterns, and how it complements your runtime. AgentAz™ is open source under Apache-2.0 (https://www.apache.org/licenses/LICENSE-2.0), with the schema frozen at v1.0.0 and the source on GitHub (https://github.com/agent-kits/agentaz).

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship, not new functionality.

| Dimension | Coverage |
|---|---|
| Agent goal | Bounded by the authority spec above |
| Trust Level | A2 (Recommend) |
| Tool access | Least privilege: execution tools absent (read-only) |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on policy violation, anomaly, low confidence → finance review |
| Audit trail | Append-only log (flags, policy refs) |
| Cost & loop bounds | ≤ $0.25 per loop · ≤ 8 reasoning turns |
| Recovery / escalation | Escalates to finance review |

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity; it is not an official integration or certified compatibility.

| Component | Mapping |
|---|---|
| Agent | Primary reasoner with Recommend authority (A2) |
| Tools | read expense, check policy, detect anomaly, flag violation; execution tools absent (read-only) |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst case classified (A2); no execution tools; ≤ $0.25/loop · ≤ 8 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to finance review on policy violation, anomaly, low confidence |

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each: the boundaries that make it safe to run, stated plainly.

Misses a genuine policy violation (false negative).
- Detection: Every report is screened against the full policy set, not sampled.
- Mitigation: Positioned as full-coverage screening with a human deciding exceptions.
- Recovery: The missed rule is added post-audit and the report can be re-screened.

Flags a compliant expense as a violation (false positive).
- Detection: Each finding carries a confidence score and cites the policy clause.
- Mitigation: Findings are recommendations a human approves; it never auto-rejects.
- Recovery: The approver clears it and the rule is tuned.

A receipt is fabricated or altered.
- Detection: The agent flags anomalies but never asserts authenticity.
- Mitigation: A human verifies authenticity.
- Recovery: Suspicious items are escalated to finance.

Evaluation

Violation recall is what matters most: missing a genuine policy breach is the failure to avoid, balanced against a tolerable false-positive rate.

| Metric | What it measures |
|---|---|
| Violation recall | Of genuine policy violations, the share it catches. |
| Precision | Of items flagged, the share that are real violations (noise resistance). |
| Policy coverage | Share of policy rules actually exercised by the screen. |
| Citation accuracy | Whether each flag cites the correct policy clause. |
| Latency | Time to audit a report. |

Recommended approach. Build a set of expense reports annotated against the full policy, with seeded violations and compliant edge cases; measure recall and precision and verify each flag cites the right clause. Include altered-receipt cases to confirm it flags rather than asserts authenticity.

When to use

Use it when
- Finance/AP reviews a high volume of expense reports and most of the work is policy-checking and receipt-matching.
- You have a written expense policy the agent can audit against and access to receipts/report data.
- You want consistent, documented audits with an approval trail for compliance.
- You want to auto-clear clean reports and surface only the genuine exceptions and fraud signals to humans.

Avoid it when
- You have no written, structured policy for the agent to audit against.
- You expect it to make final fraud or termination determinations; those are human decisions.
- You can't give it receipt/report access to actually verify line items.
- You are unwilling to keep approval gates on large amounts and rejections.

System prompt

You are an Expense Audit Agent in a finance operation. You audit ONE expense report against the company's written policy and decide: approve, hold specific items, reject, or escalate. You are judged on catching real policy violations and fraud, fairness and accuracy, and never approving spend you shouldn't or accusing someone without evidence.

== CORE PRINCIPLES ==
1. Policy-grounded. Every flag must cite the specific policy rule it violates (limit, category, receipt requirement, per-diem). Do not invent rules or violations; if the policy is silent, it is not a violation.
2. Evidence over suspicion. Base duplicate/fraud flags on concrete evidence (matching receipt, overlapping dates, identical amounts). Never label an employee 'fraud' without cited evidence; flag patterns for human review instead.
3. Audit each line.
   Approve the compliant items and flag only the specific non-compliant ones; don't reject a whole report over one bad line.

== HARD RULES (NON-NEGOTIABLE) ==
- APPROVAL LIMITS: Auto-approve ONLY when every line is within policy, required receipts are present, and the total is at or below the configured auto-approval cap. Anything above the cap, or with a policy exception, requires human approval.
- RECEIPTS REQUIRED: Do not approve an item that policy requires a receipt for if the receipt is missing or unreadable; hold it.
- NO UNFOUNDED ACCUSATIONS: Suspected duplicates/fraud are flagged with the evidence and routed to a human; never assert intent or wrongdoing.
- PII/DATA: Treat employee and financial data as sensitive; keep it in scope; redact where not needed.
- FAIRNESS: Apply the same policy consistently to every report.

== METHOD ==
- Load the report and the applicable policy. For each line: check category, amount vs. limit, receipt presence/validity, and per-diem/date rules.
- Run duplicate detection (same amount+date+merchant, or the same receipt across reports) and basic anomaly checks (e.g. mileage + flight for the same leg, weekend/personal patterns).
- Decide per line: ok / flag (with rule cited) / hold (missing doc). Then decide the report outcome.

== DECISION POLICY (calibrated confidence 0.0-1.0) ==
- APPROVE: all lines compliant, receipts present, total <= cap, confidence >= 0.85.
- HOLD: specific items missing receipts or needing a minor fix; approve the rest, hold those.
- REJECT_WITH_REASONS: clear policy violations; cite each. This is a recommendation for a human to confirm.
- ESCALATE: total over cap, suspected duplicate/fraud, policy exception, or conflicting evidence.

== COST CONTROL ==
Check only what each line needs; reuse the policy already loaded. Cap tool calls; if exceeded, approve the clearly-clean lines and escalate the rest.
== OUTPUT FORMAT (return ONE JSON object) ==

    {
      "report_id": "<id>",
      "decision": "APPROVE|HOLD|REJECT_WITH_REASONS|ESCALATE",
      "confidence": <0.0-1.0>,
      "total_usd": <number>,
      "line_findings": [
        { "item": "<line>", "status": "ok|flag|hold", "rule": "<policy rule cited, or empty>", "note": "<short>" }
      ],
      "fraud_signals": ["<evidence-based pattern, or empty>"],
      "approved_amount_usd": <number>,
      "actions": [
        { "tool": "<tool>", "args": { ... }, "requires_approval": <bool> }
      ],
      "employee_note": "<neutral, factual; no accusation>",
      "escalation": { "needed": <bool>, "reason": "<cap/fraud/exception, or empty>" }
    }

If evidence is mixed, prefer HOLD or ESCALATE over REJECT, and never accuse without cited evidence.

Setup guide

Install and connect your expense system
Install the agent and connect it to your expense/AP platform.

    pipx install expense-audit-agent
    expense-audit-agent connect --system concur
    expense-audit-agent doctor

Configure limits and mode
The auto-approval cap and receipt rules are enforced deterministically, not by the model.

    cp .env.example .env
    ANTHROPIC_API_KEY=sk-ant-...
    AUTO_APPROVE_CAP_USD=250
    REQUIRE_RECEIPT_OVER_USD=25
    MODE=assist   # assist (recommend) | act (auto within cap)

Load your expense policy
Provide the structured policy the agent audits against. This is the only basis for flags.

    # policy.yml
    limits: { meals: 60, hotel_per_night: 300, mileage_per_mile: 0.67 }
    receipt_required_over: 25
    disallowed: ["alcohol over limit", "personal", "first class without approval"]
    per_diem: { domestic: 75 }

Backtest on past reports
Replay audited reports to compare the agent's findings to actual outcomes before going live.

    expense-audit-agent backtest --range 90d --explain   # reports approve/flag accuracy and any missed violations

Wire into the approval flow
Route submitted reports to the agent. Start in assist mode, then enable auto-approval within the cap once backtests are clean.

    # submission webhook -> POST https://your-host/expense/audit (HMAC)
    # promote MODE=act for within-cap clean reports

Architecture

Workflow

1. Intake the report. Load the report, receipts, and submitter context; load the applicable policy.
2. Audit each line. Check category, amount vs. limit, receipt presence/validity, and per-diem/date rules for every line, citing the rule on any flag.
3. Verify receipts. Confirm required receipts are present and readable and match the line; hold items that lack required proof.
4. Detect duplicates & anomalies. Run duplicate detection and pattern checks across this and prior reports, gathering evidence rather than asserting intent.
5. Decide per line and report. Approve compliant lines, hold those missing docs, flag violations with the rule, and decide the report outcome within the cap.
6. Act through the gate. Auto-approve within limits; route over-cap totals, rejections, and fraud signals to a human with the evidence.
7. Record the trail. Log each decision with the cited rule and outcome for compliance, and feed overrides back to improve the checks.

Examples

Clean report within policy → auto-approve

A standard travel report: every line within limits, receipts attached, total under the cap.

Input

Report EXP-3310, total $214: hotel $180 (receipt), taxi $19 (receipt), lunch $15 (receipt). Policy: hotel<=300/night, meals<=60, receipts over $25.

Output

    {
      "report_id": "EXP-3310",
      "decision": "APPROVE",
      "confidence": 0.93,
      "total_usd": 214,
      "line_findings": [
        { "item": "Hotel $180", "status": "ok", "rule": "hotel_per_night<=300", "note": "within limit, receipt present" },
        { "item": "Taxi $19", "status": "ok", "rule": "", "note": "under receipt threshold, allowed" },
        { "item": "Lunch $15", "status": "ok", "rule": "meals<=60", "note": "within limit" }
      ],
      "fraud_signals": [],
      "approved_amount_usd": 214,
      "actions": [
        { "tool": "approve_expense", "args": { "report_id": "EXP-3310", "amount_usd": 214 }, "requires_approval": false }
      ],
      "employee_note": "Your expense report EXP-3310 ($214) has been approved.",
      "escalation": { "needed": false, "reason": "" }
    }

Note: Every line is checked against the cited policy rule, receipts are present, and the total is under the $250 cap, so the report auto-approves with the rule documented for each line. This is the volume the kit clears instantly so humans only see exceptions.

Out-of-policy item + missing receipt → hold the line, approve the rest

Mostly clean, but one meal is over the limit and a $40 item has no receipt. The agent holds those two lines and approves the rest.

Input

Report EXP-3402, total $268: hotel $190 (receipt), dinner $92 (receipt), supplies $40 (NO receipt). Policy: meals<=60, receipts required over $25.

Output

    {
      "report_id": "EXP-3402",
      "decision": "HOLD",
      "confidence": 0.9,
      "total_usd": 268,
      "line_findings": [
        { "item": "Hotel $190", "status": "ok", "rule": "hotel_per_night<=300", "note": "within limit" },
        { "item": "Dinner $92", "status": "flag", "rule": "meals<=60", "note": "$32 over the per-meal limit; needs approval or adjustment" },
        { "item": "Supplies $40", "status": "hold", "rule": "receipt_required_over=25", "note": "no receipt attached for a $40 item" }
      ],
      "fraud_signals": [],
      "approved_amount_usd": 190,
      "actions": [
        { "tool": "approve_expense", "args": { "report_id": "EXP-3402", "lines": ["hotel"], "amount_usd": 190 }, "requires_approval": false }
      ],
      "employee_note": "Approved the hotel ($190). Two items need attention: the dinner ($92) exceeds the $60 meal limit, and the $40 supplies item needs a receipt. Please add the receipt and confirm the meal overage with your manager.",
      "escalation": { "needed": false, "reason": "" }
    }

Note: Instead of rejecting the whole report over two lines, the agent approves the compliant hotel and holds exactly the two problem items, each with the cited rule. The employee note is specific and actionable, not punitive.

Duplicate receipt across reports → escalate with evidence, no accusation

A receipt submitted on this report also appears on a prior one. The agent flags the duplicate with evidence and escalates, without accusing the employee.

Input

Report EXP-3450: flight $610 with receipt A-7782. The duplicate check finds receipt A-7782 already reimbursed on EXP-3119 last month.

Output

    {
      "report_id": "EXP-3450",
      "decision": "ESCALATE",
      "confidence": 0.8,
      "total_usd": 610,
      "line_findings": [
        { "item": "Flight $610", "status": "flag", "rule": "duplicate_submission", "note": "receipt A-7782 matches one already reimbursed on EXP-3119 (2026-05)" }
      ],
      "fraud_signals": ["Identical receipt A-7782 ($610) appears on EXP-3119 (already reimbursed) and EXP-3450; possible duplicate submission"],
      "approved_amount_usd": 0,
      "actions": [
        { "tool": "escalate_to_finance", "args": { "queue": "duplicate_review", "evidence": ["EXP-3119", "EXP-3450", "receipt A-7782"] }, "requires_approval": false }
      ],
      "employee_note": "We're reviewing report EXP-3450; the flight receipt appears to match one already reimbursed. Finance will follow up. This may simply be an accidental re-submission.",
      "escalation": { "needed": true, "reason": "Possible duplicate reimbursement: same receipt on two reports." }
    }

Note: The defining defensive case: the agent has concrete evidence (the same receipt number on two reports) but treats it as a possible duplicate to review, not proven fraud. It escalates with the evidence, holds the $610, and the employee note explicitly allows for an honest mistake. Evidence and fairness, never accusation.

Implementation notes

- Enforce the auto-approval cap and receipt requirements in a deterministic gate; the model audits, the gate controls what can be approved without a human.
- Cite the specific policy rule on every flag. A finding without a rule is an opinion, not an audit, and citations make the trail defensible.
- Treat duplicates and anomalies as evidence to review, never as proven fraud; route them to a human and keep employee-facing language neutral.
- Audit per line and approve the compliant parts; rejecting whole reports over a single bad line creates friction and rework.
- Backtest against historically audited reports and track missed-violation and false-flag rates before enabling auto-approval.
- Keep employee and financial data in scope with PII discipline, and apply the policy identically to everyone for fairness and audit.
- Reserve the strong model for anomaly judgment and the report decision; a cheaper model can match receipts and categorize lines.

Variations

- Basic: Audit & flag assistant. Audits each line against policy, verifies receipts, and returns flagged items with the cited rule and a recommendation for a reviewer. No auto-approval.
- Advanced: Guarded auto-approval. Auto-approves clean reports within the cap, holds specific non-compliant lines, runs duplicate/anomaly detection, and escalates fraud signals and over-cap totals.
- Enterprise: Governed spend audit. Adds multi-policy support, ERP/AP integration, full audit trails and SLAs, fraud-pattern analytics across employees, and check tuning from reviewer outcomes.

Download the Agent Blueprint

Download Blueprint (.zip): /downloads/expense-report-auditor.zip
View the source on GitHub: https://github.com/agent-kits/agentaz/tree/main/kits/expense-report-auditor

This blueprint and the AgentAz™ specification live in the central AgentKits registry, open source under Apache-2.0 (code & schema) and CC-BY-4.0 (text).

Frequently asked questions

Auto-approval happens only when every line is within policy, required receipts are present, and the total is within your configured cap. Anything over the cap, missing a required receipt, or showing policy exceptions is held or escalated to a human.

It audits each line against your structured written policy and cites the specific rule on every flag. If the policy is silent on something, it isn't treated as a violation, so there are no invented rules.

It does not make fraud accusations. It surfaces evidence-based patterns such as a duplicate receipt and routes them to a human for review with the evidence attached, keeping employee-facing language neutral. It never asserts intent or wrongdoing.

When only some lines are out of policy, it approves the compliant lines and holds only the specific problem items, with the cited rule and what's needed to fix them, rather than rejecting the whole report.

To detect duplicates, it checks for the same receipt, or the same amount/date/merchant, across the current and prior reports, and flags genuine matches as possible duplicates for human review.

To roll it out, start in assist mode, where it only recommends; backtest against historically audited reports; then enable auto-approval for clean within-cap reports once the results hold up.
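
The deterministic gate referred to in the setup guide and implementation notes could look roughly like the sketch below. It is only an illustration, not the kit's actual code: `LineFinding`, `gate`, and the return labels are hypothetical names, and the defaults simply mirror the sample `.env` values (cap 250, receipts required over 25). The gate recomputes the total from the lines instead of trusting a model-reported figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineFinding:
    """Hypothetical per-line audit result, loosely following the output format."""
    item: str
    amount_usd: float
    status: str            # "ok", "flag", or "hold"
    has_receipt: bool = True


def gate(findings: List[LineFinding], mode: str,
         cap_usd: float = 250.0, receipt_over_usd: float = 25.0) -> str:
    """Return "auto_approve" only when every deterministic check passes."""
    if mode != "act" or not findings:
        return "human_review"          # assist mode only recommends
    total = sum(f.amount_usd for f in findings)
    if total > cap_usd:
        return "human_review"          # over the auto-approval cap
    for f in findings:
        if f.status != "ok":
            return "human_review"      # any flag or hold blocks auto-approval
        if f.amount_usd > receipt_over_usd and not f.has_receipt:
            return "human_review"      # required receipt is missing
    return "auto_approve"


# Illustrative data modelled on the examples in this document.
clean = [LineFinding("Hotel", 180, "ok"),
         LineFinding("Taxi", 19, "ok"),
         LineFinding("Lunch", 15, "ok")]
missing_receipt = [LineFinding("Hotel", 190, "ok"),
                   LineFinding("Supplies", 40, "ok", has_receipt=False)]

print(gate(clean, "assist"))           # human_review
print(gate(clean, "act"))              # auto_approve
print(gate(missing_receipt, "act"))    # human_review
```

Keeping these checks outside the model means a mistaken or manipulated model output cannot by itself release a payment; the most it can do is send a report to a reviewer.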