Stop trusting the agent: bind tool-call approvals to the exact call

A developer argues that using a boolean flag for tool-call approvals in agentic systems is insecure and proposes binding approvals to specific calls via HMAC-signed tokens. The approach prevents replay, argument drift, and principal-swap attacks by including call ID, canonical argument digest, principal, and expiry in the token. The solution requires deterministic key generation for replay-based durable execution engines.

Agentic systems gate dangerous tool calls — file writes, money movement, deploys — behind an "approval": a human-in-the-loop click, or a policy check. Look at how that approval is usually represented and you'll often find a boolean sitting in the run/session state: approved: true . A boolean is the wrong primitive, and it fails in three ways that prompt injection is happy to exploit. false into true . report.csv ". The approval is just true , so the same flag is honored for the prod.db ". The boolean doesn't know which call it approved. approved .The root cause is the same in all three: the approval is modeled as a property of the run , when it should be evidence for one specific call . When approval is granted, mint a tag over the things that must not change: the tool-call id, a digest of the canonical arguments, the principal, and an expiry. Verify it at dispatch, against a per-run secret. php import hmac, hashlib, json, time def canon args: dict - bytes: canonical serialization so benign reserialization doesn't invalidate a token. production: RFC 8785 JCS, which also normalizes numbers — 10 vs 10.0 return json.dumps args, sort keys=True, separators= ",", ":" .encode def mint key: bytes, call id: str, args: dict, principal: str, ttl: int = 300 - dict: exp = int time.time + ttl digest = hashlib.sha256 canon args .hexdigest msg = f"{call id}|{digest}|{principal}|{exp}".encode tag = hmac.new key, msg, hashlib.sha256 .hexdigest return {"call id": call id, "principal": principal, "exp": exp, "tag": tag} def verify key: bytes, tok: dict, call id: str, args: dict, principal: str - bool: if tok.get "call id" = call id: return False replay onto another call if tok.get "principal" = principal: return False wrong principal if tok.get "exp", 0 < time.time : return False expired digest = hashlib.sha256 canon args .hexdigest msg = f"{call id}|{digest}|{principal}|{tok 'exp' }".encode expect = hmac.new key, msg, hashlib.sha256 .hexdigest return hmac.compare digest expect, tok "tag" forged / flipped / arg-drift Run the three attacks against it plus principal-swap and a forged tag : KEY = b"per-run-secret-not-a-global-one" tok = mint KEY, "call-1", {"amount": 10, "to": "alice"}, "user:42" approve $10 to alice verify KEY, tok, "call-1", {"amount": 10, "to": "alice"}, "user:42" True legit verify KEY, tok, "call-2", {"amount": 10, "to": "alice"}, "user:42" False replay verify KEY, tok, "call-1", {"amount": 10000, "to": "alice"}, "user:42" False arg drift verify KEY, tok, "call-1", {"amount": 10, "to": "alice"}, "user:99" False wrong principal verify KEY, { tok, "tag": "00" 32}, "call-1", {"amount": 10, "to": "alice"}, "user:42" False forged The flag can no longer be flipped no valid tag , replayed call-id is in the MAC , or drifted args digest is in the MAC . An attacker who fully controls the transported state still can't manufacture a token without the key. 10 vs 10.0 vs 1e1 must agree — RFC 8785 JSON Canonicalization Scheme is the off-the-shelf answer. Put the canonicalization recipe id inside the hashed bytes so the two sides can't silently disagree about the rules. AUTO FUNCTION INVOCATION filter don't call next ⇒ the call is skipped , ADK's before tool callback, or the MCP tool-call boundary. Tools that need approval are classified as such; anything unclassified is denied, not allowed through.If your agent runs on a replay-based durable-execution engine Temporal and friends , the per-run secret must survive replay . Workflow code is re-executed from history on recovery, so a key minted with a non-deterministic call won't match the token already in history — approvals verify fine in dev and then fail closed after the first worker restart , which is the worst possible time to discover it. Derive the key deterministically HKDF server secret, run id or establish it once via a recorded side-effect, and make the expiry deterministic too rather than reading wall-clock inside workflow code. Authorization in an agent system shouldn't be ambient, mutable state that travels with the run. It should be evidence bound to a single call envelope — this principal, this tool, these exact arguments, until this time — that the executor re-verifies at the moment of dispatch. The boolean isn't a simplification of that; it's the bug. I work on reliability and verification for AI and numerical systems — agent authorization, determinism, and "prove the thing that claims to be authorized actually was." The snippet above is runnable as-is. Happy to compare notes if you're hardening an agent's tool boundary — GitHub.