Your AI Agent Doesn't Need to Be Smarter. It Needs to Be Idempotent

wpnews.pro

Most of the failures I see in production AI agents aren't reasoning failures. The model picks the right tool, fills in the right arguments, and makes a perfectly sensible decision. Then the agent charges the customer twice.

The reason is mundane and has nothing to do with intelligence. A write-capable agent — one that can send an email, create a ticket, move money, or update a database — lives inside the same unreliable network as any other distributed system. Requests time out. Connections drop after the server already committed the write but before the response came back. An orchestration framework retries a step that looked like it failed but didn't. And because the agent is a loop that re-plans on every observation, a single ambiguous outcome can send it down the path of just trying the action again.

In a read-only agent, a retry is free. In a write-capable agent, a retry is a second irreversible action in the real world. That asymmetry is the whole game, and the fix is older than LLMs: idempotency.

Here's the sequence that bites teams over and over. The agent calls send_invoice

. The downstream service receives it, creates the invoice, and starts sending the response. Somewhere on the way back, the connection dies. From the agent's point of view, the call failed — it got a timeout, not a 200. So the agent, doing exactly what a resilient system is supposed to do, retries. Now there are two invoices.

Notice that nothing here is the model's fault. You could swap in a smarter model and the bug gets worse, because a more capable agent is more aggressive about recovering from apparent failures. The intelligence layer and the reliability layer are different problems, and you cannot prompt your way out of a network partition.

Payments infrastructure solved this years ago, and the solution is worth copying wholesale. Stripe's API lets a client attach an Idempotency-Key

header to any POST request. Per Stripe's API reference, the server saves the status code and body of the first request made for a given key, and subsequent requests with the same key return that same stored result — even if the original was a failure. Stripe recommends a V4 UUID or another random string with enough entropy to avoid collisions, and notes that keys can be pruned automatically once they're at least 24 hours old.

The mechanism is simple, but the insight is the part to internalize: the safety guarantee lives at the boundary, keyed on the caller's stated intent, not on the model's judgment. The agent is allowed to be flaky. The boundary is what makes flakiness safe.

For an agent, the only adaptation is where the key comes from. A human checkout flow generates one fresh key per user click. An agent has no clicks — so you derive the key from the content of the intended action. Same logical action, same key, every time, even across retries and process restarts.

Here's the entire idea in runnable Python. An IdempotentStore

wraps the side-effecting action; the key is a hash of the tool name plus its parameters, so a retried call collapses onto the original.

import hashlib, json

class IdempotentStore:
    def __init__(self):
        self._results = {}
        self.side_effects = 0  # times the REAL action ran

    def run(self, key, action, *args):
        if key in self._results:
            return self._results[key], "replayed"   # no downstream call
        result = action(*args)                       # the irreversible part
        self.side_effects += 1
        self._results[key] = result
        return result, "executed"

def intent_key(tool_name, params):
    payload = json.dumps({"tool": tool_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

Drive it with an agent that retries the same logical charge three times:

store = IdempotentStore()
params = {"customer": "cus_42", "cents": 4999}
key = intent_key("charge_customer", params)

for attempt in range(3):
    result, mode = store.run(key, charge_customer,
                             params["customer"], params["cents"])
    print(f"attempt {attempt+1}: mode={mode}")

Running it prints executed

once and replayed

twice, and the downstream system records exactly one charge. The agent still thinks it acted three times — and that's fine. Its job is to decide; the store's job is to make sure deciding twice doesn't cost twice.

In real systems you'd back _results

with Redis or a Postgres table (with a unique constraint on the key, so even two concurrent workers race safely), set a TTL, and store enough of the response to replay it faithfully. The structure stays the same.

The hash-the-params trick has a sharp edge worth naming. Your key is only as good as your definition of "the same action."

If two genuinely distinct actions hash to the same key, you've created a false duplicate and the second one silently no-ops — a send_reminder

that quietly never sends. If two retries of the same action hash to different keys — because you included a timestamp, a freshly generated request ID, or the model rephrased a free-text field — your guard does nothing and the double-write sails through. The model's nondeterminism makes this trap easy to fall into: ask an LLM to "email the customer about their late payment" twice and you may get two different message bodies, and therefore two different keys.

The fix is to key on the stable part of the intent — the customer ID, the invoice ID, the logical operation — and deliberately exclude anything the model might reword or anything that varies per call. Treat the key as a first-class part of your tool's contract, designed by you, not as an incidental hash of whatever arguments happened to show up.

Before you reach for a bigger model, a longer prompt, or another layer of self-reflection, ask a cheaper question: if my agent does this exact action twice, what breaks? For every write-capable tool, the answer should be "nothing," and the way you get there is an idempotency key derived from intent and enforced at the boundary.

Reliability in agents isn't mostly about making better decisions. It's about making the cost of a repeated decision zero. Get that right and you can let the agent be as flaky as the network it lives on — which it will be, whether you plan for it or not.

source & further reading

dev.to — original article I Got Tired of Rewriting AI API Wrappers, So I Built a Gateway Humanizing Artificial Intelligence for Log Analysis: Turning Raw Server Logs Into Clear DevOps Answers We Built an AI Receptionist. The Hard Part Wasn't Making It Sound Human.

Your AI Agent Doesn't Need to Be Smarter. It Needs to Be Idempotent

Run your AI side-project on zahid.host