cd /news/ai-agents/why-your-ai-agent-is-flaky-and-7-rul… · home topics ai-agents article
[ARTICLE · art-41975] src=dev.to ↗ pub= topic=ai-agents verified=true sentiment=· neutral

Why your AI agent is flaky — and 7 rules that make it reliable

A developer shares seven rules for making AI agents reliable in production, arguing that flakiness is usually a specification problem rather than a model problem. The rules include making success criteria falsifiable, using unambiguous tool names, providing structured error messages, setting hard limits like max step count and wall-clock timeout, and gating dangerous actions in code. Concrete guardrails for Claude Code are provided, such as blocking edits to sensitive paths and destructive commands.

read4 min views1 publishedJun 27, 2026

You built an AI agent. In the demo it was magic. In the wild it loops, hallucinates a tool call, "forgets" the format you asked for twice, and occasionally does something mildly alarming with your filesystem.

Here's the uncomfortable truth after shipping a lot of these: a flaky agent is almost never a model problem. It's a specification problem. The model is doing exactly what your prompt, your tools, and your control loop told it to do — which turns out to be far less than you thought you said.

Below are seven rules that consistently move an agent from "cool demo" to "I trust this on real work." None of them require a bigger model. Three of them are paste-able guardrails for Claude Code specifically.

"Summarize this well" is unfalsifiable. The agent can't tell when it's done, and neither can you. Replace every vibe with something a script could check:

The reliability win isn't the format — it's that failure becomes detectable. If you can write an assert

for the output, you can build a retry around it. If you can't, you have no idea how often it's wrong.

Most "the agent called the wrong tool" bugs are actually "the tool description was ambiguous" bugs. A tool named search

that sometimes means web search and sometimes means database lookup will get confused. Two rules:

search_web

, not search

)."error: 'date' must be YYYY-MM-DD, got '6/27'"

beats a stack trace or a silent null

. The model can recover from a sentence; it cannot recover from undefined

.Every autonomous agent needs three hard limits or it will eventually run forever: a max step count, a wall-clock timeout, and a no-progress detector (if the last two actions are identical, stop). The model will not reliably stop itself. This is your job, in code, outside the prompt.

Anything you put in the prompt is a request. The model usually honors it. "Usually" is not a security model. If an action is dangerous or irreversible, gate it in code, not in English.

Concretely, in Claude Code you can do this with hooks — deterministic scripts that run before a tool call and can block it. Here are the three I put in almost every project.

Guardrail A — block edits to sensitive paths. A PreToolUse

hook that refuses any write to .env

, .git/

, or secrets:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "node -e \"const p=process.env.CLAUDE_TOOL_INPUT_FILE_PATH||''; if(/(^|\\/)\\.env|(^|\\/)\\.git\\/|secrets/i.test(p)){console.error('Blocked: protected path '+p);process.exit(2)}\""
          }
        ]
      }
    ]
  }
}

Exit code 2

tells Claude Code the action was blocked and feeds the reason back to the model — so it adjusts instead of silently failing.

Guardrail B — never let "rm -rf" through. A PreToolUse

matcher on Bash

that hard-fails on obviously destructive commands:

{
  "matcher": "Bash",
  "hooks": [{
    "type": "command",
    "command": "node -e \"const c=process.env.CLAUDE_TOOL_INPUT_COMMAND||''; if(/rm\\s+-rf|git\\s+reset\\s+--hard|:\\(\\)\\{/.test(c)){console.error('Blocked destructive command');process.exit(2)}\""
  }]
}

Guardrail C — auto-format after every edit, so the agent's output is consistent without you asking each time:

{
  "matcher": "Edit|Write",
  "hooks": [{ "type": "command", "command": "prettier --write \"$CLAUDE_TOOL_INPUT_FILE_PATH\" 2>/dev/null || true" }]
}

The pattern that matters here: the model proposes, deterministic code disposes. Hooks run whether or not the model "felt like" honoring an instruction.

If your agent's "memory" is just the growing chat transcript, it will drift — early instructions get diluted, and cost climbs every turn. Keep the durable facts (the task, the constraints, what's been done) in a small structured object you re-inject each step, and let the transcript be disposable. An agent that re-reads its own goal every loop stays on task far longer than one trusting a 40-message context to hold.

You wouldn't ship a function with zero tests; don't ship an agent with zero either. Build a tiny eval set — even 10–15 representative inputs with checkable expected properties (rule #1 makes this possible). Run it on every prompt change. The first time a "harmless" prompt tweak silently breaks 3 of 12 cases, you'll understand why this is the highest-leverage 30 minutes in the whole project.

When the agent is uncertain or a guardrail trips, the worst outcome is confidently shipping a wrong answer. Design an explicit "I'm not sure, here's why" path. A reliable agent that occasionally says "I couldn't verify X, stopping" earns more trust than a confident one that's wrong 5% of the time on work that matters.

Reliability is not a model you buy; it's a discipline you impose. Make success checkable (#1, #6), make tools unambiguous and loud (#2), bound the loop (#3), enforce the dangerous stuff in code rather than in prose (#4), externalize state (#5), and fail loud to a human (#7). Do those and an ordinary model behaves like a dependable one.

If you want this as a printable checklist plus three more ready-to-paste Claude Code guardrails, I wrote a free field guide — no email required: ** Ship a Reliable AI Agent — the Free Field Guide**. It's the 10-minute version of everything above.

What's the failure mode that bit you hardest building agents? I collect these — the patterns repeat more than you'd think.

── more in #ai-agents 4 stories · sorted by recency
── more on @claude code 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/why-your-ai-agent-is…] indexed:0 read:4min 2026-06-27 ·