{"slug": "my-agent-dry-ran-fine-in-staging-100-times-then-wrecked-production-on-the-first", "title": "My agent dry-ran fine in staging 100 times — then wrecked production on the first real run", "summary": "A developer building AI agents on Cloudflare Workers found that 100 successful dry runs in staging did not prevent a staging-to-production data bleed that took 4 hours to roll back. Environment drift and non-deterministic agent execution paths meant production hit tool call sequences never seen in staging, and mock responses in dry-run mode led the agent to treat skipped R2 writes as real, leaving orphaned D1 records when follow-up writes were not in dry-run scope. The fix was to propagate a per-run dry-run flag via KV so that every subsequent write in the same run is also dry-run, and the developer warns that hook failures can bypass dry-run protection entirely.", "body_md": "A staging-to-production data bleed cost me 4 hours of rollback. That's what finally made dry-run a structural requirement, not an afterthought.\n\nThe common advice is: test in staging, promote when green. The problem is environment drift. My D1 schema changes once or twice a week, and a solo operator can't keep staging perfectly synchronized. Worse, agents don't have fixed execution paths: the same input can produce a different tool call sequence on the next run. I ran a flow 100 times in staging and still hit a fresh path on the first production execution.\n\nThe most surprising thing I learned after 6 months of running this: **latency wasn't the problem I expected**. KV writes averaged 12ms, basically imperceptible. The real problem was that mock responses fool the agent into treating skipped writes as real successes. I'd dry-run an R2 `put`, the agent would believe the file had been uploaded, and then it would go on to write metadata to D1, which was *not* in dry-run scope. Real write, orphaned record.\n\nThe fix: once any write tool in a run hits dry-run, propagate a flag for that `runId` that forces all subsequent writes in the same run to dry-run too.\n\n```ts\n// After the first write in this run is intercepted as dry-run,\n// mark the whole run as dry-run (the flag expires after one hour).\nawait ctx.env.KV.put(`dryrun_active:${ctx.runId}`, \"1\", {\n  expirationTtl: 3600,\n});\n\n// Every subsequent write hook checks this flag before allowing a write.\n// Workers KV is eventually consistent: a read served from another\n// location may briefly miss a flag that was just written.\nconst isDryRunActive =\n  (await ctx.env.KV.get(`dryrun_active:${ctx.runId}`)) === \"1\";\n```\n\nOne more thing that burned me: if the hook itself fails (say, KV goes temporarily unavailable), Claude Code's default behavior is to fall through. The tool call executes anyway and the dry-run flag is ignored. Last week a KV spike caused hook timeouts, and 3 agents wrote directly to production. There was no data loss because those operations were idempotent, but that was luck. Hook failure needs its own alert, separate from agent failure.\n\nI wrote up the full breakdown, including the dry-run propagation edge cases, R2 + D1 orphan scenarios, and where this pattern completely falls apart (read-modify-write loops, APIs with side-effectful reads), over on riversealab.com.", "url": "https://wpnews.pro/news/my-agent-dry-ran-fine-in-staging-100-times-then-wrecked-production-on-the-first", "canonical_source": "https://dev.to/riversea/my-agent-dry-ran-fine-in-staging-100-times-then-wrecked-production-on-the-first-real-run-10cc", "published_at": "2026-07-01 01:12:19+00:00", "updated_at": "2026-07-01 01:48:47.168929+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-infrastructure"], "entities": ["Cloudflare Workers", "D1", "R2", "KV", "Claude Code", "riversealab.com"], "alternates": {"html": "https://wpnews.pro/news/my-agent-dry-ran-fine-in-staging-100-times-then-wrecked-production-on-the-first", "markdown": "https://wpnews.pro/news/my-agent-dry-ran-fine-in-staging-100-times-then-wrecked-production-on-the-first.md", "text": 
"https://wpnews.pro/news/my-agent-dry-ran-fine-in-staging-100-times-then-wrecked-production-on-the-first.txt", "jsonld": "https://wpnews.pro/news/my-agent-dry-ran-fine-in-staging-100-times-then-wrecked-production-on-the-first.jsonld"}}