{"slug": "gates-earned-from-failure-a-cost-test-for-agent-guardrails", "title": "Gates Earned From Failure: a cost test for agent guardrails", "summary": "A developer building an agent-based project describes a pragmatic approach to guardrails: each check in the toolchain was added only after a real failure occurred, not designed upfront. The method, illustrated by a glossary drift caught by reading and a rename that left residue in specification documents, ensures that every rule is earned and paid for by an actual incident. The developer argues that this failure-first ordering prevents bloated instruction files and keeps the system lean and effective.", "body_md": "*Every guardrail in my agent-built project was earned from a real failure, not designed up front. A cost test for when to build one, when to wait, and when to retire it.*\n\nOn a Wednesday in late May I caught a bug by reading. The project's glossary — the\n\ncanonical list of the domain terms my coding agent is required to use — had drifted\n\nfrom the domain model I actually carried in my head. Nothing flagged it. No test\n\nfailed, no check fired, no compiler complained. I noticed because I happened to be\n\nreading the file and the words were wrong.\n\nWhat I typed next is the whole argument in one line. Not \"fix the glossary,\" not\n\n\"I'll be more careful,\" but a question aimed at the toolchain instead of the error:\n\nthe GLOSSARY already drifted away from my domain model vision and I would like\n\nto prevent this in future refactors\n\nThat reframes the task. The drift stops being an error to correct once\n\nand becomes a signal about the toolchain: **if I caught a class of drift by\nreading, an enforcement axis is missing.** The fix isn't to read harder next time.\n\nI did not sit down and design a governance system, and that's the most transferable\n\nthing here. Every check in this project — a couple dozen now, wired as pre-commit\n\nhooks — was born from a particular drift I'd already been bitten by. The ordering is\n\nthe point. The failure came first; the gate came second, shaped to the exact failure.\n\nThe clearest place this showed up was renaming. (I told the measurement side of\n\nthis story in a companion piece, [ Rails, Not Rules](https://vasyltretiakov.dev/p/rails-not-rules):\n\nThe detail that stung was where the residue hid. In one rename, the leftover\n\ninstances of the old term weren't in some forgotten corner of the codebase. They\n\nwere in the specification documents, the very files being rewritten to *drive* the\n\nrename in the first place. The agent, under my direction, was editing the spec to\n\ndeclare the new vocabulary while leaving the old vocabulary scattered through the\n\nsame spec. The document asserting the rename was itself out of compliance with it.\n\nYou cannot out-discipline that, and watching the diff doesn't save you either: no\n\namount of review reliably catches the contradiction being introduced in the very\n\npass that's concentrating on the change. Only a check that reads the document back\n\nto you after the fact will, and that's what eventually went in.\n\nIt helps to be clear about why renames recur at all, because it isn't sloppiness.\n\nThe glossary is the written home of what Domain-Driven Design (DDD) calls the\n\n*ubiquitous language* (UL): the single vocabulary that runs from a domain expert's\n\nsentence to a class name. Eric Evans is emphatic that this language is not written\n\nonce and frozen; it is *continuously distilled*, sharpened every time the team's\n\nunderstanding improves. Each sharpening is a rename. So the renames are the system\n\nworking as designed, not a mess to stamp out — which is exactly why the residue\n\nproblem is permanent and worth a gate rather than a resolution to be tidier.\n\nAfter that, a rename was never just \"use the new word now.\" It became three things\n\nthat ship together: rename the term, add a glossary entry recording the old word as\n\n*retired*, and add a check keyed to that retired word so it can't silently return.\n\nThe rename and the rail land in the same commit. Skip the rail and you're trusting\n\nthat every future sweep will be perfect, which is the assumption that just failed.\n\nSo far this is a tidy story: let each gate be earned by a real, observed drift,\n\nand your instruction file never bloats into the 200-line document nobody, human or\n\nmodel, can hold in their head. A rule you add from imagination is a guess about a\n\nfailure that may never come. A rule you add from a failure you actually hit is paid\n\nfor. It has a body behind it. The failure log is the design document.\n\nI believed that cleanly for about a month. Then I had to argue with my own agent\n\nabout it, and the rule came out more interesting than it went in.\n\nThe trouble with \"wait for the drift\" is that some drifts you cannot afford to\n\nobserve even once. If the first occurrence is a deleted production table, or\n\ncustomer data in a log line, or a fabricated citation in something already\n\npublished, \"let it go wrong cheaply once\" is a contradiction. There is no cheap\n\nonce.\n\nI put this to the agent as a proposed amendment: build a gate proactively when the\n\nfailure mode is already well-attested *and* the check is cheap and\n\nlow-false-positive, or when the first occurrence is expensive or irreversible. Stay\n\nreactive when the rule is judgment-heavy, the convention is still in flux, or the\n\ndrift is cheap to catch and fix once. The reasoning is mundane risk math, not\n\nideology: you pre-pay for a gate exactly when the expected cost of waiting exceeds\n\nthe standing cost of the gate.\n\nThis repo runs on that amendment, and I can point at two receipts. The writing\n\nproject, the one these essays live in, has a check that blocks any cited URL not\n\nlogged in an evidence file. I built it *before* it had earned itself in the usual\n\nreactive way, because the failure it guards is the expensive kind and it had\n\nalready happened once: a fabricated citation reached a draft. A published\n\nfabrication doesn't get a cheap first occurrence. So the gate went in proactively,\n\nlicensed by the cost test: is the failure high enough cost, and is the check free\n\nof false positives?\n\nThe second receipt is the other half of the amendment, the well-attested-and-cheap\n\ncase rather than the irreversible one. The same writing project runs a prose check\n\non these very drafts, flagging the small set of mechanical tells that mark\n\nmachine-generated text. That failure mode wasn't hypothetical and it wasn't\n\nexpensive-once; it was a known, recurring, deterministic pattern I'd have to\n\ncorrect on every essay forever. Well-attested, cheap to detect, low false-positive:\n\nthat's the second carve-out, and it's the reason this paragraph can't lean on more\n\nthan two em-dashes without the gate stopping the commit. The check is the essay's\n\nown argument applied to the essay. The rule \"gates are earned\" survives both times,\n\nwith named carve-outs for failures you can't afford to rehearse and failures you've\n\nalready seen enough times to name precisely.\n\nBetween \"earn it from a failure\" and \"stay reactive\" sits the move that does most\n\nof the real work, and it's the one I'd most want a reader to steal. A lot of rules\n\nlook un-gateable because the thing they protect is a *meaning*, and a script can't\n\nread meaning. \"Don't restate the protocol prose in two places\" isn't grep-able.\n\nNeither is \"don't leave a dangling reference.\" The reflex is to give up and write\n\nthe rule into an instructions file as a wish. The better move is to stop checking\n\nthe rule and check a *structural proxy* for it: a syntactic shadow the real rule\n\ncasts, one a script can see and that almost never flags an innocent.\n\nIt works more often than it has any right to. \"Don't duplicate the protocol in\n\nprose\" became \"no narrative text in column zero of these files,\" because the\n\nprotocol lived in indented blocks and unindented prose was the tell. \"No dangling\n\nreferences\" became \"every `TODO(`\n\ncarries a live task slug.\" The em-dash rule\n\nguarding these very drafts is the same trick: I can't gate \"don't write like a\n\nlanguage model,\" but I can gate \"no paragraph carries more than two em-dashes,\"\n\nwhich is a measurable shadow of the thing I actually want. Each time, a rule I'd\n\nhave sworn was judgment-only turned out to cast a shape a grep could catch.\n\nThe proxy is never the rule, though. A structural shadow is an approximation by\n\nconstruction, so every proxy gate ships with a built-in gap between what it checks\n\nand what you wish it proved. That gap is the next failure.\n\nHere is the part I got wrong, and the agent caught.\n\nI had been treating \"cheap to write\" as most of what makes a gate worth building.\n\nThe agent pushed back on two fronts. First, cheap has to mean cheap to *maintain*,\n\nnot cheap to write. A gate is a standing liability, not a one-time cost. Every\n\nlegitimate evolution of the convention now has to route through the check, and you\n\npay an update tax and a false-positive-triage tax for the life of the gate. The\n\nauthoring cost is the small number.\n\nThe second point is the one that changed how I think. The agent put it more\n\nsharply than I had:\n\nOnce a green gate exists, people stop eyeballing the thing the gate appears to\n\ncover. So a cheap-but-approximate gate over a real invariant can be worse than\n\nno gate — it converts active vigilance into misplaced trust.\n\nThe dominant hidden failure of a cheap gate isn't noise. It's false confidence. A\n\ncheap check usually only *approximates* the invariant you care about. A text\n\nsearch proves a token is\n\nabsent; it does not prove a concept was actually renamed everywhere it lives. A\n\nfast compile passes against a stale cache. A sub-agent reports \"tests passed\" and\n\nyou don't re-run them. In every case the green checkmark covers less than it\n\nappears to.\n\nAn outside read lands in the same place. Birgitta Böckeler, whose harness\n\nvocabulary I lean on below, notes that test feedback is weaker than it looks once\n\n*\"the agent also generated the tests.\"* A verifier the generator authored is not\n\nindependent of the thing it checks, so a passing suite certifies less than it\n\nseems to — the same gap, one layer up.\n\nThe rename gate from earlier is a perfect specimen — a structural proxy of exactly\n\nthe kind I just praised, which makes it the more humbling, because it's one I'm\n\nproud of. It greps for the retired word and passes when the word is gone. But \"the\n\nold word is absent\" and \"the concept was correctly migrated\" are not the same claim. You can delete every instance of the old term and still\n\nhave left the *idea* it named half-translated, split across two new words that\n\nshould have been one, or attached to the wrong entity. The gate is green and the\n\nmigration is wrong, and now nobody's reading the diff with the old suspicion,\n\nbecause the check says it's handled. The check is real and worth having. It just\n\nproves a narrower thing than its green checkmark advertises, and the discipline is\n\nto keep knowing the difference instead of letting the green absolve you.\n\nThat's the trap in one move. You were watching the thing carefully when nothing\n\nclaimed to watch it for you. Now something claims to, so you look away, and the gap\n\nbetween *what the check proves* and *what it looks like it proves* is exactly where\n\nthe next bug lives. When those two diverge and the invariant matters, that argues\n\nagainst the cheap gate, not for it.\n\nThis isn't an argument against gates. It's an argument for knowing which of your\n\ngreen checkmarks are load-bearing and which are decorative.\n\nThe other thing I had been collapsing was the choice itself. I'd been treating it\n\nas binary: gate the thing, or stay reactive and fix it by hand when it breaks.\n\nThere are at least four rungs, and most failures want one of the middle two.\n\nThe distinction isn't mine — manufacturing got there decades ago. Shigeo Shingo's\n\n[poka-yoke](https://en.wikipedia.org/wiki/Poka-yoke), the mistake-proofing of the\n\nToyota Production System, already splits a *control* that makes the error\n\nimpossible from a *warning* that only signals it. The two middle rungs below are\n\nthose two ideas wearing software clothes. Birgitta Böckeler's\n\n[harness engineering](https://martinfowler.com/articles/harness-engineering.html)\n\ngives the orthogonal axis: *guides* that steer the agent before it acts, *sensors*\n\nthat observe after. A blocking gate is a sensor with teeth; a default path is a\n\nguide that leaves nothing to sense.\n\nThe strongest rung isn't a gate at all. It's the **default path** — Shingo's\n\ncontrol type: make the wrong artifact hard to author in the first place. A\n\ntemplate, a generator, a structured section that only has slots for the right\n\nshape. There's nothing to false-positive on, because you're not checking after the\n\nfact, you're removing the way to get it wrong. When a failure is common but a\n\nprecise gate would be noisy, this beats the gate.\n\nBelow the blocking gate sits the **tripwire** — Shingo's warning type: warn without\n\nblocking, or ask for confirmation before an irreversible step. This is the right\n\nreach for failures that are catastrophic on first occurrence but genuinely hard to\n\ngate cleanly — the data deletion, the leak, the fabrication. You don't need a perfect detector. A loud \"are\n\nyou sure, here's what you're about to overwrite\" buys most of the protection at a\n\nfraction of the false-positive cost. The mistake is letting \"we can't build a clean\n\ngate for this\" collapse into \"so we stay fully reactive\" on a failure you can't\n\ntake twice.\n\nThe rung that doesn't work is the one that looks like discipline: a script everyone\n\nis *supposed* to run but nothing enforces. It's neither a gate nor a default path,\n\nso under any real deadline it just gets skipped. A rule that lives only in prose is\n\na preference, however firmly phrased.\n\nSo the ladder, top to bottom: blocking gate, default path, tripwire, stay reactive.\n\n\"Should I gate this\" was always the wrong question. The question is which rung the\n\nfailure earns.\n\nBecause a gate is a standing liability, \"earned\" is a real filter, and a filter\n\nthat only ever says yes isn't filtering. The same cost test that licenses a\n\nproactive gate also tells you when to leave something ungated, and I find I reach\n\nfor it in that direction about as often.\n\nA concrete one: I recently changed a documentation convention across the project. A\n\npure convention change, no behavior, no domain term retired. The reflex in a repo\n\nthat leans this hard on enforcement is to add a check for the new convention. What\n\nI wrote instead was an instruction to *not*:\n\nDon't add a gate or script — this is a convention change only; the enforcement\n\nhook stays deferred per the cost test.\n\nThe convention isn't load-bearing, a check on it would generate churn every time\n\nthe convention itself evolves, and the failure if someone gets it wrong is\n\ncosmetic. No gate. Not yet, maybe not ever.\n\nThe same logic runs in reverse, against gates that already exist. A check family\n\nonly grows: one iteration of mine added six at once, and nothing in the workflow\n\never pushes the other way. The tidy instinct is a recurring \"prune the gates\"\n\nritual at every close. That instinct is right about the pressure and wrong about\n\nthe mechanism, for two reasons that separate a gate from a stale paragraph of\n\nprose. A doc paragraph taxes every session, so doc-bloat is pure, constant rent; a\n\nredundant gate taxes only when it runs, so its bloat is mostly latent. And a stale\n\ndoc is just noise, while a gate is a load-bearing invariant whose removal risks\n\nsilently dropping a correctness check no one notices is gone. So pruning earns more\n\ncare and less frequency than tidying prose, not the same recurring slot. A full\n\nportfolio review every close would usually find nothing, and a ritual that usually\n\nfinds nothing is the skipped-discipline trap again, dressed as housekeeping.\n\nTwo corollaries fall out. \"Merge these two similar gates\" defaults to *no*: two\n\nprecise checks beat one fuzzy merged one, and a merge is a design act with its own\n\nfailure mode, not janitorial cleanup. And the counter-pressure that *is* warranted\n\nshould itself be mechanized rather than left to willpower — a redundancy meta-check\n\nthat reads the coupling each gate already declares in its header and flags real\n\noverlap, fired on a threshold or a phase boundary, never once per commit. Even\n\npruning gates wants a rung, not a resolution to be tidier.\n\nA good gate even pays a rent rebate. Every rule you load into the agent's\n\nalways-present instructions is standing context it has to carry, and that context\n\nisn't free. A deterministic check lets you *delete* the paragraph of prose that\n\nused to beg the model to remember a rule and replace it with a script plus a\n\nreference read only when it fires. The gate does the remembering, so the context\n\nwindow doesn't have to — fewer gates can mean a heavier prompt, not a lighter one.\n\nThe recursion is the part I like best: the system diagnosed the bias on itself.\n\nEverything in this project leans toward enforcement over discipline. The agent\n\nnamed the hazard better than I had:\n\nthe whole\n\n`check-*`\n\nscript family leans hard toward enforce-over-discipline, so\n\na fresh session inherits a \"when in doubt, add a check\" texture with no stated\n\nboundary. That absence is a latent gap.\n\nA bias toward gating with nothing written down about when *not* to. The\n\nhonest response, by this essay's own logic, is not to immediately write the\n\nboundary rule into the always-loaded instructions. That would be exactly the\n\nproactive over-gating the rule warns against. The boundary rule hasn't failed yet.\n\nSo we left it ungated, with a concrete trigger: the first time any session ships a\n\ngate that turns out high-false-positive or merely approximate, *that's* the\n\nobserved failure, and the boundary rule gets promoted then — and into a\n\nread-on-demand reference, not the always-loaded file, because it's judgment-grade\n\nmaterial rather than a per-session directive. The meta-rule about earning gates is\n\nitself being made to earn its place. The failure log is the design document, all\n\nthe way up.\n\nThis worked for one person, on one roughly 150,000-line codebase, where I own the\n\nglossary and the domain model is a single coherent thing in a single head. \"Earned\n\nfrom failure\" is cheap when the failure is yours and you hit it the same day you\n\nbuild the gate. At team scale the picture is harder: the standing context cost\n\ncompounds, the false-positive triage falls on people who didn't feel the original\n\npain, and \"let it go wrong cheaply once\" is a different proposition when the once is\n\nsomeone else's afternoon. I don't have evidence for the team case. I have evidence\n\nfor the solo case and a suspicion the principles survive the move, which is not the\n\nsame thing.\n\nOne tempting claim I'll resist. I had a hunch, watching the sessions, that the agent\n\n*resists* proposing gates proactively — that it fixes the immediate issue and only\n\ndesigns the enforcement when pushed. There's a clean-looking receipt for it. Asked\n\nto prevent a recurrence after I'd approved a fix, the agent replied:\n\nApproved. Let me make the fix first, then design the enforcement check.\n\nFix first, gate second, only once prompted. But when I went looking for the\n\npattern, the support was mixed. In the flow of a task it does tend to fix first and\n\ngate when asked. Hand it the same question as a matter of policy, though, and it\n\nargues for *more* proaction than I'd proposed, not less. So I won't dress that up as\n\na clean behavioral finding. It's a real tension I don't yet understand well enough\n\nto assert.\n\nNone of this is specific to my domain. Any agent operating on a governed system —\n\na customer-service platform, a billing pipeline, anything where words have to mean\n\nthe same thing across many surfaces — accumulates the same class of drift, and the\n\nsame four rungs apply.\n\nPicture the Contact Center case concretely, since it's the domain I know best. A\n\nsingle concept gets named in a routing rule, a reporting dashboard, an agent-facing\n\nscript, and the schema of the system that ties them together. Rename it in three of those\n\nfour and the fourth keeps quietly emitting the old word, so the dashboard and the\n\nrouter disagree about what they're counting, and nothing crashes. That's the drift,\n\nand it's invisible to every test that checks behavior rather than vocabulary. The\n\ngate that couples the surfaces is the only thing that catches a disagreement no\n\nsingle component is wrong about. That coupling pattern, applied in both directions,\n\nis the subject of a companion essay, [ Couple Both Ways](https://vasyltretiakov.dev/p/couple-both-ways).\n\nThe transferable move is to stop treating your instruction file as the place where\n\ncorrectness lives. Prose is where preferences live. Correctness lives in the gates,\n\ndefault paths, and tripwires you build, each one shaped to a failure specific\n\nenough to point at. You don't govern an agent by\n\npredicting how it'll go wrong. You let it go wrong cheaply where you can, and you\n\nconvert each real miss into the cheapest rung that makes it impossible to miss that\n\nway again.\n\n*This essay was written by directing a coding agent over the project it describes;\nI direct and judge, the agent drafts and argues back. The argument-back, in this\ncase, is most of beats four and five.*\n\n*I build governed agent systems at the intersection of Contact Center software and\nAI. If that's a problem you're chewing on, I'm reachable on LinkedIn.*\n\n*Published at vasyltretiakov.dev.*", "url": "https://wpnews.pro/news/gates-earned-from-failure-a-cost-test-for-agent-guardrails", "canonical_source": "https://dev.to/vasyltretiakov/gates-earned-from-failure-a-cost-test-for-agent-guardrails-d74", "published_at": "2026-06-14 16:00:43+00:00", "updated_at": "2026-06-14 16:10:48.319153+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "large-language-models", "ai-safety", "mlops"], "entities": ["Domain-Driven Design", "Eric Evans", "Vasyl Tretiakov"], "alternates": {"html": "https://wpnews.pro/news/gates-earned-from-failure-a-cost-test-for-agent-guardrails", "markdown": "https://wpnews.pro/news/gates-earned-from-failure-a-cost-test-for-agent-guardrails.md", "text": "https://wpnews.pro/news/gates-earned-from-failure-a-cost-test-for-agent-guardrails.txt", "jsonld": "https://wpnews.pro/news/gates-earned-from-failure-a-cost-test-for-agent-guardrails.jsonld"}}