Specs replace code? Not until specs stop lying

A developer argues that replacing code with specifications as the source of truth is flawed because specs can be wrong for production realities. They cite an example where a payment system's spec called for exponential backoff, but the engineer implemented flat retries to avoid overwhelming a bank integration. The developer warns that stale specs can mislead AI code generation, removing the safety net of running code to catch mismatches.

At a software engineering retreat last week, someone argued specs should replace code as the source of truth. AI can generate correct code from precise specs, the reasoning went, so code just becomes the output - not the thing you trust. Took me a day to figure out what was bothering me. You verify a spec by reading it. You verify code by running it. One of those is a person nodding. The other is a machine refusing to proceed if something broke. The difference is where the whole thing comes apart. A payment system I worked on had a design spec for the retry layer that said exponential backoff. Standard stuff. Increasing delay, cap at some ceiling. The code retried three times and stopped. Flat. No backoff. No ceiling. Wasn't a mistake. The engineer who wrote it had seen the downstream at peak. The spec author hadn't. The downstream was a bank integration. It degraded under the load pattern synchronized exponential backoff creates - waves of retries piling up, all landing in the same window. Jitter helps, but the real call was about predictable aggregate load against a downstream with fixed rate limits. Three flat retries, evenly spaced, kept the in-flight work predictable. The spec's backoff curve would have thundering-herded the bank into refusing service every time it slowed down. The spec was what someone wanted. The code was what the system needed. The moment they diverged was the moment engineering happened. Spec met production, production won. The spec wasn't wrong in any general way. It was wrong for this downstream, at this scale, under this load. The engineer knew because they'd been there. The spec author hadn't. If the spec had been the source of truth, that change wouldn't have been authorized. The code would have matched the spec. The system would have been worse. The divergence wasn't a bug. It was the most important line in the codebase. Code rots. Some of that rot is loud. It fails. Throws. Doesn't compile. The noise means someone notices. Specs rot in silence. A stale comment sits next to the code it describes - a reviewer might catch the mismatch. A stale spec sits in Confluence. Nobody runs it against the system. Nobody notices it's drifted. Six months later someone references it for a decision and the decision is wrong because the spec hasn't been true since week two. When a spec is the source of truth and you regenerate code from it, a stale spec doesn't mislead a person reading it. It misleads the generation pipeline. The code comes out correct against the spec, wrong against reality. The step that would have caught the gap - running the damn thing - got removed from the loop. All anyone checked was the spec. This happens already, just in the other direction. Every time you describe what you want instead of what the system needs, the AI generates code that matches your description. The description was wrong about what production requires. Only running the code tells you that. Make specs the truth and you've made the gap the main thing that fails - and removed the thing that catches it. Someone reads a spec and thinks "yeah, that sounds right." CI runs the code and says "compiles, tests pass, green." The spec gets checked once - someone's mental model at review time. Code gets checked every time anything changes, against actual behavior. The spec's evidence: one person understood it. The code's evidence: it still works. If specs become the source of truth, the weakest check becomes the authoritative one. The stronger checks - the ones that actually run - become secondary. We built compilers and type checkers and test suites precisely because "sounds right" is about the flimsiest signal in the field. Elevating the spec to the top elevates the flimsiest signal with it. The usual counter is formal specs - machine-verifiable ones. They exist. They're also expensive, and the gap between a formal spec describing desired behavior and generated code correctly implementing it is still a gap. Someone has to verify the generated code actually does what the formal spec says. That verification is... running the code. The spec didn't replace anything. It moved the problem up one level and made it harder. Specs leave out the detail that matters. The timeout that was too short. The index that didn't exist. The null case that wasn't in the spec because the author assumed it couldn't happen. These don't survive in a spec because the spec author never hit them. They survive in code because the code author did. The incident. The slow query. The null pointer at some ungodly hour. Code records what the system needed to survive, not what someone thought it would need before it went live. Every if-statement that exists because a bug happened. Every retry budget tuned by watching logs. Every timeout with a comment that says "this number looks random but it's not." A spec that said "retry on transient failures" wouldn't tell you why the retries are flat instead of exponential, or why the budget is seven and not three, or why the timeout is 800ms and not a second. The spec would be correct - the system does retry. It would also be useless as a source of truth. The truth is in the scars, not the summary. AI amplifies this. Tell a model "implement retry on transient failures" and it'll give you exponential backoff every time. Textbook answer. What the spec would say. Also exactly what would have broken the bank integration. The model doesn't know about the downstream at peak. That knowledge isn't in the training data. It's in the scar. The scar is in the code. Code is maybe the most human artifact we produce. It's a running log of everything we learned the hard way, every assumption that turned out wrong, every edge case that bit us and then got fixed. The goal isn't to make specs so precise that code becomes mechanical. The goal is to make code clear enough that it's its own spec. The spec that runs is the only spec that doesn't lie.