{"slug": "your-spec-files-are-lying-to-you-mine-were-too", "title": "Your Spec Files Are Lying to You. Mine Were Too.", "summary": "A developer describes how growing system complexity forces spec file boundary decisions, using a new notification service that the order service calls fire-and-forget. The developer deliberately added notification scenarios to an existing spec file, which passed all tests but created structural problems like mixed ownership. The audit found seven spec debt items in files running since Issue #2, all passing but carrying risk.", "body_md": "I want to be upfront about something before we get into it. None of the frameworks in this article is mine. The ideas here come from two people who have been thinking about this stuff way harder and longer than I have — and they deserve full credit before I say another word.\n\nDan Shapiro — CEO of Glowforge, Wharton Research Fellow, and the person who gave this whole conversation a vocabulary. His blog post “The Five Levels: from Spicy Autocomplete to the Dark Factory” is the conceptual spine of everything I’m about to say. Read the original. It’s short, sharp, and will make you uncomfortable in the best way. [danshapiro.com](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory)\n\nNate B. Jones — AI strategist, zero-hype practitioner, and the person whose YouTube channel made me realize I had been fooling myself about where I actually sat on this ladder. His video “The 5 Levels of AI Coding (Why Most of You Won’t Make It Past Level 2)” is what triggered this entire newsletter. [natebjones.com](https://www.natebjones.com/) — [Watch the video](https://youtu.be/bDcgHzCBgmQ)\n\nThis newsletter — The Level 5 Engineer — is my public learning log. I’m a Senior Software Engineer and a Tech Lead, currently somewhere between Level 2 and Level 3 (in context of the title of this newsletter) on a good day. The goal is Level 5. 
I’m documenting the climb in real time — the frameworks, the tools, the mindset shifts, and the moments where I realize I’ve been doing it wrong. If you’re on a similar journey, pull up a chair.\n\nEvery issue so far has worked with one service and one spec file. Issue #7 changes that. A second service enters the picture — a notification service that the order service calls after a confirmed payment — and with it comes the question that every growing system eventually forces: where do spec file boundaries go?\n\nThe answer turns out to matter more than it looks. And the audit at the end of this issue found seven spec debt items in files we've been running since Issue #2. All passing. All carrying risk.\n\nThe new service is minimal: `POST /notifications/order-confirmed` accepts an order id, user id, and total, and returns a notification id and a `QUEUED` status. Simple enough. The interesting part is how the order service calls it.\n\nThe call is fire-and-forget.\n\nWhen an order is confirmed, the order service starts a daemon thread, fires the notification request, and returns the `CONFIRMED` response immediately — without waiting for the notification to succeed. If the notification service is down, slow, or returning errors, the order is still confirmed. The customer gets their confirmation. The notification may or may not arrive.\n\nThis is a deliberate design decision. The order service owns the transaction. The notification service owns delivery. Coupling the order confirmation response to notification delivery would mean a flaky notification service could block order creation — which is a much worse failure mode than a missed notification.\n\nBut the decision has a direct spec implication: any scenario that asserts `Then the order status is \"CONFIRMED\"` must remain true regardless of what the notification service does. The spec cannot simultaneously require `CONFIRMED` and make `CONFIRMED` depend on notification success. 
That would be a hidden coupling — the spec would look independent but the implementation would not be.\n\nThis is the kind of architectural decision that should be in the spec before it's in the code. Once it's in the code it becomes folklore.\n\nBefore doing it right I did it wrong deliberately. I added two notification scenarios to the bottom of `order_creation.feature` — the existing file that's been covering order creation since Issue #2.\n\nAll 7 tests passed. Green across the board. `pytest` has no opinion about spec architecture.\n\nThe problems are structural, not functional:\n\n**Mixed ownership.** `order_creation.feature` line 1 says `Feature: Order Creation`. By line 48 it's testing notification delivery. If the notification team changes their contract — say, adding a `channel` field to the request — they have to open `order_creation.feature` to update it. That file is not theirs. The filename, the feature declaration, and the existing scenarios all signal \"this belongs to the order team.\" The notification scenarios are squatters.\n\n**The growing file problem.** At 5 scenarios the file is readable. At 7 it starts to smell. Extrapolate to a real system: 10 downstream services, 5–10 scenarios each, all appended to the originating feature file because each was \"triggered by\" an order creation event. The file becomes a catch-all that nobody owns and everybody edits. Ownership dissolves into \"whoever last touched it.\"\n\n**The agent routing problem.** When an agent is handed `order_creation.feature` to build against, it must now implement both order logic and notification logic. It cannot know from the file whether the notification call belongs in `POST /orders` or in a separate endpoint. 
It will make a decision — probably the wrong one — and that decision will be baked into the implementation before anyone notices.\n\n**Spec debt seed.** The scenario \"Order confirmation succeeds even if notification fails\" uses the step `\"the notification service is unavailable\"` without defining what unavailable means. TCP connection refused? 503? A 30-second hang? Each is a different failure mode with different implications for retry logic. An agent will pick one interpretation silently. Two agents will pick different ones. Both implementations will pass the spec. This is spec debt: it forms quietly, passes its tests, and surfaces as a production incident months later.\n\nAfter documenting what was wrong, I moved the notification scenarios into their own file: `tests/features/notification_service.feature`. I rewrote both scenarios to:\n\n- define \"unavailable\" as `503 Service Unavailable` — not a timeout, not a connection refused, not an ambiguous network failure\n- stand alone, with no need to open `order_creation.feature` to understand them\n\nThe result:\n\n- `order_creation.feature`: 5 scenarios, all about order creation. No references to notifications.\n- `notification_service.feature`: 2 scenarios, all about notification delivery behaviour.\n\nThe file boundary is now a contract boundary. The two files can be versioned, owned, and handed to different agents independently.\n\nBounded spec files are not a tidiness preference. They are a precision tool for multi-agent systems. When a spec file is bounded to one service, an agent can be assigned exactly that file and nothing else. It builds one surface, tests one contract, returns. When the spec bleeds across services, the agent must make decisions about service ownership that were never written down. 
Those decisions accumulate as hidden assumptions in the implementation.\n\nWith the bounded file structure in place, I audited all four feature files in the project for spec debt — places where the spec passes its tests but leaves decisions that should have been made explicitly.\n\nSeven items. All passing. All carrying risk.\n\n**1. Ambiguous timeout measurement**\n\n*File: order_creation.feature — Scenario: payment gateway times out*\n\n`And the response is returned within 12 seconds`\n\nFrom when? The client sends the request? The server receives it? The last retry fires? Two agents will instrument this differently and both will pass. \"Within 12 seconds of the order being submitted\" — defining \"submitted\" as the moment the HTTP request body is sent — removes the ambiguity.\n\n**2. \"Retried\" vs \"total attempts\"**\n\n*File: order_creation.feature — Scenario: payment gateway times out*\n\n`And the payment gateway is not retried more than 2 times`\n\nDoes this mean 2 total attempts (1 original + 1 retry) or 2 retries on top of the original (3 total)? The English is genuinely ambiguous. An agent will pick one. The test will pass. The production system will behave differently than intended.\n\nFix: `And the payment gateway receives no more than 2 charge requests total` — \"requests total\" removes all ambiguity about whether the first attempt counts.\n\n**3. \"Released\" is not a mechanism**\n\n*File: order_creation.feature — Scenario: payment declined*\n\n`And the inventory reservation is released`\n\n\"Released\" is not defined. Does the inventory service receive a DELETE? A POST to a release endpoint? Does a TTL fire? An agent will implement whichever mechanism seems natural. Two agents will produce incompatible implementations that both pass the spec.\n\nFix: Name the items and the mechanism: `And the inventory service receives a reservation release request for SHOE-RED-42 and BELT-BRN-M`.\n\n**4. 
\"Explicit user action\" describes a flow that doesn't exist**\n\n*File: order_creation.feature — Scenario: partial availability*\n\n`And no order is confirmed without explicit user action`\n\n\"Explicit user action\" is not defined anywhere in the spec. A second API call? A UI confirmation? A webhook? This step passes trivially because no order is confirmed — the negative condition is true by absence. But it implies a follow-up confirmation flow that was never built, never specced, and never reviewed. If a future agent reads this step and builds a confirmation flow to satisfy it, it will invent something that was never intended.\n\nFix: Remove it if the follow-up flow is out of scope. Or replace it with a concrete step: `And a subsequent POST to /orders/{order_id}/confirm is required to complete the order`.\n\n**5. Presence without value**\n\n*File: order_status_bad.feature*\n\nAsserting that a field exists only catches absence — not incorrect presence. An agent can return `{\"status\": null}` and pass. The spec catches the wrong thing.\n\nFix: Assert the full expected shape with explicit values rather than just field names.\n\n**6. \"An order exists\" doesn't say how**\n\n*File: order_status_good.feature*\n\n`Given an order exists with status \"CONFIRMED\"`\n\n\"An order exists\" doesn't specify how it got there — full creation flow, or directly seeded into the store. The two methods produce different side effects. An agent building a test harness may seed the order directly, bypassing the creation flow entirely, which means the status endpoint tests never verify that a real confirmed order is actually readable via the API.\n\nFix: `Given a previously confirmed order created via POST /orders with id \"{order_id}\"` — or explicitly state that direct seeding is acceptable.\n\n**7. \"Correct\" is relative**\n\n*File: notification_service.feature*\n\n`And the notification contains the correct order id and total`\n\n\"Correct\" compared to what? 
If the order total is computed, two agents may compute it differently and both pass \"correct\" against their own computation.\n\nFix: Hardcode the expected value: `And the notification request body contains order_id matching the confirmed order and total of 134.97`.\n\nEvery item in that audit passes its test. That is the point.\n\nSpec debt is not visible in a green CI run. It is visible only when you ask: *what would a second agent build from this spec?* The step \"the payment gateway is not retried more than 2 times\" has been in the codebase since Issue #2. It has passed every run. But it encodes an ambiguity that will be resolved differently by every agent that implements it fresh. The \"no order is confirmed without explicit user action\" step describes a flow that does not exist anywhere in the codebase. It passes because the negative condition is trivially true.\n\nIf a future agent reads that step and builds a confirmation flow to satisfy it, it will build something that was never specced, never reviewed, and never integrated. The spec invited it. The tests blessed it. Nobody noticed.\n\nThis is the exact failure mode that makes AI-assisted development unreliable at scale. Specs that look precise, pass their tests, and silently invite incompatible implementations. The debt doesn't announce itself. It compounds.\n\nFifteen tests passing across four bounded feature files. The notification service is integrated. The Pact contracts — which existed before this session — remain unbroken because the notification call happens after the transaction completes. Adding a new service boundary didn't require touching existing contracts.\n\nSeven spec debt items documented. None fixed yet. 
The fixes are the next issue.\n\n*Next issue: The Spec Audit — applying the debt framework to a real existing service and building the diagnostic tool readers can use on their own codebases.*\n\n*This article was written with the assistance of AI tools.*", "url": "https://wpnews.pro/news/your-spec-files-are-lying-to-you-mine-were-too", "canonical_source": "https://dev.to/diyaburman/your-spec-files-are-lying-to-you-mine-were-too-1nie", "published_at": "2026-06-15 17:30:55+00:00", "updated_at": "2026-06-15 18:07:02.545323+00:00", "lang": "en", "topics": ["developer-tools", "ai-agents"], "entities": ["Dan Shapiro", "Glowforge", "Nate B. Jones", "The Level 5 Engineer"], "alternates": {"html": "https://wpnews.pro/news/your-spec-files-are-lying-to-you-mine-were-too", "markdown": "https://wpnews.pro/news/your-spec-files-are-lying-to-you-mine-were-too.md", "text": "https://wpnews.pro/news/your-spec-files-are-lying-to-you-mine-were-too.txt", "jsonld": "https://wpnews.pro/news/your-spec-files-are-lying-to-you-mine-were-too.jsonld"}}