# Your Spec Files Are Lying to You. Mine Were Too.

> Source: <https://dev.to/diyaburman/your-spec-files-are-lying-to-you-mine-were-too-1nie>
> Published: 2026-06-15 17:30:55+00:00

I want to be upfront about something before we get into it. None of the frameworks in this article is mine. The ideas here come from two people who have been thinking about this stuff way harder and longer than I have — and they deserve full credit before I say another word.

Dan Shapiro — CEO of Glowforge, Wharton Research Fellow, and the person who gave this whole conversation a vocabulary. His blog post “The Five Levels: from Spicy Autocomplete to the Dark Factory” is the conceptual spine of everything I’m about to say. Read the original. It’s short, sharp, and will make you uncomfortable in the best way. [danshapiro.com](https://www.danshapiro.com/blog/2026/01/the-five-levels-from-spicy-autocomplete-to-the-software-factory)

Nate B. Jones — AI strategist, zero-hype practitioner, and the person whose YouTube channel made me realize I had been fooling myself about where I actually sat on this ladder. His video “The 5 Levels of AI Coding (Why Most of You Won’t Make It Past Level 2)” is what triggered this entire newsletter. [natebjones.com](https://www.natebjones.com/) — [Watch the video](https://youtu.be/bDcgHzCBgmQ)

This newsletter, The Level 5 Engineer, is my public learning log. I’m a Senior Software Engineer and a Tech Lead, currently somewhere between Level 2 and Level 3 on the ladder this newsletter is named after, on a good day. The goal is Level 5. I’m documenting the climb in real time: the frameworks, the tools, the mindset shifts, and the moments where I realize I’ve been doing it wrong. If you’re on a similar journey, pull up a chair.

Every issue so far has worked with one service and one spec file. Issue #7 changes that. A second service enters the picture — a notification service that the order service calls after a confirmed payment — and with it comes the question that every growing system eventually forces: where do spec file boundaries go?

The answer turns out to matter more than it looks. And the audit at the end of this issue found seven spec debt items in files we've been running since Issue #2. All passing. All carrying risk.

The new service is minimal: `POST /notifications/order-confirmed` accepts an order id, user id, and total, and returns a notification id and a `QUEUED` status. Simple enough. The interesting part is how the order service calls it.

The call is fire-and-forget.

When an order is confirmed, the order service starts a daemon thread, fires the notification request, and returns the `CONFIRMED` response immediately, without waiting for the notification to succeed. If the notification service is down, slow, or returning errors, the order is still confirmed. The customer gets their confirmation. The notification may or may not arrive.

This is a deliberate design decision. The order service owns the transaction. The notification service owns delivery. Coupling the order confirmation response to notification delivery would mean a flaky notification service could block order creation — which is a much worse failure mode than a missed notification.
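A minimal sketch of that fire-and-forget shape, assuming a hypothetical `notify` callable in place of the real HTTP client (the actual order service code is not shown here). The names `confirm_order` and `broken_notifier` are illustrative only.

```python
import threading
from typing import Callable

# Hypothetical signature for the notification call: (order_id, user_id, total).
Notifier = Callable[[str, str, float], None]


def confirm_order(order_id: str, user_id: str, total: float, notify: Notifier) -> dict:
    """Confirm the order, then fire the notification without waiting on it."""

    def _send() -> None:
        try:
            notify(order_id, user_id, total)
        except Exception:
            # Delivery failures must never reach the order response.
            pass

    # daemon=True: the thread does not keep the process alive at shutdown.
    threading.Thread(target=_send, daemon=True).start()
    return {"order_id": order_id, "status": "CONFIRMED"}


def broken_notifier(order_id: str, user_id: str, total: float) -> None:
    """Stand-in for a notification service that is down."""
    raise ConnectionError("notification service is down")


result = confirm_order("ord-1", "user-1", 134.97, broken_notifier)
print(result["status"])
```

Even with a notifier that always raises, the caller still sees `CONFIRMED`. That is the property the spec has to protect.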

But the decision has a direct spec implication: any scenario that asserts `Then the order status is "CONFIRMED"` must remain true regardless of what the notification service does. The spec cannot simultaneously require `CONFIRMED` and make `CONFIRMED` depend on notification success. That would be a hidden coupling: the spec would look independent but the implementation would not be.

This is the kind of architectural decision that should be in the spec before it's in the code. Once it's in the code it becomes folklore.

Before doing it right, I did it wrong deliberately. I added two notification scenarios to the bottom of `order_creation.feature`, the existing file that's been covering order creation since Issue #2.

All 7 tests passed. Green across the board. `pytest` has no opinion about spec architecture.

The problems are structural, not functional:

**Mixed ownership.** `order_creation.feature` line 1 says `Feature: Order Creation`. By line 48 it's testing notification delivery. If the notification team changes their contract (say, adding a `channel` field to the request), they have to open `order_creation.feature` to update it. That file is not theirs. The filename, the feature declaration, and the existing scenarios all signal "this belongs to the order team." The notification scenarios are squatters.

**The growing file problem.** At 5 scenarios the file is readable. At 7 it starts to smell. Extrapolate to a real system: 10 downstream services, 5–10 scenarios each, all appended to the originating feature file because each was "triggered by" an order creation event. The file becomes a catch-all that nobody owns and everybody edits. Ownership dissolves into "whoever last touched it."

**The agent routing problem.** When an agent is handed `order_creation.feature` to build against, it must now implement both order logic and notification logic. It cannot know from the file whether the notification call belongs in `POST /orders` or in a separate endpoint. It will make a decision, probably the wrong one, and that decision will be baked into the implementation before anyone notices.

**Spec debt seed.** The scenario "Order confirmation succeeds even if notification fails" uses the step `"the notification service is unavailable"` without defining what unavailable means. TCP connection refused? 503? A 30-second hang? Each is a different failure mode with different implications for retry logic. An agent will pick one interpretation silently. Two agents will pick different ones. Both implementations will pass the spec. This is spec debt: it forms quietly, passes its tests, and surfaces as a production incident months later.
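To see why the three readings of "unavailable" are not interchangeable, here is a rough sketch that models each as a separate stub behaviour. The `Unavailable` enum, `FakeResponse`, and `fake_notification_service` are hypothetical names for illustration, not part of the project.

```python
from dataclasses import dataclass
from enum import Enum


class Unavailable(Enum):
    CONNECTION_REFUSED = "connection refused"
    HTTP_503 = "503 Service Unavailable"
    HANG = "no response before the client timeout"


@dataclass
class FakeResponse:
    status_code: int


def fake_notification_service(mode: Unavailable) -> FakeResponse:
    """Hypothetical stub: each mode fails in a visibly different way."""
    if mode is Unavailable.CONNECTION_REFUSED:
        # The caller never gets an HTTP response at all.
        raise ConnectionRefusedError("nothing is listening on the port")
    if mode is Unavailable.HANG:
        # The caller only finds out after its own timeout expires.
        raise TimeoutError("no response within the client timeout")
    # The caller gets a well-formed response it can inspect.
    return FakeResponse(status_code=503)


# A step that names one mode leaves the step definition no choice to make.
response = fake_notification_service(Unavailable.HTTP_503)
print(response.status_code)
```

Two of the modes raise and one returns a response, so the calling code, and any retry logic around it, looks different in each case. A step that only says "unavailable" lets an agent bind it to any of the three.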

After documenting what was wrong, I moved the notification scenarios into their own file: `tests/features/notification_service.feature`. I rewrote both scenarios to:

- Define "unavailable" as `503 Service Unavailable`: not a timeout, not a connection refused, not an ambiguous network failure
- Stand alone, so that nobody has to read `order_creation.feature` to understand them

The result:

- `order_creation.feature`: 5 scenarios, all about order creation. No references to notifications.
- `notification_service.feature`: 2 scenarios, all about notification delivery behaviour.

The file boundary is now a contract boundary. The two files can be versioned, owned, and handed to different agents independently.

Bounded spec files are not a tidiness preference. They are a precision tool for multi-agent systems. When a spec file is bounded to one service, an agent can be assigned exactly that file and nothing else. It builds one surface, tests one contract, returns. When the spec bleeds across services, the agent must make decisions about service ownership that were never written down. Those decisions accumulate as hidden assumptions in the implementation.

With the bounded file structure in place, I audited all four feature files in the project for spec debt — places where the spec passes its tests but leaves decisions that should have been made explicitly.

Seven items. All passing. All carrying risk.

**1. Ambiguous timeout measurement**

*File: order_creation.feature — Scenario: payment gateway times out*

`And the response is returned within 12 seconds`

From when? The client sends the request? The server receives it? The last retry fires? Two agents will instrument this differently and both will pass. "Within 12 seconds of the order being submitted" — defining "submitted" as the moment the HTTP request body is sent — removes the ambiguity.
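One way to pin the measurement down is to start the clock on the client side, at the moment the request is sent. This is only a sketch: `timed_submit` and `fake_submit` are hypothetical helpers, and the fake stands in for a real `POST /orders` call.

```python
import time
from typing import Callable


def timed_submit(submit: Callable[[], dict]) -> tuple[dict, float]:
    """Return the response and the seconds elapsed since the request was sent."""
    started = time.monotonic()  # clock starts when the client sends the request
    response = submit()
    elapsed = time.monotonic() - started
    return response, elapsed


def fake_submit() -> dict:
    """Hypothetical stand-in for the HTTP call; sleeps briefly to simulate work."""
    time.sleep(0.05)
    return {"placeholder": True}


response, elapsed = timed_submit(fake_submit)
# The 12-second budget from the step, measured from the moment of sending.
assert elapsed < 12.0, f"response took {elapsed:.2f}s"
```

`time.monotonic()` is used rather than wall-clock time so that a system clock adjustment during the test cannot distort the measurement.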

**2. "Retried" vs "total attempts"**

*File: order_creation.feature — Scenario: payment gateway times out*

`And the payment gateway is not retried more than 2 times`

Does this mean 2 total attempts (1 original + 1 retry) or 2 retries on top of the original (3 total)? The English is genuinely ambiguous. An agent will pick one. The test will pass. The production system will behave differently than intended.

Fix: `And the payment gateway receives no more than 2 charge requests total`. "Requests total" removes all ambiguity about whether the first attempt counts.
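A small sketch of what counting total requests looks like in a test double. `RecordingGateway` and `charge_with_limit` are hypothetical; the real retry code may be structured quite differently.

```python
class RecordingGateway:
    """Hypothetical fake payment gateway that always times out and counts requests."""

    def __init__(self) -> None:
        self.charge_requests = 0

    def charge(self, amount: float) -> None:
        self.charge_requests += 1
        raise TimeoutError("gateway did not respond")


def charge_with_limit(gateway: RecordingGateway, amount: float,
                      max_total_requests: int = 2) -> bool:
    """Try to charge; the limit counts every request, including the first."""
    for _ in range(max_total_requests):
        try:
            gateway.charge(amount)
            return True
        except TimeoutError:
            continue
    return False


gateway = RecordingGateway()
charged = charge_with_limit(gateway, 134.97)
print(charged, gateway.charge_requests)
```

Naming the parameter `max_total_requests` rather than `max_retries` carries the same disambiguation into the code: the fake sees exactly 2 requests, not 3.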

**3. "Released" is not a mechanism**

*File: order_creation.feature — Scenario: payment declined*

`And the inventory reservation is released`

"Released" is not defined. Does the inventory service receive a DELETE? A POST to a release endpoint? Does a TTL fire? An agent will implement whichever mechanism seems natural. Two agents will produce incompatible implementations that both pass the spec.

Fix: Name the items and the mechanism: `And the inventory service receives a reservation release request for SHOE-RED-42 and BELT-BRN-M`.

**4. "Explicit user action" describes a flow that doesn't exist**

*File: order_creation.feature — Scenario: partial availability*

`And no order is confirmed without explicit user action`

"Explicit user action" is not defined anywhere in the spec. A second API call? A UI confirmation? A webhook? This step passes trivially because no order is confirmed — the negative condition is true by absence. But it implies a follow-up confirmation flow that was never built, never specced, and never reviewed. If a future agent reads this step and builds a confirmation flow to satisfy it, it will invent something that was never intended.

Fix: Remove it if the follow-up flow is out of scope. Or replace it with a concrete step: `And a subsequent POST to /orders/{order_id}/confirm is required to complete the order`.

**5. Presence without value**

*File: order_status_bad.feature*

Asserting that a field exists only catches absence, not a field that is present with the wrong value. An agent can return `{"status": null}` and pass. The spec catches the wrong thing.

Fix: Assert the full expected shape with explicit values rather than just field names.
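The difference between the two kinds of assertion fits in a few lines. This is an illustrative sketch: `has_field` and `matches_shape` are hypothetical helpers, and the expected value `"CONFIRMED"` is an assumption about what the scenario should require.

```python
import json

body = json.loads('{"status": null}')  # parses to {"status": None}


def has_field(payload: dict, field: str) -> bool:
    """Presence-only check: passes even when the value is null."""
    return field in payload


def matches_shape(payload: dict, expected: dict) -> bool:
    """Value check: every expected key must be present with the expected value."""
    return all(key in payload and payload[key] == value
               for key, value in expected.items())


presence_ok = has_field(body, "status")
shape_ok = matches_shape(body, {"status": "CONFIRMED"})
print(presence_ok, shape_ok)
```

The presence check accepts the null status and the shape check rejects it, which is the gap the fix is meant to close.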

**6. "An order exists" doesn't say how**

*File: order_status_good.feature*

`Given an order exists with status "CONFIRMED"`

"An order exists" doesn't specify how it got there — full creation flow, or directly seeded into the store. The two methods produce different side effects. An agent building a test harness may seed the order directly, bypassing the creation flow entirely, which means the status endpoint tests never verify that a real confirmed order is actually readable via the API.

Fix: `Given a previously confirmed order created via POST /orders with id "{order_id}"`, or explicitly state that direct seeding is acceptable.

**7. "Correct" is relative**

*File: notification_service.feature*

`And the notification contains the correct order id and total`

"Correct" compared to what? If the order total is computed, two agents may compute it differently and both pass "correct" against their own computation.

Fix: Hardcode the expected value: `And the notification request body contains order_id matching the confirmed order and total of 134.97`.
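To make "two agents may compute it differently" concrete, here is a toy sketch in which two plausible rounding strategies disagree by a cent. The prices and the tax rate are invented for illustration and have nothing to do with the project's real order data.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
TAX_RATE = Decimal("0.10")        # hypothetical tax rate
PRICES = [Decimal("0.15")] * 3    # hypothetical line items


def total_rounding_per_item(prices: list) -> Decimal:
    """Agent A: round the tax on each item, then sum."""
    return sum(
        price + (price * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
        for price in prices
    )


def total_rounding_once(prices: list) -> Decimal:
    """Agent B: sum the items first, then round the tax once."""
    subtotal = sum(prices)
    return subtotal + (subtotal * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP)


total_a = total_rounding_per_item(PRICES)
total_b = total_rounding_once(PRICES)
print(total_a, total_b)
```

Under these assumptions Agent A gets 0.51 and Agent B gets 0.50. Each would pass a step that only asks for the "correct" total, because each checks against its own arithmetic. A hardcoded expected value fails one of them.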

Every item in that audit passes its test. That is the point.

Spec debt is not visible in a green CI run. It is visible only when you ask: *what would a second agent build from this spec?* The step "the payment gateway is not retried more than 2 times" has been in the codebase since Issue #2. It has passed every run. But it encodes an ambiguity that will be resolved differently by every agent that implements it fresh. The "no order is confirmed without explicit user action" step describes a flow that does not exist anywhere in the codebase. It passes because the negative condition is trivially true.

If a future agent reads that step and builds a confirmation flow to satisfy it, it will build something that was never specced, never reviewed, and never integrated. The spec invited it. The tests blessed it. Nobody noticed.

This is the exact failure mode that makes AI-assisted development unreliable at scale. Specs that look precise, pass their tests, and silently invite incompatible implementations. The debt doesn't announce itself. It compounds.

Fifteen tests passing across four bounded feature files. The notification service is integrated. The Pact contracts — which existed before this session — remain unbroken because the notification call happens after the transaction completes. Adding a new service boundary didn't require touching existing contracts.

Seven spec debt items documented. None fixed yet. The fixes are the next issue.

*Next issue: The Spec Audit — applying the debt framework to a real existing service and building the diagnostic tool readers can use on their own codebases.*

*This article was written with the assistance of AI tools.*