The spec as source of truth, not a side document.
Spec-Driven Development is one of those ideas that software engineers have reached for before and then set aside when the effort stopped paying off.
What changed in 2025 is that AI coding agents arrived and made the absence of explicit intent expensive. Prompts are ephemeral. Agent sessions reset. Code changes but the reasoning behind it disappears. The spec is the artifact that stops that from happening.
The Spec Is Becoming the Source of Truth #
For most of software development history, the spec was either a temporary planning artifact or an afterthought. Requirements lived in tickets, design decisions lived in chat threads, and the code was the ground truth. Documentation described what existed after the fact. Spec-Driven Development inverts that relationship. The specification becomes the primary artifact. Code is what gets generated or verified against the spec, not the other way around.
This is not a new idea. Formal methods, design-by-contract, and BDD all contain versions of it. What is new is the practical motivation: AI coding agents need explicit, durable context to produce correct and consistent output. Prompts are too ephemeral. The spec is the only artifact that can carry intent across agent sessions, across team members, and across time.
What Spec-Driven Development Actually Means #
Spec-Driven Development, usually shortened to SDD, is a workflow where a versioned specification guides or generates implementation. The specification is written and reviewed before the agent writes code. It captures:
- What to build: user problem, goals, and non-goals
- What correct behavior looks like: acceptance criteria, edge cases, error states
- How to build it: architecture decisions, data model, API contracts, security constraints
- How to verify it: test strategy, validation rules, traceability back to requirements
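As a rough illustration, the four areas above could be modelled as a small data structure. This is only a sketch: the class name `FeatureSpec`, its field names, and the password-reset example are assumptions made for illustration, not a format defined by any SDD tool.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSpec:
    """Hypothetical in-memory shape of a spec; all field names are illustrative."""

    # What to build
    problem: str
    goals: list[str]
    non_goals: list[str]
    # What correct behavior looks like
    acceptance_criteria: list[str]
    # How to build it
    design_decisions: list[str] = field(default_factory=list)
    # How to verify it
    test_strategy: str = ""

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        sections = {
            "problem": self.problem,
            "goals": self.goals,
            "non_goals": self.non_goals,
            "acceptance_criteria": self.acceptance_criteria,
            "design_decisions": self.design_decisions,
            "test_strategy": self.test_strategy,
        }
        return [name for name, value in sections.items() if not value]


spec = FeatureSpec(
    problem="Users cannot reset a forgotten password.",
    goals=["Self-service password reset by email"],
    non_goals=["SMS-based reset"],
    acceptance_criteria=["Reset links expire after 30 minutes"],
)
print(spec.missing_sections())  # ['design_decisions', 'test_strategy']
```

In practice the same content would usually live in a markdown file in the repository; the structure matters more than the container.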
The spec is not a one-time document. It is updated when reality differs from the design. When the agent discovers something during implementation that the spec got wrong, the spec is corrected before continuing. The spec stays honest because it is treated like code.
Recent academic work formalizes this framing: researchers describe SDD as treating specifications as the source of truth and code as generated or verified against them. The practical interpretation is that the spec is the reviewed, durable record of intent that any human or AI tool can read and trust.
Three terms capture different points on the spec-use spectrum:
Spec-first means writing the full specification before any implementation begins. This is the strictest interpretation and the one closest to waterfall if not done carefully.
Spec-anchored means keeping a specification in sync with implementation throughout the feature lifecycle. The spec is updated as decisions change. This is the most practical version for most teams.
Spec-as-source means generating or validating implementation from the spec, either through AI agents or through tooling that checks code against spec constraints. This is the direction tools like GitHub Spec Kit and Kiro are moving toward.
Why SDD Matters Now #
The honest answer is that SDD is not compelling for a solo developer building a one-day script. The overhead is not worth it.
SDD becomes valuable when three conditions are present: the feature is large enough to span multiple sessions, the agent needs to make decisions that affect the architecture, and the work will be reviewed or continued by someone else.
All three conditions are increasingly common with AI-assisted development.
LLMs need context, not just prompts. A model that receives a vague prompt makes vague decisions. A model that receives a reviewed specification with explicit constraints, non-goals, and acceptance criteria makes better decisions and is easier to course-correct when it drifts. This connects to how retrieval and representation work: giving an agent a versioned spec is a form of structured retrieval of project intent.
Code generation is cheap; deciding what to build is still hard. The bottleneck in AI-assisted development is no longer typing; it is knowing what to build and how to constrain the agent. SDD shifts the effort to where it matters: specifying intent clearly before generation begins.
Prompts are ephemeral. The agent does not remember what you told it in the last session. A versioned specification stored in the repository does. Every new session can read the same spec and implement against the same intent without re-establishing context from scratch.
Vibe coding works until it does not. For prototypes and throwaway work, ad-hoc prompting is faster and appropriate. As soon as a feature requires security constraints, multi-file architecture decisions, or a team handoff, the absence of a spec becomes the main source of drift and defects. See the comparison in SDD vs Vibe Coding for when each approach applies.
Core Artifacts #
A practical SDD workflow produces four main artifacts. Each one reduces a specific kind of ambiguity before it reaches the agent.
Requirements specification. The problem statement, the users affected, the goals, the explicit non-goals, and the acceptance criteria. Non-goals are as important as goals: they tell the agent what not to build and prevent scope creep. Acceptance criteria are precise enough that each one maps to at least one test.
Design specification. The architectural decisions relevant to this feature: affected modules, data model changes, API contracts, migrations, security constraints, observability requirements, and known failure modes. This is not a full system design document; it is the subset of architecture decisions needed to implement this feature correctly.
Task plan. A sequence of small implementation tasks, each with explicit dependencies, expected file changes, and validation criteria. Tasks are small enough to implement in a single agent session and verify with a focused diff. Each task has a human review checkpoint.
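One way to picture a task plan with explicit dependencies is as a small graph that can be ordered before any work starts. The sketch below uses Python's standard `graphlib` module; the task names are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task plan: each task maps to the set of tasks it depends on.
task_dependencies = {
    "T1-add-migration": set(),
    "T2-update-model": {"T1-add-migration"},
    "T3-add-endpoint": {"T2-update-model"},
    "T4-write-contract-tests": {"T3-add-endpoint"},
}


def implementation_order(dependencies):
    """Return tasks in an order where every dependency comes first.

    Raises graphlib.CycleError if the plan contains a circular dependency.
    """
    return list(TopologicalSorter(dependencies).static_order())


for task in implementation_order(task_dependencies):
    print(task)
```

A plan that cannot be ordered this way has a circular dependency, which is worth catching before an agent session starts rather than during one.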
Traceability record. A mapping from acceptance criteria to tests, from design decisions to affected files, and from tasks to commits. This is what separates SDD from documentation that becomes stale: traceability makes it possible to verify that the spec was actually implemented, not just written.
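A traceability record can be as simple as a mapping that a script is able to check. The following is a minimal sketch under stated assumptions: the criterion IDs, the test names, and the idea that test names are collected elsewhere are all hypothetical.

```python
# Hypothetical traceability record: acceptance criterion ID -> tests covering it.
criteria_to_tests = {
    "AC-1": ["test_reset_link_expires_after_30_minutes"],
    "AC-2": ["test_unknown_email_gets_generic_response"],
    "AC-3": [],  # written in the spec, but no test verifies it yet
}

# Test names found in the test suite (assumed to be collected elsewhere).
existing_tests = {
    "test_reset_link_expires_after_30_minutes",
    "test_legacy_login_redirect",
}


def uncovered_criteria(record):
    """Criteria that no test claims to cover."""
    return sorted(ac for ac, tests in record.items() if not tests)


def dangling_references(record, tests_in_suite):
    """Tests named in the record that do not exist in the suite (a stale record)."""
    referenced = {name for tests in record.values() for name in tests}
    return sorted(referenced - tests_in_suite)


print(uncovered_criteria(criteria_to_tests))
# ['AC-3']
print(dangling_references(criteria_to_tests, existing_tests))
# ['test_unknown_email_gets_generic_response']
```

The first check shows where the spec has not been verified; the second shows where the record itself has gone stale.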
These artifacts do not have to be heavyweight. A simple feature might produce a single two-page markdown document covering all four areas. The format matters less than the habit of writing intent down before implementation begins.
How SDD Differs from Documentation #
The most common confusion is treating SDD artifacts as documentation. They are not documentation in the conventional sense.
Documentation describes. It tells you what the system does, how to use it, and what it contains. It is written after the fact and updated when the system changes.
Specs constrain. A spec tells the agent what it is allowed to build and what it is not allowed to do. It is authoritative before implementation begins. It is validated after implementation completes. A spec that describes what was actually built, rather than constraining what should be built, has already failed its purpose.
Executable specs guide generation and validation. The best SDD specs are close enough to machine-readable that an agent can implement against them and a test suite can verify them. An acceptance criterion written as "the endpoint must reject unauthenticated requests with a 401 response" is an executable spec; "the endpoint is secure" is documentation.
Decision records (ADRs, PDRs, and DDRs) are complementary to SDD artifacts but serve a different purpose. Decision records capture why a choice was made and what was rejected. SDD specifications capture what to build and how to verify it. Both belong in the repository. Together they give AI agents the full picture: the current intent and the reasoning behind it.
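To make the 401 criterion concrete, here is a minimal sketch of how it could translate into a check. The `handle_request` function is a hypothetical stand-in for a real endpoint, not any framework's API; only `http.HTTPStatus` comes from the standard library.

```python
from http import HTTPStatus


def handle_request(headers: dict) -> int:
    """Hypothetical endpoint handler that returns an HTTP status code."""
    if not headers.get("Authorization"):
        return HTTPStatus.UNAUTHORIZED
    return HTTPStatus.OK


def test_rejects_unauthenticated_requests():
    # Acceptance criterion: unauthenticated requests get a 401 response.
    assert handle_request({}) == 401


def test_accepts_authenticated_requests():
    # Illustrative counterpart; real token validation is out of scope here.
    assert handle_request({"Authorization": "Bearer example-token"}) == 200


test_rejects_unauthenticated_requests()
test_accepts_authenticated_requests()
```

There is no comparable test for "the endpoint is secure", which is what makes the first wording a spec and the second a description.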
How SDD Differs from TDD #
Test-Driven Development and Spec-Driven Development are often confused because both produce explicit artifacts before code exists. The difference is the starting point.
TDD starts with tests. You write a failing test that describes the behavior you want, then write the minimum code to make it pass. TDD is a feedback loop at the unit level. It produces good tests but does not answer the question of whether you are building the right thing.
SDD starts with intent. Before tests exist, before the architecture is decided, the spec answers: who has this problem, what does correct behavior look like, what is explicitly out of scope. The spec then informs what tests to write, which is why good SDD and good TDD are complementary rather than competing.
A practical way to think about it: SDD drives TDD. The acceptance criteria in the spec become the test scenarios. The design spec identifies the integration boundaries that need contract tests. The task plan identifies which unit behaviors need test coverage before the agent implements them.
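A small sketch of that flow, assuming a hypothetical spec that says password-reset links stay valid for 30 minutes: the acceptance criteria become rows in a scenario table, and the table exists before the implementation does. The criterion IDs, the 30-minute value, and the function names are all illustrative.

```python
from datetime import datetime, timedelta

LINK_LIFETIME = timedelta(minutes=30)  # assumed value from a hypothetical spec


def link_is_valid(issued_at: datetime, now: datetime) -> bool:
    """Implementation under test: a link is valid up to and including the limit."""
    return timedelta(0) <= now - issued_at <= LINK_LIFETIME


# Each row: (scenario name, minutes since the link was issued, expected validity).
SCENARIOS = [
    ("AC-1 link works right away", 0, True),
    ("AC-1 link works at the limit", 30, True),
    ("AC-2 link expires after the limit", 31, False),
]


def run_scenarios() -> list[str]:
    """Return the names of scenarios whose outcome differs from the spec."""
    issued = datetime(2025, 1, 1, 12, 0)
    failures = []
    for name, minutes, expected in SCENARIOS:
        actual = link_is_valid(issued, issued + timedelta(minutes=minutes))
        if actual != expected:
            failures.append(name)
    return failures


print(run_scenarios())  # [] when the implementation matches every scenario
```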
How SDD Differs from BDD #
Behavior-Driven Development uses natural-language scenarios, typically in Gherkin format, to describe expected behavior from the user's perspective. These scenarios bridge the gap between business intent and technical implementation.
SDD is broader. It includes behavior descriptions (which can use BDD-style language or plain prose) but also covers architecture decisions, data models, security constraints, task planning, and traceability. BDD can be a useful format for writing acceptance criteria inside an SDD requirements spec. The spec is the container; BDD scenarios are one way to write what goes inside it.
The distinction matters in practice: BDD tooling focuses on making scenarios executable. SDD practice focuses on making intent durable across tools, across sessions, and across team members.
How SDD Differs from Formal Methods #
Formal methods use mathematical notation and automated verification to prove properties of software systems. They are extremely rigorous and extremely expensive for most production development contexts.
SDD does not require formal notation. A markdown file with acceptance criteria and architecture decisions is a specification. It constrains without being mathematically formal. The level of rigor scales with the stakes: a spec for a billing service should be more precise and more carefully reviewed than a spec for a documentation page.
The relationship is a spectrum:
- Informal prose spec (minimum viable SDD)
- Structured markdown with acceptance criteria and non-goals
- Machine-readable spec with schema validation
- Contract tests derived directly from the spec
- Formal spec with automated proof
Most teams operate in the middle of that spectrum. The goal is not mathematical rigor; it is making intent explicit enough that an AI agent can implement against it and a human reviewer can verify the result.
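The "machine-readable spec with schema validation" step on that spectrum can be sketched with nothing more than the standard library. The field names and the rule that every section must be non-empty are assumptions for illustration; a real project might use a dedicated schema language instead.

```python
import json

# Hypothetical minimal schema: required field name -> expected JSON type.
REQUIRED_FIELDS = {
    "title": str,
    "goals": list,
    "non_goals": list,
    "acceptance_criteria": list,
}


def validate_spec(spec: dict) -> list[str]:
    """Return human-readable problems; an empty list means the spec passes."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in spec:
            errors.append(f"missing field: {name}")
        elif not isinstance(spec[name], expected_type):
            errors.append(f"{name} must be of type {expected_type.__name__}")
        elif not spec[name]:
            errors.append(f"{name} must not be empty")
    return errors


raw = '{"title": "Password reset", "goals": ["Self-service reset"], "non_goals": []}'
print(validate_spec(json.loads(raw)))
# ['non_goals must not be empty', 'missing field: acceptance_criteria']
```

A check like this could run in CI so that a spec missing its non-goals or acceptance criteria never reaches an agent.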
Benefits of Spec-Driven Development #
Less intent drift. The spec is the reference. When the agent drifts β and it will β the reviewer has something to compare the implementation against. Without a spec, drift is invisible until something breaks.
Better AI outputs. Agents given explicit constraints, non-goals, and acceptance criteria produce implementations that are closer to what was intended and easier to correct when they miss. Context quality directly determines output quality.
Easier review. A pull request attached to a spec is easier to review than a pull request that requires the reviewer to reconstruct the intent from the code. The spec is the review checklist.
Team alignment. When multiple people or agents are working on the same feature, the spec is the shared contract. Without it, each contributor optimizes locally and the pieces may not fit.
Better test planning. Acceptance criteria in the spec map directly to test cases. Test coverage becomes a spec coverage question: is every acceptance criterion covered by at least one test?
Durable handoff. When a feature changes hands (between engineers, between agent sessions, between sprints), the spec is the handoff artifact. It captures what was decided, what was out of scope, and what remains to be validated.
Costs of Spec-Driven Development #
Upfront effort. Writing a good spec before writing any code takes time. For small features, this overhead is real and sometimes not worth it.
False confidence. A spec that exists but is not validated against the implementation gives a false sense of correctness. Stale specs are sometimes worse than no spec: they mislead reviewers and agents that read them.
Stale specs. Specs drift when the team treats them as planning artifacts rather than living documents. Updating the spec when implementation differs from design is not optional; it is what separates SDD from documentation that accumulates and rots.
Generated bureaucracy. AI agents can generate exhaustive task lists and verbose specifications quickly. A 200-task spec generated in thirty seconds is not a useful spec; it is a bureaucracy generator. Good SDD requires judgment about what to specify and what to leave implicit.
Tool lock-in. Some SDD tools are opinionated about format, file structure, and workflow. A spec written in a proprietary format is harder to carry across tools than a markdown file with clear headers and acceptance criteria.
Conclusion #
Spec-Driven Development is not a new methodology. It is an old discipline becoming practical again because the cost of implicit intent is now visible in AI-generated code.
The discipline is simple: write down what you intend to build, reviewed and versioned, before the agent builds it. Keep that record honest by updating it when reality differs. Use it as the reference for review, testing, and handoff.
The spec is not magic. A spec that is not validated becomes the most expensive kind of documentation: one that misleads confidently. Good SDD is the practice of keeping specs honest: small enough to maintain, precise enough to constrain, and durable enough to outlast any single agent session.
SDD sits at the intersection of documentation practice, testing architecture, and code design, all covered in the App Architecture in Production cluster alongside decision records, API design, and data access patterns.
Useful Links #
- Decision Records for AI-Driven Software Development - ADRs, PDRs, and DDRs that complement SDD specs by capturing why decisions were made
- Spec-Driven Development vs Vibe Coding: Waterfall? - when to add specs and when to keep prompting freely
- What is Vibe Coding: Meaning, Tools, Benefits, and Risks - the vibe coding cluster pillar
- App Architecture in Production - the cluster home for architecture, documentation, testing, and integration patterns
- Unit Testing in Go: Structure and Best Practices - turning SDD acceptance criteria into executable tests
- Unit Testing in Python: Complete Guide - test-writing practices that map to SDD acceptance criteria
- Python Design Patterns for Clean Architecture - code structure practices that SDD helps preserve
- Retrieval vs Representation in Knowledge Management - how explicit specs relate to AI context and retrieval
- GitHub Spec Kit documentation - a portable open-source SDD toolkit
- Martin Fowler on Spec-Driven Development tools - careful analysis of Kiro, Spec Kit, and Tessl