The New Code: Why Specifications Will Replace Programming

A developer built an SDLC harness in which AI agents implement features from spec files, then found that the real bottleneck was underspecified specs. The system treats the spec as source code and the generated code as a compiled artifact. A redesigned pipeline cuts token waste by running shared stages once per spec instead of once per task.

The agents were doing exactly what I told them to. That was the problem.

I'd built a pipeline where AI agents could take a spec file, implement a feature, run the tests, review the result, and commit, all without me writing a line of code. It mostly worked. Dozens of features shipped. But I kept reviewing the output and feeling that something was off. Not broken. Just subtly wrong in a way that was hard to name.

I spent a while blaming the models. Then the prompts. Then the validation steps. Eventually I had to sit with the obvious: the agents were implementing exactly what I'd written. My specs were underspecified. The bottleneck was always me, at the planning stage.

There's something that feels right about vibe coding. You're operating at the level of intent, describing what you want and letting the model handle the mechanics. That part is genuinely useful. But watch what most people do with the output:

    Traditional development:
    Source code → Compiler → Binary      (keep the source; regenerate the binary anytime)

    Vibe coding done wrong:
    Prompt → LLM → Generated code        (delete the prompt; commit the code)

You've shredded the source and carefully version-controlled the binary.

The prompt is the valuable artifact: your structured description of what you wanted, why, and what "correct" meant. The generated code is what compiles from it. When you discard the prompt and commit only the output, you've lost the thing that actually mattered.

The practical consequence shows up six months later. You're staring at code you wrote and spending twenty minutes reverse-engineering your own intent. The spec would have been a thirty-second read.

I built what I call an SDLC (Software Development Lifecycle) harness. Instead of writing code directly, you write a spec describing what needs to be built. AI agents then handle the implementation, testing, review, and documentation. The spec is the source. The code is what gets compiled from it. Simple idea.
The interesting part is figuring out how to run that pipeline efficiently. I made some expensive mistakes along the way.

My original design ran every task through the full pipeline (implement, test, review, document, wrap-up) independently, in an isolated environment, in parallel:

    tasks.md
    │
    ├── Task 1 (isolated) → implement → test → review → doc → wrap   ~200k tokens
    ├── Task 2 (isolated) → implement → test → review → doc → wrap   ~200k tokens
    └── Task 3 (isolated) → implement → test → review → doc → wrap   ~200k tokens
                                                ────────────────────────────
                                                Total: ~200k × N tasks

On paper: thorough. In practice: around 200,000 tokens per task. A five-task spec burned through a million tokens before I'd seen any integrated output.

The waste was structural:

- Setup phases that didn't vary between tasks ran N times.
- Per-task reviews could only see one task's changes, so they missed integration problems anyway.
- Documentation ran in isolation before anything was assembled.

The per-task review also gave me false confidence. It caught issues within a task, but it couldn't tell me whether the integrated result actually worked. I was paying for isolation on tasks that were mostly sequential.

The redesign comes from a simpler question: what actually needs isolation, and what can be shared? The implement step benefits from a fresh context per task, so there is no bleed between unrelated changes. Everything else (setup, testing, review, documentation) can run once over the integrated result.

That produces a clean ladder of tools:

    /patch        trivial fix, no tests needed
        ↓
    /sdlc-task    one unit → implement → fast-test → fix → commit
        ↓
    /sdlc-run     one whole spec, full lifecycle, in-place
        ↓
    /sdlc-flow    one whole spec, worktree isolation, produces a PR
        ↓
    /sdlc-block   block-level orchestrator, branch train of PRs

Pick the rung that matches the scope. A trivial hotfix doesn't need a review stage. A whole feature spec does. Multiple independent feature blocks running in parallel need the orchestrator.
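The arithmetic behind those numbers fits in a few lines. This is a minimal sketch that assumes the ~200k-tokens-per-task figure above for the original design. The function names and the per-stage costs for the lean variant (shared setup, per-task implement, one back-half) are hypothetical and were not measured in the harness.

```javascript
// Rough token-cost model for the two pipeline shapes described above.
// Only the ~200k-per-task figure comes from observed runs of the original
// design; every other number here is an illustrative assumption.

/**
 * Original design: every task runs the full pipeline in isolation,
 * so total cost is (number of tasks) × (full pipeline cost).
 */
function isolatedCost(numTasks, fullPipelinePerTask = 200_000) {
  return numTasks * fullPipelinePerTask;
}

/**
 * Lean design: setup runs once, each task gets a fresh implement agent,
 * and test/review/fix/document/wrap-up run once over the integrated result.
 * Stage costs are hypothetical parameters.
 */
function leanCost(numTasks, { setup, implementPerTask, backHalf }) {
  return setup + numTasks * implementPerTask + backHalf;
}

// A five-task spec under the original design: ~1M tokens.
const original = isolatedCost(5);

// Made-up stage costs, chosen only to show the shape of the comparison.
const stages = { setup: 20_000, implementPerTask: 40_000, backHalf: 100_000 };
const lean = leanCost(5, stages);

console.log(`original: ${original}, lean: ${lean}`);
```

With these invented stage costs the script prints `original: 1000000, lean: 320000`. The exact figures are not the point. What matters is the shape. The original cost grows with N times the whole pipeline, while the lean cost grows with N times only the implement stage.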
The biggest change is in the orchestrator. The original version orchestrated individual tasks. The rebuilt version operates at the block level. Each block is a complete, feature-sized unit of work that runs its own full pipeline in its own branch:

    master-plan.md
    │
    sdlc-block orchestrator
    │
    Phase 1 ─────────────────────── (parallel)
    ├── Block A → sdlc-flow → PR 1
    └── Block B → sdlc-flow → PR 2
    │
    (after Phase 1 merges)
    │
    Phase 2
    └── Block C → sdlc-flow → PR 3

Inside each sdlc-flow, the lean design runs one fresh implement agent per task, then one consolidated back-half over the integrated result:

    shared setup (once)
    │
    Task 1 → fresh implement agent
    Task 2 → fresh implement agent
    Task 3 → fresh implement agent
    │
    test → review → fix → document → wrap-up
    (once, over the integrated result)

You get per-task agent isolation at roughly the cost of running the pipeline once.

Faster and cheaper, yes. Better results, no. What made the results better was fixing the specs. I'd been optimizing execution while planning was still the bottleneck. The agents were reading underspecified tasks, making reasonable assumptions, and producing technically correct results that missed the actual intent. No amount of pipeline tuning changes that. You can't fix a bad spec with a better pipeline.

The most interesting thing I've done with this pipeline is use it to redesign itself. Here's a condensed version of the master plan I wrote before touching a line of code. The goal was to redesign all four SDLC engines, and the spec-driven pipeline was the tool I used to do it:

SDLC Engines Redesign: Master Plan

Goal

Four engines ship in the harness, but they no longer earn their place as built. The most expensive one burns ~200k tokens per task running a full review + test + document cycle that's mostly redundant when tasks are sequential. Usage data: almost nobody runs it anymore; the simpler single-spec runner has become the actual default.
This redesign gives each engine a distinct scope × ceremony tier and gives every run a committed, token-accounted state trail. It also rationalizes the planning commands that feed the engines.

Architecture

| Engine       | Scope            | When to use                          |
|--------------|------------------|--------------------------------------|
| /patch       | Trivial hotfix   | No tests needed                      |
| /sdlc-task   | One small unit   | Implement → test → fix → commit      |
| /sdlc-run    | One whole spec   | Full lifecycle, in-place             |
| /sdlc-flow   | One whole spec   | Worktree isolation + PR              |
| /sdlc-block  | A full roadmap   | Block-level orchestration, PRs       |

---

Phase 0: Foundation

Block A: Unified token telemetry

What: Every engine writes a committed state file with a token-usage block after each phase. This persists what was previously render-only output that vanished when a run ended.

Why: Token costs were invisible between runs. You couldn't tell which stage was expensive without watching the live output scroll by.

Acceptance criteria:
- sdlc-run writes a committed state file with a tokens roll-up after each phase
- sdlc-flow persists per-task token usage into its committed state
- No engine references a gitignored breadcrumb file
- node --check clean on both engines

Block B: Lean single-unit engine

What: Rewrite the task-level engine into a lean single-unit runner: implement → fast validation → fix loop (≤3 attempts) → commit. Delete the heavy stages and coupling flags.

Why: This makes small work cheap enough to be worth a dedicated engine. It also gives the trivial /patch command an intermediate rung instead of jumping straight to the full-spec runner.

Acceptance criteria:
- Engine runs only implement → test → fix → commit
- Coupling flags are gone (grep-clean)
- Committed state file with tokens is written
- node --check clean

---

Phase 1: The Headline Change

Block A: Rewrite the orchestrator as a block-level engine

What: Replace the task-level wave machine with block-level orchestration.
Read a master-plan file, compute dependency waves at block granularity, fan out one sdlc-flow per block in its own worktree branch, and produce a PR per block by default. Delete the legacy task-level execution engine and its orphaned schema file.

Why: The old design's merge-conflict failure mode was structural, because tasks sharing a worktree conflicted on shared files. Blocks are independent by construction. This eliminates the failure mode entirely. It also reuses the proven single-spec runner as the inner engine rather than duplicating its logic.

Files:
- Modified: sdlc-block.js (full rewrite; keep wave computation, config loader, traced-agent wrapper; add plan-file input, block-level fan-out, branch-train, two-level committed state)
- Deleted: execution-plan.schema.json (no remaining consumer after the task-level machine is removed)

Interfaces / shared surface: Reads master-plan-format files (the same format /generate-master-plan and /plan produce). Invokes sdlc-flow as the inner engine via the workflow primitive. The committed block-orchestration-state.json, with its child-flow token roll-up, serves as both the resume signal and the human review artifact.

Out of scope:
- The /review-PR and /merge-train commands for human-gated branch-train merging (Block B).
- The per-block close-out quality gate (Block C).
- The harness config schema rewrite for the new block. keys (Phase 3). This block may read provisional keys and leave the schema update for Phase 3.

Acceptance criteria:
- Orchestrator reads a master-plan-format file and fans out sdlc-flow per independent block
- Default opens a PR per block; --auto-merge merges in dependency order
- Committed block-orchestration-state.json written with child token roll-up
- All legacy task-level code is gone (grep-clean for removed symbols: runTaskWorktree, runTaskInPlace, --from test, --verify-depth)
- node --check clean on the engine file

A few things worth noting about how this spec works.
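The most mechanical of these is the phrase "computes dependency waves at block granularity." The sketch below shows one way to compute such waves, using Kahn-style topological layering. A block lands in the first wave after all of its dependencies, so every block within a wave can run in parallel. This is an assumption about the approach, not the engine's actual code. The computeWaves name and the { id, dependsOn } block shape are hypothetical stand-ins for whatever the real master-plan parser produces.

```javascript
// Group blocks into dependency "waves": every block in wave k depends only on
// blocks in earlier waves, so all blocks within one wave can run in parallel.
// The { id, dependsOn } shape is a hypothetical master-plan representation.

function computeWaves(blocks) {
  const ids = new Set(blocks.map((b) => b.id));
  const remainingDeps = new Map(); // block id -> count of unsatisfied deps
  const dependents = new Map();    // block id -> ids of blocks that depend on it

  for (const block of blocks) {
    for (const dep of block.dependsOn) {
      if (!ids.has(dep)) {
        throw new Error(`Block ${block.id} depends on unknown block ${dep}`);
      }
      if (!dependents.has(dep)) dependents.set(dep, []);
      dependents.get(dep).push(block.id);
    }
    remainingDeps.set(block.id, block.dependsOn.length);
  }

  const waves = [];
  // First wave: blocks with no dependencies at all.
  let current = blocks.filter((b) => b.dependsOn.length === 0).map((b) => b.id);
  let placed = 0;

  while (current.length > 0) {
    waves.push(current);
    placed += current.length;
    const next = [];
    for (const id of current) {
      for (const child of dependents.get(id) ?? []) {
        const left = remainingDeps.get(child) - 1;
        remainingDeps.set(child, left);
        if (left === 0) next.push(child); // every dep is now in an earlier wave
      }
    }
    current = next;
  }

  // Blocks stuck in a cycle never reach zero unsatisfied deps.
  if (placed !== blocks.length) {
    throw new Error("Dependency cycle detected between blocks");
  }
  return waves;
}

// Mirrors the Phase 1 / Phase 2 diagram, assuming Block C depends on A and B.
const waves = computeWaves([
  { id: "A", dependsOn: [] },
  { id: "B", dependsOn: [] },
  { id: "C", dependsOn: ["A", "B"] },
]);
console.log(waves); // [ [ 'A', 'B' ], [ 'C' ] ]
```

Failing loudly on unknown dependencies and cycles fits the spec's spirit. A master plan that can't be layered is a planning bug, and it is better surfaced before any sdlc-flow is launched.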
The goal section captures why this is happening before describing what to build. "Usage data: almost nobody runs it" is load-bearing context. Without it, an agent might preserve backward compatibility with a behavior nobody is actually using.

The Files field is a commitment, not a description. The agent knows exactly which files to touch and which to leave alone. Anything not listed is out of scope by default.

The Interfaces / shared surface field tells the agent what this block exposes to the blocks that come after it. That's the contract other blocks depend on, and changing it mid-implementation breaks downstream work.

The Out of scope field does something most specs skip entirely: it names the things you are not building. This matters because an agent trying to be helpful will often implement adjacent things that seem obviously related. Explicit out-of-scope entries prevent that drift.

And the acceptance criteria are verifiable against the diff. Not "the engine works better," but "these specific symbols are gone, this file is committed, node --check passes." These become the exact checklist the review stage runs.

The practical test: can you hand this spec to a smart engineer with zero prior context and have them build the right thing? If not, the spec isn't done, and an AI agent will fail the same way.

The flow that works for me now:

    Feature idea or requirement
    │
    /plan or /generate-master-plan   (mini-roadmap: what, why, blocks, dependencies)
    │
    /generate-tasks                  (executable spec: tasks + acceptance criteria + decisions)
    │
    pick the right engine rung
    │
    AI agents implement

The planning step is where the hard questions happen. What does done actually look like? What are the real constraints? Where are the edge cases? These are cheaper to answer before implementation starts than after.

When I rush through planning, I pay for it in review cycles. The root cause is almost always an underspecified task, and the fix is always "write a clearer spec," not "use a better model."
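Criteria like "all legacy task-level code is gone (grep-clean for removed symbols)" are the kind a review stage can check mechanically rather than by judgment. Below is a minimal sketch of such a check. It assumes the file contents are already loaded into a map of path to text. The findRemovedSymbols name and the report shape are hypothetical. The symbol list is copied from the Phase 1 acceptance criteria.

```javascript
// Mechanical check for a "grep-clean" acceptance criterion: report every
// occurrence of a removed symbol in the given files. An empty report means
// the criterion passes. Reading files from disk is omitted to keep this
// self-contained; `files` maps a path to that file's full text.

const REMOVED_SYMBOLS = [
  "runTaskWorktree",
  "runTaskInPlace",
  "--from test",
  "--verify-depth",
];

function findRemovedSymbols(files, symbols = REMOVED_SYMBOLS) {
  const hits = [];
  for (const [path, text] of Object.entries(files)) {
    text.split("\n").forEach((line, index) => {
      for (const symbol of symbols) {
        // Plain substring match, like grep with a fixed string.
        if (line.includes(symbol)) {
          hits.push({ path, line: index + 1, symbol });
        }
      }
    });
  }
  return hits;
}

// Sample contents (invented): one leftover legacy call on line 2.
const report = findRemovedSymbols({
  "sdlc-block.js": "const plan = loadPlan(file);\nawait runTaskInPlace(task);\n",
  "README.md": "Run with --auto-merge to merge in dependency order.\n",
});

// Reports runTaskInPlace on line 2 of sdlc-block.js.
console.log(report.length === 0 ? "grep-clean" : report);
```

The substring match is deliberately blunt. It reports what grep would report, false positives included, which is arguably the right bias for a removal criterion.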
When I slow down at planning, implementations usually ship clean on the first pass.

The scarce skill isn't writing code. It isn't prompting. It's writing specifications that fully capture intent, clear enough that a stateless agent can make the right call without asking follow-up questions.

This is hard. It requires thinking through your assumptions before you start. It requires distinguishing between "I have a vague sense of what I want" and "I can articulate what I want precisely enough for someone else to act on."

But it transfers. The same discipline that makes a good agent spec makes a better architecture decision record, a clearer PR description, a more useful design doc. It makes you a better collaborator regardless of whether the person on the other side is human or AI.

The tooling is changing fast. The underlying skill isn't. Start with the spec.

If this was useful, I write about building production AI and agentic systems at learn-agentic-ai.com, including hands-on learning paths available in both English and Brazilian Portuguese. Come build something real.