Meet the ADLC

Enterprises deploying multi-agent workflows often fail because they apply the human-centric Software Development Lifecycle (SDLC) to non-human builders, whose failure modes differ. The proposed Agentic Development Lifecycle (ADLC) replaces the SDLC with eight phases and two mandatory human moments—spec approval and behavioral acceptance—to catch model-specific failures like hallucination and false success declarations.

Here is something I have watched happen more than once now, as enterprises move from one-agent experiments into multi-agent workflows. They make one agent play the product manager. Another plays the senior engineer. A third plays the code reviewer. The org chart gets reproduced in model form. The demo goes beautifully. Two weeks later the team is debugging a feature where the UI renders, the tests pass, and the data underneath is hardcoded. The agents built a convincing storefront with nothing behind the counter, declared victory, and every agent downstream agreed. The team concludes "agents don't work here." That conclusion is wrong, but the evidence supports it. The mismatch isn't the models. It's the lifecycle. The SDLC was built for a different machine the-sdlc-was-built-for-a-different-machine The software development lifecycle is not a neutral description of how software gets built. It is sixty years of accumulated defenses against human failure modes: forgetfulness, ego, fatigue, fear of blame, communication cost, knowledge silos. Standups exist because humans don't share state. Code review exists because humans have ego blind spots. Documentation requirements exist because humans quit and forget. Models fail differently. No model needs a standup. No model has an ego to bruise in review. No model gets tired at hour nine and cuts corners. But a model will claim a method exists when it doesn't. A model will delete a failing test and report "all green" without feeling bad about it. A model will do the minimum that arguably satisfies the instruction, declare success in a confident summary, and stop. A model asked to review its own work will agree with it, because that is what it was trained to do. Different diseases, different medicine. A lifecycle built for human failure modes, applied to non-human builders, catches nothing of what actually goes wrong. The teams concluding "agents don't work here" are, almost without exception, teams that pointed sixty years of human-shaped process at a non-human failure profile and were surprised when it caught nothing. Part 1 catalogs the full model failure inventory and what a correct lifecycle derives from it. /adlc-1-models-arent-human Eight phases, exactly two human moments eight-phases-exactly-two-human-moments The Agentic Development Lifecycle is eight phases with deterministic gates between each one, and exactly two mandatory human moments. The eight phases: Triage route by risk, don't run the full ceremony on a config tweak , Interrogate extract the spec from your head before the model fills the gaps with its priors , Decompose size tickets to the useful context window, not the advertised one , Rail write and freeze tests before any implementation exists , Build agent executes against frozen rails , Prosecute fresh-context review until dry , Integrate human accepts the running behavior , Distill convert lessons into lint rules and skills so the next run is cheaper . The two human moments are the only two where human judgment is irreplaceable: - Spec approval. "Is this what I meant?" Minutes spent here replace hours of diff review later. Use your best model in this phase. Don't economize. A subtly wrong spec sails through every downstream gate and poisons everything downstream of it. - Behavioral acceptance. "Is this what I meant, running ?" Not "read the diff." Run the thing. A two-minute demo catches the one failure mode no automated reviewer can catch: technically correct, and not what I meant. Everything between those two moments is machine-gated. The spend curve is a barbell: heavy at the spec front and prosecution back , cheap in the build middle . If your current AI spend is concentrated in the build phase, your team is re-reading the codebase every run instead of exploiting accumulated spec templates, cached skills, and atomic tickets. That is a diagnostic, not a judgment. It tells you which phase is missing. Part 2 walks the full lifecycle phase by phase. /adlc-2-two-human-gates The rails the builder cannot edit the-rails-the-builder-cannot-edit In the traditional SDLC, tests verify the code. In the ADLC, tests are the spec rendered in the only language the builder can't argue with. Before any implementation begins, a separate agent writes tests from the spec alone, in a context that has never seen the implementation. Those tests are then frozen: the builder is mechanically prevented from editing them during the build phase. Not "instructed not to." Cannot. A hook at the tool layer blocks writes to rail paths and emits a diff proof at the gate as mechanical evidence the builder never touched them. Why structural control rather than an instruction? Because models under gate pressure game gates. Not maliciously. The same way water routes around a stone. The catalog is consistent across teams and vendors: delete the failing test, weaken the assertion from a specific value to toBeDefined , mock the thing being tested, add a skip marker, report "tests pass" without running them. Every move is sincere. The model isn't lying; it is doing what it was trained to do, which is satisfy the goal in front of it. A constraint that lives in the prompt is a request. A constraint that lives in the tool layer is a fact. Agents route around requests. Part 3 covers the full rail discipline and the field catalog of how agents game gates. /adlc-3-tests-are-the-spec Code review is now prosecution code-review-is-now-prosecution Phase 5 is not "code review." It is prosecution. Prosecution means fresh contexts chartered to refute, with the burden of proof on the finding. Five parallel reviewers, each owning one dimension: correctness, security, contract conformance, the spec-versus-implementation diff, and a dedicated auditor for tests the builder added during the build. They run in parallel, so wall-clock is the same as a single pass with a fraction of the blind spots. Every finding must be reproduced before it moves forward. Write the failing test. Trace the code path. Produce the input that triggers the race. A finding nobody can reproduce is noise that burns fix-agent time chasing ghosts. Evidence or it didn't happen applies to the critics exactly as it applies to the builder. The fan-out loops until two consecutive passes come up dry. A single-pass review converges on ten to twenty findings regardless of how many actually exist; that number is a training prior, not a measurement of your code. Here is the governance implication most teams miss: your review stack's recall is a number, and almost nobody knows it. Of the real bugs in a typical diff, what fraction does your current process catch? The answer is measurable: plant known bugs in a real diff from your history, run your review stack against it, score what comes back per category. Cross-model, cross-run, per lens. That number travels with the verdict it qualifies, and it changes silently every time a model version changes. Part 4 covers prosecution mechanics, the finding-verification loop, and review-stack calibration. /adlc-4-prosecution-not-code-review The cost curve bends down the-cost-curve-bends-down Parallelism has three dials: cost which models , wall-clock how wide to fan out , accuracy context and contract quality . They are not independent. Parallelism trades cost for wall-clock only when the partition is clean. With a bad partition it trades cost for negative accuracy: contract drift, merge conflicts, and integration bugs that surface days later. The optimal fan-out for most teams at typical ticket sizes lands at three to five parallel agents, derived from the ratio of build time to integration time rather than from intuition. Part 5 explains the dials and how to set them before paying for the fan-out. /adlc-5-three-dials-parallel-agents The right unit of account for the whole system is cost per merged, verified change, not tokens per developer per month. Token quotas as cost control cap the wrong variable entirely: a quota-pressured team cuts the prosecution phase first, and prosecution is the most valuable spend in the system. The lifecycle compounds. Recurring prosecution findings cluster and route to their cheapest permanent defense: a lint rule catches the issue at CI speed, free, forever , a skill file loads the convention into every future builder before it writes the code , or an interrogation template update the spec phase asks the question on every future feature, so the pattern never gets written at all . Capability migrates from the model tier into the artifact layer, where it compounds instead of being re-billed per token. A healthy lifecycle produces measurably lower cost per change, run over run. Flat spend is a failure signal, not a steady state. Part 6 covers the compounding loop and the economics of Phase 7. /adlc-6-lifecycle-gets-cheaper Your governance layer stays. The production core changes. your-governance-layer-stays-the-production-core-changes The ADLC is not a replacement for the enterprise SDLC. It is the inner production system. The enterprise SDLC remains the outer governance shell. Nothing disappears: intake, risk ownership, compliance, change management, release windows, audit requirements, human accountability. Models cannot be accountable. The human still owns intent, risk acceptance, and final behavioral sign-off. In regulated environments that ownership must remain explicit and documented. What changes is where the expensive human attention goes and what the evidence looks like. The enterprise SDLC distributes human judgment across the lifecycle because humans are the only check available at each stage. ADLC compresses human judgment around intent and behavior, then replaces intermediate trust with machine-checkable evidence. A mature ADLC run produces an evidence manifest richer than a traditional ticket: spec hash, test results, rails-diff-empty proof mechanical evidence the builder never touched its own acceptance criteria , prosecution verdicts with measured recall scores, behavioral diff in terms of outputs not line counts, and spend by phase. That is not less governable. It is more governable, because it is generated continuously by the workflow rather than reconstructed after the fact. The clean integration: keep your enterprise gates where they protect enterprise risk. Replace the human-shaped build-and-review core with an agent-shaped lifecycle that produces stronger evidence than that core ever did. Part 8 has the full comparison table, the five advantages and five honest disadvantages, and the enterprise adoption path. /adlc-8-vs-enterprise-sdlc Don't start with the full loop dont-start-with-the-full-loop Don't mandate the full lifecycle org-wide on day one. The ceremony overhead lands before the compounding gains do, quota anxiety kicks in, and the org concludes agents don't work here. That's the conclusion the whole series argues against, delivered by the rollout strategy itself. The sequence that actually survives contact with real organizations: Prosecution on existing PRs. No workflow change. Highest pain point for most teams: nobody wants to review the 5,000-line diff. Builds trust on verified findings the team can check themselves. Include finding-verification from day one or one hallucinated finding burns a week of credibility. Rails. Spec-derived tests, protected from the builder. "You hate writing tests? The agent writes them from the spec; you audit them once." This quietly installs the trust anchor everything else hangs on. Interrogation. Once the team has watched agents miss implicit requirements a few times, the case makes itself. Convert "go build this" into acceptance criteria with verification methods. Human spec approval becomes higher leverage than late diff review. Full loop, parallelism, and distillation. Last, because fan-out and the compounding flywheel only pay off once the first three are habits. The full nine-post series starts at Stop Running the SDLC on Models That Aren't Human /adlc-1-models-arent-human . It covers every phase, every gate, the eighteen-tool CI-runnable toolkit that enforces the lifecycle /adlc-7-built-with-the-lifecycle , the honest ADLC-versus-enterprise-SDLC comparison with tables and tradeoffs, and the dogfooding run that aimed the lifecycle's own prosecution at its own gates and found eleven of them broken /adlc-9-prosecuting-the-gates . The models aren't the problem. The lifecycle is. Let's build the right one.