[ARTICLE · art-44755] src=voodootikigod.com ↗ pub= topic=ai-agents verified=true sentiment=· neutral

Why Your SDLC is Failing Your AI Strategy: The Case for the ADLC

Enterprises adopting multi-agent workflows are failing because they apply the human-focused Software Development Lifecycle (SDLC) to non-human AI agents, which have different failure modes. The proposed Agentic Development Lifecycle (ADLC) introduces eight phases with only two mandatory human checkpoints—spec approval and behavioral acceptance—to address model-specific issues like hallucination and overconfidence.

10 min read · published Jun 30, 2026

I have noticed a trend lately as enterprises experiment with agents and agentic development and begin adopting them, moving from one-agent experiments into multi-agent workflows. They make one agent act as the product manager. Another plays the senior engineer. A third is created to be the code reviewer. In essence, they are recreating their org chart in model form. The demo goes beautifully.

Two weeks later the team is debugging a feature where the UI renders, the tests pass, and the data underneath is hardcoded. The agents built a convincing storefront with nothing behind the counter, declared victory, and every agent downstream agreed. The team concludes "agents don't work here." That conclusion is wrong, but the evidence supports it.

The mismatch isn't the models. It's the lifecycle.

## The SDLC was built for a different animal, a human one

The software development lifecycle is not a neutral description of how software gets built. It is sixty years of accumulated defenses against human failure modes: forgetfulness, ego, fatigue, fear of blame, communication cost, knowledge silos. Standups exist because humans don't share state. Code review exists because humans have ego blind spots. Documentation requirements exist because humans quit and/or forget.

While they are made to present like humans, Large Language Models (henceforth "models") are not humans, and they fail differently. No model needs a standup. No model has an ego to bruise in review. No model gets tired at hour nine and cuts corners. However, a model will claim a method exists when it doesn't and then build thousands of lines of code around it. A model will delete a failing test in order to report "all green" without any remorse. A model will do the minimum that arguably satisfies the instruction, declare success in a confident summary, and stop. A model asked to review its own work will agree with it, because that is what it was trained to do.

Different diseases require different medicine. A lifecycle built for human failure modes, applied to non-human builders, catches nothing of what actually goes wrong. The teams concluding "agents don't work" are, almost without exception, teams that pointed sixty years of human-shaped process at a non-human failure profile and were surprised when it caught nothing. To be transparent, this is more a symptom of our human need to anthropomorphize everything than it is a fault of the models.

## Eight phases, exactly two human moments

The Agentic Development Lifecycle (ADLC for short) is eight phases with deterministic gates between each one, and exactly two mandatory human moments.

The eight phases:

- **Triage**: route by risk, don't run the full ceremony on a config tweak
- **Interrogate**: extract the spec from your head before the model fills the gaps with its priors
- **Decompose**: size tickets to the useful context window, not the advertised one
- **Rail**: write and freeze tests before any implementation exists
- **Build**: agent executes against frozen rails
- **Prosecute**: fresh-context at a minimum, ideally cross-model, review until dry
- **Integrate**: human accepts the running behavior
- **Distill**: convert lessons into lint rules and skills so the next run is cheaper

In agentic development, the most valuable and scarcest resource is the human, who should be treated as such instead of being wasted on "should I run `ls`" or "review this 5,000-line diff." The two human moments are the only two where human judgment is irreplaceable:

- **Spec approval.** "Is this what I meant?" Minutes spent here replace hours of diff review later. Use your best model in this phase. Don't economize. A subtly wrong spec sails through every downstream gate and poisons everything downstream.
- **Behavioral acceptance.** "Is this what I meant, running?" Not "read the diff." Run the thing. A two-minute demo catches the one failure mode no automated reviewer can catch: technically correct, and not what I meant.

Everything between those two moments is machine-gated. The spend curve is a barbell: heavy at the spec (front) and prosecution (back), cheap in the build (middle). If your current AI spend is concentrated in the build phase, your team is re-reading the codebase every run instead of exploiting accumulated spec templates, cached skills, and atomic tickets. That is a diagnostic, not a judgment. It tells you which phase is missing.

For a full walkthrough of the phases that compose the ADLC, [read Two Human Gates and Everything Between is Machine-Checked.](/adlc-2-two-human-gates)

## The rails the builder cannot edit

In the traditional SDLC, tests verify the code. In the ADLC, tests are the spec rendered in the only language the builder can't argue with.

Before any implementation begins, a separate agent writes tests from the spec alone, in a context that has never seen the implementation. Those tests are then frozen: the builder is mechanically prevented from editing them during the build phase. Not "instructed not to." Cannot. A hook at the tool layer blocks writes to rail paths and emits a diff proof at the gate as mechanical evidence the builder never touched them.
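The article does not show how such a hook is built, so the following is only a minimal sketch of the idea. It assumes repo-relative POSIX paths, a hypothetical `tests/rails` directory for the frozen tests, and that the agent harness calls `guard_write` before every file write. All function names here are illustrative, not part of any published ADLC toolkit.

```python
import hashlib
import posixpath

# Hypothetical rail location; a real project would load this from config.
RAIL_DIRS = ("tests/rails",)


def is_rail_path(path: str) -> bool:
    """Return True if a repo-relative path resolves into a frozen rail directory."""
    normalized = posixpath.normpath(path)  # collapses "./" and "a/../b" tricks
    return any(
        normalized == rail or normalized.startswith(rail + "/")
        for rail in RAIL_DIRS
    )


def guard_write(path: str) -> None:
    """Assumed to be called by the harness before any builder write."""
    if is_rail_path(path):
        raise PermissionError(f"rail path is frozen during build: {path}")


def rails_fingerprint(files: dict[str, bytes]) -> str:
    """Hash rail file names and contents in a stable order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        if is_rail_path(name):
            digest.update(name.encode())
            digest.update(b"\0")
            digest.update(files[name])
            digest.update(b"\0")
    return digest.hexdigest()


def rails_untouched(before: dict[str, bytes], after: dict[str, bytes]) -> bool:
    """Diff proof for the gate: rail fingerprints match before and after the build."""
    return rails_fingerprint(before) == rails_fingerprint(after)
```

The guard is the enforcement and the fingerprint comparison is the evidence. Keeping them separate means the gate can still verify the rails even if a write slipped past the hook by some other route.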

Why structural control rather than an instruction? Because models under gate pressure game gates. Not maliciously, but in the same way water routes around a stone. The evidence is consistent across teams and vendors: delete the failing test, weaken the assertion from a specific value to `toBeDefined()`, mock the thing being tested, add a skip marker, report "tests pass" without running them. Every move is sincere. The model isn't lying; it is doing what it was trained to do, which is satisfy the goal in front of it, regardless of the overall impact. Just make it work.
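A few of these moves leave recognizable traces in a diff. The sketch below is a rough heuristic scanner over unified-diff text, not a substitute for the tool-layer block. The pattern list is an illustrative assumption covering only deleted files, `toBeDefined()` assertions, and common skip markers, and it will miss other variants.

```python
import re

# Illustrative patterns only; real suites will need their own list.
ADDED_LINE_SIGNALS = {
    "weakened assertion": re.compile(r"\.toBeDefined\(\)"),
    "skip marker": re.compile(
        r"\b(?:it|test|describe)\.skip\(|@pytest\.mark\.skip"
    ),
}


def find_gaming_signals(diff_text: str) -> list[str]:
    """Return a label for each suspicious line found in a unified diff."""
    findings = []
    for line in diff_text.splitlines():
        if line.startswith("deleted file mode"):
            findings.append("deleted file")
        elif line.startswith("+") and not line.startswith("+++"):
            # Only added lines are inspected; "+++" is the file header.
            for label, pattern in ADDED_LINE_SIGNALS.items():
                if pattern.search(line):
                    findings.append(label)
    return findings
```

A hit is a prompt for a reviewer to look, not a verdict: an added `toBeDefined()` can be legitimate, which is why the structural block on rail paths remains the primary defense.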

A constraint that lives in the prompt is a request. A constraint that lives in the tool layer is a fact. Agents route around requests. Part 3 of the ADLC covers the full rail discipline and the field catalog of how agents game gates.

## Code review is now prosecution

Phase 5 is not "code review." It is prosecution, just like in a court of law.

Prosecution means fresh contexts chartered to refute, with the burden of proof on the finding. Five parallel reviewers, each owning one dimension: correctness, security, contract conformance, the spec-versus-implementation diff, and a dedicated auditor for tests the builder added during the build. They run in parallel, so wall-clock is the same as a single pass with a fraction of the blind spots.

Every finding must be reproduced before it moves forward. Write the failing test. Trace the code path. Produce the input that triggers the race. A finding nobody can reproduce is noise that burns fix-agent time chasing ghosts. Evidence or it didn't happen applies to the critics exactly as it applies to the builder.

The fan-out loops until two consecutive passes come up dry. A single-pass review converges on ten to twenty findings regardless of how many actually exist; that number is a training prior, not a measurement of your code.
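The stopping rule can be sketched as a small loop. This is an illustration under assumptions: `run_pass` is a hypothetical callable standing in for one full reviewer fan-out, a pass that only repeats earlier findings is counted as dry, and the `max_passes` safety cap is an addition of this sketch rather than something the article specifies.

```python
from typing import Callable, Iterable


def prosecute_until_dry(
    run_pass: Callable[[int], Iterable[str]],
    dry_passes_required: int = 2,
    max_passes: int = 20,  # safety cap; an assumption of this sketch
) -> list[str]:
    """Repeat review passes until enough consecutive passes find nothing new."""
    findings: list[str] = []
    seen: set[str] = set()
    consecutive_dry = 0
    for pass_number in range(1, max_passes + 1):
        new_findings = []
        for finding in run_pass(pass_number):
            if finding not in seen:
                seen.add(finding)
                new_findings.append(finding)
        if new_findings:
            findings.extend(new_findings)
            consecutive_dry = 0  # any new finding resets the dry streak
        else:
            consecutive_dry += 1
            if consecutive_dry >= dry_passes_required:
                return findings
    raise RuntimeError("review did not converge within max_passes")
```

Resetting the counter on every new finding is what distinguishes "two consecutive dry passes" from "two dry passes in total".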

Here is the governance implication most teams miss: your review stack's recall is a number, and almost nobody knows it. Of the real bugs in a typical diff, what fraction does your current process catch? The answer is measurable: plant known bugs in a real diff from your history, run your review stack against it, score what comes back per category. Cross-model, cross-run, per lens. That number travels with the verdict it qualifies, and it changes silently every time a model version changes.
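As a rough illustration of the scoring step, recall per category is the number of planted bugs caught divided by the number planted. The function name, the bug identifiers, and the category labels below are hypothetical.

```python
from collections import defaultdict


def recall_by_category(
    planted: dict[str, str], caught: set[str]
) -> dict[str, float]:
    """Score a review run against planted bugs.

    planted maps a planted bug id to its category.
    caught is the set of bug ids the review stack reported.
    Ids in caught that were never planted are ignored here.
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for bug_id, category in planted.items():
        totals[category] += 1
        if bug_id in caught:
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}
```

Running the same planted diff again after a model version change and comparing the two dictionaries is one way to notice the silent drift described above.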

## The cost curve bends down

Parallelism has three dials: cost (which models), wall-clock (how wide to fan out), accuracy (context and contract quality). They are not independent. Parallelism trades cost for wall-clock only when the partition is clean. With a bad partition it trades cost for negative accuracy: contract drift, merge conflicts, and integration bugs that surface days later. The optimal fan-out for most teams at typical ticket sizes lands at three to five parallel agents, derived from the ratio of build time to integration time rather than from intuition.

Read this to further understand the dials and how to set them before paying for the fan-out. The right unit of account for the whole system is cost per merged, verified change, not tokens per developer per month. Token quotas as cost control cap the wrong variable entirely: a quota-pressured team cuts the prosecution phase first, and prosecution is the most valuable spend in the system.

The lifecycle compounds. Recurring prosecution findings cluster and route to their cheapest permanent defense: a lint rule (catches the issue at CI speed, free, forever), a skill file (loads the convention into every future builder before it writes the code), or an interrogation template update (the spec phase asks the question on every future feature, so the pattern never gets written at all). Capability migrates from the model tier into the artifact layer, where it compounds instead of being re-billed per token. A healthy lifecycle produces measurably lower cost per change, run over run. Flat spend is a failure signal, not a steady state.

[The compounding loop and the economics of Phase 7 of the ADLC are explored further here.](/adlc-6-lifecycle-gets-cheaper)

## Your governance layer stays. The production core changes.

The ADLC is not a replacement for the enterprise SDLC. It is the inner production system. The enterprise SDLC remains the outer governance shell.

Nothing disappears: intake, risk ownership, compliance, change management, release windows, audit requirements, human accountability. Models cannot be accountable. The human still owns intent, risk acceptance, and final behavioral sign-off. In regulated environments that ownership must remain explicit and documented.

What changes is where the expensive human attention goes and what the evidence looks like.

The enterprise SDLC distributes human judgment across the lifecycle because humans are the only check available at each stage. ADLC compresses human judgment around intent and behavior, then replaces intermediate trust with machine-checkable evidence. A mature ADLC run produces an evidence manifest richer than a traditional ticket: spec hash, test results, rails-diff-empty proof (mechanical evidence the builder never touched its own acceptance criteria), prosecution verdicts with measured recall scores, behavioral diff in terms of outputs not line counts, and spend by phase. That is not less governable. It is more governable, because it is generated continuously by the workflow rather than reconstructed after the fact.
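To make the shape of that manifest concrete, here is one possible record type. The field names and types are assumptions chosen to mirror the items listed above; the article does not define a schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EvidenceManifest:
    """Hypothetical per-run evidence record; field names are illustrative."""

    spec_hash: str
    tests_passed: int
    tests_failed: int
    rails_diff_empty: bool  # True if the builder never touched the rails
    prosecution_verdicts: dict[str, str] = field(default_factory=dict)
    review_recall: dict[str, float] = field(default_factory=dict)
    behavioral_diff: list[str] = field(default_factory=list)  # output changes
    spend_by_phase: dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize with sorted keys so manifests diff cleanly between runs."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Emitting this as a build artifact on every run is what makes the evidence continuous rather than reconstructed for an audit.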

The clean integration: keep your enterprise gates where they protect enterprise risk. Replace the human-shaped build-and-review core with an agent-shaped lifecycle that produces stronger evidence than that core ever did.

## Don't start with the full loop

Don't mandate the full lifecycle org-wide on day one. The ceremony overhead lands before the compounding gains do, quota anxiety kicks in, and the org concludes agents don't work here. That's the conclusion the whole series argues against, delivered by the rollout strategy itself.

The sequence that actually survives contact with real organizations:

1. **Prosecution on existing PRs.** No workflow change. Highest pain point for most teams: nobody wants to review the 5,000-line diff. Builds trust on verified findings the team can check themselves. Include finding-verification from day one or one hallucinated finding burns a week of credibility.
2. **Rails.** Spec-derived tests, protected from the builder. "You hate writing tests? The agent writes them from the spec; you audit them once." This quietly installs the trust anchor everything else hangs on.
3. **Interrogation.** Once the team has watched agents miss implicit requirements a few times, the case makes itself. Convert "go build this" into acceptance criteria with verification methods. Human spec approval becomes higher leverage than late diff review.
4. **Full loop, parallelism, and distillation.** Last, because fan-out and the compounding flywheel only pay off once the first three are habits.

The full nine-post series starts at Stop Running the SDLC on Models That Aren't Human. An "all-in-one" full read of the ADLC is also available. It covers every phase, every gate, the eighteen-tool CI-runnable toolkit that enforces the lifecycle, the honest ADLC-versus-enterprise-SDLC comparison with tables and tradeoffs, and the dogfooding run that aimed the lifecycle's own prosecution at its own gates and found eleven of them broken.

The models aren't the problem. The lifecycle is. Let's build the right one.
