{"slug": "the-adlc-toolkit", "title": "The ADLC Toolkit", "summary": "The ADLC toolkit, an eighteen-tool suite enforcing a deterministic machine-checked development lifecycle, was built by the lifecycle itself and then dogfooded by planting bugs in its own diffs to calibrate its prosecution stack. Each tool gates a specific model flaw or property, and the entire toolkit is designed to be zero-dependency and CI-runnable.", "body_md": "Six walls of text and no code - that doesn't sound like [@voodootikigod](https://twitter.com/voodootikigod). I introduce to you the **ADLC** toolkit that enforces this lifecycle was built *by* the lifecycle (eighteen tools, constructed in parallel by a deterministic workflow script that pipelined each one through build → prosecute → fix, exactly the loop [[Two Human Gates and Everything Between Is Machine-Checked](/adlc-2-two-human-gates) through [Three Dials: Parallel Agents Without Merge Hell](/adlc-5-three-dials-parallel-agents)] describe).\n\nThe shape of that run, briefly, because it's the whole series in miniature. A frozen contract came first: a small shared core (`@adlc/core`\n\n, which manages LLM calls, git plumbing, CLI conventions, and the findings ledger) was built, tested, and *merged before any fan-out*, then appended read-only to every tool's ticket. This applied the \"pinned means merged\" rule from [Three Dials: Parallel Agents Without Merge Hell](/adlc-5-three-dials-parallel-agents#pinned-means-merged) literally. Each tool then got a fresh builder agent with its ticket and rails; each build was prosecuted by fresh-context reviewers; verified findings looped back as fix tickets; and the orchestrator was a workflow script (control flow as code, judgment as spawned models, and no boss agent deciding what happens next). The tools came out the other end zero-dependency, `npx`\n\n-runnable, with deterministic exit codes (0 = pass, 2 = gate fails) so every one can sit in CI.\n\nAnd then the toolkit was aimed at itself: `review-calibration`\n\nplanted bugs in the toolkit's own diffs to measure whether the prosecution stack that built it would catch them, including a one-line truthiness guard in `gate-manifest`\n\n's own hash-chain verifier that made the provenance tool silently skip verification ([Prosecution, Not Code Review](/adlc-4-prosecution-not-code-review#planted-bugs) shows the diff). Dogfooding doesn't get more circular than calibrating the reviewer against the tool that proves the reviewer ran.\n\n## The toolkit, by phase[#](#the-toolkit-by-phase)\n\nEvery tool earns its place the same way every phase did: it traces to a model flaw defended or a model property exploited ([Stop Running the SDLC on Models That Aren't Human](/adlc-1-models-arent-human)). Same DNA throughout: small, fresh contexts by construction, gate-shaped.\n\n**Specify**\n\n| Tool | What it gates |\n|---|---|\n`spec-lint` | Every acceptance criterion must name its verification method. Criteria without one are wishes: exit 2 lists the wishes |\n`premortem` | Fresh frontier context, one charter: \"this project failed three months from now; write the postmortem.\" Inverted sycophancy as a stress test |\n`parallax` | Measured ambiguity: N independent readings of the request, divergence becomes multiple-choice questions, convergence becomes a score the gate can check |\n`coldstart` | Each ticket to a cheap fresh model: \"list everything missing to execute this.\" Non-empty list = underspecified ticket, exit 2 |\n\n**Rail + Build**\n\n| Tool | What it gates |\n|---|---|\n`rails-guard` | Mechanical rail freeze: blocks builder edits to test/contract/CI paths, emits the rails-diff-empty proof, greps for new skip/suppress markers |\n`hollow-test` | Diff-scoped mutation testing: surviving mutants are proof of hollow coverage. The honest replacement for coverage % |\n`preflight` | Environment determinism before fan-out: dry-run every operation class the fleet will use, front-load the permission prompts |\n`merge-forecast` | Partition safety: pairwise conflict scoring (file scope, import radius, co-change history, namespace collisions), float computation, width recommendation |\n`model-router` | Tier by escape cost and DAG float, priors from the manifest ledger. Ladder when float absorbs retries, direct when on the critical path |\n`flail-detector` | The two-strike rule, mechanized: detects loop signatures, kills the session, appends dead-ends to the ticket, regenerates fresh. Second strike escalates to decomposition |\n`consensus-fix` | N-version programming for hard bugs: fan N fresh agents at the failing test, agreement = confidence, divergence = the spec is ambiguous about something load-bearing |\n\n**Prosecute**\n\n| Tool | What it gates |\n|---|---|\n`review-calibration` | Planted-bug recall of the whole review stack, per category. Turns \"I do adversarial review\" into a number; re-run on every model change |\n\n(The prosecution loop itself runs on `adversarial-review`\n\n, which predates this toolkit: fresh-context cross-model review with deterministic exit codes for CI.)\n\n**Integrate**\n\n| Tool | What it gates |\n|---|---|\n`behavior-diff` | Diff in behavior space, not code space: API responses, rendered routes, CLI outputs, before vs. after. The 5,000-line code diff becomes six human-readable behavioral items |\n`gate-manifest` | The evidence chain: every gate appends a signed entry (test hashes, rails-diff proof, prosecution verdicts with the calibration score that qualifies them, models used, and spend per phase). A merge ships with its provenance |\n\n**Distill**\n\n| Tool | What it gates |\n|---|---|\n`lesson-foundry` | The ratchet: clusters recurring verified findings, routes each to its cheapest permanent defense (lint rule, skill candidate, or interrogation question) |\n`skill-rot` | Cache invalidation for knowledge: extracts each skill's verifiable claims, checks them against the repo, stamps `last-verified` , exits 2 with the stale list |\n`model-ratchet` | The free re-audit: on model release, re-prosecute main's hot paths with the newest models; verified findings feed the foundry |\n`rejection-mining` | Mines gatekeepers' recorded \"no\"s from PR history and rejection docs into prosecution lenses and pre-flight checklists |\n\n(`skill-mining`\n\n, also predating the toolkit, handles the skill-extraction half of Distill.)\n\n## The frontier-free doctrine[#](#the-frontier-free-doctrine)\n\nHere's the constraint that shaped the toolkit and turns out to be a doctrine: the lifecycle must hit its accuracy targets with mid-tier models (Opus, Sonnet, Haiku-class, no frontier-of-frontier access). Not as a degraded mode but as the design center. (It's also the common enterprise reality: approved-model lists, quota ceilings, procurement lag.)\n\nThe premise: the gap between a mid model and a frontier model is almost entirely a gap in *single-pass judgment* (depth of insight per forward pass, coherent horizon, and knowing-what-it-doesn't-know). The doctrine: at every point where the lifecycle appears to need single-pass judgment, buy the same outcome with structure instead. Five substitutions:\n\n**The generator-verifier gap is the engine.** Recognizing a correct artifact is easier than producing one;*checking*one deterministically is easier still. Generate wide and cheap, verify deterministically, select with a mid model. The quality of output decouples from the generator and couples to the verifier; this lifecycle's verifiers are tests, types, contracts, and hash chains: model-free.**You never need a model smarter than the gate it must pass.** That's the doctrine in one line.**Search replaces insight.** What a frontier model produces in one pass, a mid model produces as the best of N diverse attempts: judge panels for design, consensus for hard bugs, or loop-until-dry for review breadth. And`review-calibration`\n\nmakes the exchange rate a*number*: if a 3-pass mid-tier prosecution stack shows 0.85 planted-bug recall and a single frontier pass shows 0.6, the stack**is** the more capable reviewer. Measure the stack, never the model.**Decomposition replaces horizon.** Ticket size is tier-indexed: a cheap model that only ever sees a few thousand tokens of well-railed ticket is not operating below the frontier; it's operating below its own degradation point, which is the only line that matters.**Banking replaces presence.** Rent the big model occasionally to mint structure (contracts, skills, templates, lints) then spend mid-tier inside that structure indefinitely ([The Lifecycle That Gets Cheaper Every Run](/adlc-6-lifecycle-gets-cheaper)is this substitution, run as a flywheel). Capability migrates from the model tier into the artifact layer, where it compounds instead of being re-billed per token.**Measurement replaces metacognition.** The capability mid models most lack is knowing what they don't know, so never ask.`parallax`\n\nswaps \"do you have questions?\" for divergence-of-N-readings;`consensus-fix`\n\nswaps \"are you sure?\" for agreement statistics;`coldstart`\n\nswaps \"is this clear?\" for an enumerated gap list. None need a smarter model. They need more samples and a division operation.\n\nAnd the sixth substitution is the one this series opened with: **humans are the frontier tier.** The two human gates sit exactly where frontier judgment would otherwise go (\"is this what I meant?\" and \"is this what I meant, running?\") because the human *is* the ground truth for intent. The tooling (`behavior-diff`\n\n, the manifest, and parallax's multiple-choice questions) exists to compress what the human must absorb so the minutes stay minutes.\n\nThe honest loss account, because doctrines without one are marketing: you give up single-pass architectural elegance (mitigated by judge panels + premortem + the human at the spec gate, and the residue is real); subtle cross-cutting bug intuition (loop-until-dry raises recall asymptotically, `model-ratchet`\n\nschedules the deep read for whenever a stronger model ships); latency (N passes are slower than one brilliant pass, which is recovered by parallelism); and long-horizon refactors that resist decomposition (the genuinely hard residue: serialize them, best available model, densest rails, in-flight validator, and accept that ~5% of work runs at maximum supervision). Net: a capability shortfall converted into a compute-plus-process bill, with gates keeping the conversion honest. And when the constraint lifts, nothing is wasted: every mechanism here amplifies a frontier model exactly the way it amplifies a mid one.\n\n## Adoption: relief first, lifecycle later[#](#adoption-relief-first-lifecycle-later)\n\nThe field wisdom that outranks everything else here: **teams do not adopt platonic lifecycles; they adopt relief from their worst pain point**, then ask what else hurts. Sequencing for a real team:\n\n**Prosecution of existing PRs**([Prosecution, Not Code Review](/adlc-4-prosecution-not-code-review)standalone). Highest pain: nobody wants to review the 5,000-liner. This requires zero workflow change, and trust gets built on verified findings the team can check themselves. Include finding-verification from day one: a single hallucinated finding wastes an hour of human time and burns a week of credibility.**Rails**([Tests Are the Spec in the Only Language the Builder Can't Argue With](/adlc-3-tests-are-the-spec)). \"You hate writing tests? The agent writes them from the spec; you audit them once.\" This quietly installs the trust anchor everything else hangs on.**Interrogation**([Two Human Gates and Everything Between Is Machine-Checked](/adlc-2-two-human-gates#p1)). Once the team has watched agents miss implicit requirements a few times, the case for spec interrogation makes itself.**Full loop with parallelism**([Three Dials: Parallel Agents Without Merge Hell](/adlc-5-three-dials-parallel-agents)) and** distillation**([The Lifecycle That Gets Cheaper Every Run](/adlc-6-lifecycle-gets-cheaper)). Last, because worktree fan-out and the compounding flywheel only pay off once 1-3 are habits.\n\nThe anti-pattern is mandating the full lifecycle org-wide on day one. The ceremony overhead lands before the compounding gains do, quota anxiety kicks in, and the org concludes \"agents don't work here,\" which, as [Stop Running the SDLC on Models That Aren't Human](/adlc-1-models-arent-human) argued, is the conclusion of teams that pointed human-shaped process at a non-human failure profile. Don't hand them a second wrong-shaped process at higher ceremony.\n\n## The through-line[#](#the-through-line)\n\nSeven posts, one move, made over and over: **replace trust with structure, and structure with measurement.**\n\nDon't trust the builder's claim: gate it with a test it cannot edit. Don't trust the reviewer's thoroughness: plant bugs and count. Don't trust the model's questions: fan out readings and diff them. Don't trust the partition: forecast the conflicts before paying for the fan-out. Don't trust the knowledge layer: verify its claims weekly and stamp the date. Don't trust the org's memory: cluster the findings and compile them into lints. And don't trust the lifecycle itself: give it a unit of account (cost per merged, verified change) and check that the curve bends down.\n\nNone of it requires smarter models. All of it gets *better* with smarter models: every mechanism amplifies whatever you run through it, which is what makes it a lifecycle rather than a workaround. The SDLC took sixty years to accrete its defenses against human nature. I get to build the agentic one deliberately, from a flaw inventory, in public, with gates that prove themselves in CI.\n\nThe doctrine is one document and the tools are one repo: [github.com/voodootikigod/adlc](https://github.com/voodootikigod/adlc). The shared core is [ @adlc/core](https://www.npmjs.com/package/@adlc/core); every gate is zero-dependency and\n\n`npx`\n\n-runnable without a global install. Run `npx coldstart`\n\non your next ticket, or `npx review-calibration`\n\nagainst your current review stack; the first calibration number is reliably humbling, and it's the right place to start. Everything here traces to a flaw or an exploit; if you find a phase that doesn't, cut it, and if you find a flaw without a phase, that's the next tool.", "url": "https://wpnews.pro/news/the-adlc-toolkit", "canonical_source": "https://www.voodootikigod.com/adlc-7-built-with-the-lifecycle", "published_at": "2026-06-12 15:00:03+00:00", "updated_at": "2026-06-15 00:46:33.950384+00:00", "lang": "en", "topics": ["ai-tools", "developer-tools", "ai-agents", "ai-infrastructure", "ai-safety"], "entities": ["ADLC", "voodootikigod", "npx", "review-calibration", "gate-manifest", "spec-lint", "premortem", "parallax"], "alternates": {"html": "https://wpnews.pro/news/the-adlc-toolkit", "markdown": "https://wpnews.pro/news/the-adlc-toolkit.md", "text": "https://wpnews.pro/news/the-adlc-toolkit.txt", "jsonld": "https://wpnews.pro/news/the-adlc-toolkit.jsonld"}}