Claude Fable is the architect: it designs every slice, freezes the acceptance gates, and judges the results. GPT-5.5 Codex is the builder and researcher: it does all the engineering and all the web research, in parallel, unattended, for hours. Two Claude Code skills run this cross-vendor loop on the flat-rate subscriptions you already have, with no API keys and no token bills.
git clone https://github.com/DanMcInerney/architect-loop
cd architect-loop && ./install.sh # Windows: .\install.ps1
npm i -g @openai/codex@latest # the builder (Codex CLI >= 0.133)
`./install.sh --project` installs to the current repo only instead of globally. You need Claude Code on any paid plan and the Codex CLI signed into a ChatGPT plan.
/architect # the build loop
/architect-research <what you're considering> # the research loop
`/architect` runs one work block: judge the last run, spec the next slice, dispatch builders. `/architect-research` is for when you're still deciding what to build; its cited report feeds the build loop's PRD.
One short Fable session per work block (judgment only, it never writes code):
- **Spec + gates first.** Fable specs a one-PR slice, splits it into 1–4 lanes with provably disjoint file sets, and commits the acceptance gates to `docs/gates/` before any builder starts. Gates are read-only; a builder edit to a gate file fails the slice automatically.
- **Parallel isolated builders.** One fresh `codex exec` (xhigh) per lane, each in its own git worktree. Builders must argue with the spec before building (silent compliance = defect), build only their declared files, and report raw results. They physically can't commit (the sandbox protects `.git`).
- **Fable judges and integrates.** It runs the gate commands itself (builder claims are hearsay), reads the diff against the spec's intent (passing tests ≠ mergeable work), then commits and merges passing lanes. Judgment happens in a fresh session: cross-context review measurably beats same-session review.
- **The repo is the only memory.** `docs/HANDOFF.md` (a short table of contents, pruned every session), `docs/gates/`, `docs/lanes/`, git history. Not in the repo = didn't happen.
- **Supervision built in.** Liveness checks on dispatched runs, stall triage (diagnose the child process tree, kill the narrowest thing), explicit timeouts on every long command.
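The "provably disjoint file sets" rule is easy to check mechanically before dispatch. The following is a minimal sketch, not the skill's actual implementation: the function names `overlapping_files` and `check_slice` and the lane names are hypothetical, and it assumes each lane declares exact file paths (no globs or directories).

```python
from itertools import combinations


def overlapping_files(lanes: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return each pair of lanes that declares a common file, with the shared files.

    `lanes` maps a (hypothetical) lane name to the exact paths it may touch.
    An empty result means the file sets are pairwise disjoint.
    """
    conflicts: dict[tuple[str, str], set[str]] = {}
    for (a, files_a), (b, files_b) in combinations(sorted(lanes.items()), 2):
        shared = files_a & files_b
        if shared:
            conflicts[(a, b)] = shared
    return conflicts


def check_slice(lanes: dict[str, set[str]]) -> None:
    """Raise ValueError unless the slice has 1-4 lanes with disjoint file sets."""
    if not 1 <= len(lanes) <= 4:
        raise ValueError(f"a slice needs 1-4 lanes, got {len(lanes)}")
    conflicts = overlapping_files(lanes)
    if conflicts:
        raise ValueError(f"lanes share files: {conflicts}")


if __name__ == "__main__":
    # Illustrative lane plan; the paths are made up for the example.
    plan = {
        "api": {"src/api.py", "tests/test_api.py"},
        "ui": {"src/ui.py"},
    }
    check_slice(plan)  # passes silently: no file appears in two lanes
```

Comparing exact paths keeps the check trivially auditable; a plan that declares directories or patterns would need path expansion before this comparison.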
Scout-first, like the production deep-research systems, with no fixed lane taxonomy:
- **A cheap Codex scout maps the topic** (~10 searches): canonical terminology, the load-bearing systems and papers, the named people, the topic's natural fault lines. Skipped for comparisons and fact-finds.
- **Fable designs 3–6 topic-specific lanes** from the scout's map, drawing per-source-class tactics from a library (academic citation snowballing, dependents-not-stars repo evidence, emerging-vs-hype gating, production pattern mining, expert tracking), checked for overlap and gaps before dispatch.
- **Parallel Codex researchers run under hard budgets:** search caps, ≤5 subjects per lane, saturation stop, strict findings discipline (URL + date + quote + confidence tag; NOT FOUND beats inference; no recommendations). Expert opinion runs as a second wave, roster-seeded by the first.
- **Fable verifies and writes.** ≥2 independent sources per load-bearing claim, adversarial falsification searches, citations only from URLs actually fetched. Then one author writes one decision-oriented report. Gathering parallelizes; synthesis never does.
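The findings discipline and the two-source rule can be pictured as a small data check. This is a sketch under stated assumptions, not the skill's code: the `Finding` record, its field names, and the helper names are hypothetical, and "independent" is approximated here as "served from a different host", which is weaker than a real independence judgment.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Finding:
    """One researcher finding: URL + date + quote + confidence tag (field names assumed)."""
    claim: str
    url: str
    date: str
    quote: str
    confidence: str


def is_complete(f: Finding) -> bool:
    """A finding counts only if every required field is non-empty."""
    return all((f.url, f.date, f.quote, f.confidence))


def source_hosts(findings: list[Finding], claim: str) -> set[str]:
    """Distinct hosts backing a claim; host is a rough proxy for an independent source."""
    return {
        urlparse(f.url).netloc.lower()
        for f in findings
        if f.claim == claim and is_complete(f)
    }


def under_sourced(findings: list[Finding], min_sources: int = 2) -> list[str]:
    """Claims backed by fewer than `min_sources` distinct hosts, in sorted order."""
    claims = sorted({f.claim for f in findings})
    return [c for c in claims if len(source_hosts(findings, c)) < min_sources]
```

Any claim returned by `under_sourced` would need another source, or would be reported as NOT FOUND rather than inferred.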
Each piece is there because evidence put it there (full citations in DESIGN.md):
- Weak planners hurt more than weak executors, so the strongest model does the design, and builders get exhaustive specs.
- Manager + worktree-isolated workers is the measured-best topology for shared-artifact software work; naive shared-file coordination collapses throughput.
- Frozen external gates beat trusting the agent, but agents game visible tests and their passing PRs are frequently unmergeable, so the architect also reads the diff.
- Memory files rot, so the handoff stays a short map, and detail lives in linked gate/lane files.
- Every production deep-research system uses planner-designed decomposition and none uses fixed lanes, so research lanes are designed per topic, after a scout pass.
| File | What it is |
|---|---|
| `skills/architect/SKILL.md` | |
| `skills/architect/dispatch.md` | `codex exec` commands, builder block, worktree fan-out, stall triage |
| `skills/architect/research.md` | |
| `skills/architect/HANDOFF.template.md` | |
| `skills/architect-research/SKILL.md` | |
| `skills/architect-research/lanes.md` | |
| `tests/validate_skills.py` | |

Do I need API keys? No. Claude Code runs on your Claude plan; Codex CLI on your ChatGPT plan.
What does a run cost? Builder/researcher runs draw on your ChatGPT plan's 5-hour and weekly quotas; a multi-hour run is a meaningful fraction of a weekly window. Fable's architect sessions are minutes, not hours.
What if a builder wrecks things? Nothing reaches a branch until the architect's tamper, boundary, and gate checks pass; worktrees are discarded and re-dispatched from the freeze commit.
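As a rough illustration of the tamper and boundary checks (the gate check, which runs the gate commands, is not shown), here is a minimal sketch. The function name `judge_lane` is hypothetical, and it assumes the changed paths have already been collected as repo-relative POSIX paths, for example from `git diff --name-only` against the freeze commit.

```python
from pathlib import PurePosixPath

# Gate files live under docs/gates/ and are read-only for builders.
GATES_DIR = PurePosixPath("docs/gates")


def judge_lane(changed: set[str], declared: set[str]) -> list[str]:
    """Return the list of failures for one lane; an empty list means both checks pass.

    changed:  repo-relative paths the builder modified in its worktree
    declared: the paths the lane was allowed to touch
    """
    # Tamper check: any edit under docs/gates/ fails, even if it was declared.
    tampered = sorted(p for p in changed if GATES_DIR in PurePosixPath(p).parents)
    # Boundary check: any other edit outside the declared file set fails.
    stray = sorted(set(changed) - set(declared) - set(tampered))

    failures = []
    if tampered:
        failures.append("tamper: " + ", ".join(tampered))
    if stray:
        failures.append("boundary: " + ", ".join(stray))
    return failures


if __name__ == "__main__":
    # Made-up paths for illustration.
    result = judge_lane(
        changed={"src/api.py", "src/ui.py", "docs/gates/slice-7.md"},
        declared={"src/api.py"},
    )
    print(result)  # ['tamper: docs/gates/slice-7.md', 'boundary: src/ui.py']
```

A lane with any failure would have its worktree discarded rather than merged.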
Can I watch a run? Yes. Every dispatch prints the builder block, so you can paste it into an interactive `codex` session with `/goal` instead.
Why two skills? Research-grade fan-out costs ~15× chat-level tokens, so it should be a deliberate act, not a side effect of the build loop.
MIT