Two AI coding agents are better than one — if you make them prove it.
peers drives n ≥ 2 AI coding CLIs (Claude Code, Codex, …) as cooperating peers that don't just agree a task is done — they have to clear hard, measurable gates first: tests pass, coverage holds, no regression, no TODO/stub/skipped-test, secrets clean. One peer implements, the other blind-reviews (without seeing the first's notes), and an adversarial skeptic re-audits before any "done" is accepted. Runs unattended, budget-capped, and container-sandboxed.
Why it beats a single agent on a loop:
- **Gated, not vibes-based.** "Looks done" never converges; gates green + skeptic-clean does. No convergence theater.
- **Blind peer review** catches rubber-stamping: an independent second pair of eyes, by construction.
- **An adversarial skeptic** hunts the edge cases your tests miss.
- **Unattended & safe:** idle-timeout supervision, USD/tick budget caps, rootless cap-dropped container, egress allow-listing.
In an instrumented diagnostic, peers built an expression-language interpreter, both greenfield and brownfield, to 0 defects over 50,000 random test programs, catching planted regressions and self-finding edge-case bugs the acceptance suite never probed.
- German version: [README_DE.md].
- HOWTO, full audit + fix on an existing app: docs/HOWTO-audit-and-fix.md (German guide also available)
- `implement` mode (build a feature from PLAN.md): docs/MODES_IMPLEMENT.md (German version also available)
- Security model: docs/SECURITY.md (German version also available)
peers-ctl new mything --modes=audit --spec ./mything-spec.md
$EDITOR ~/c0de/peers-c0de/mything/.peers/goals.yaml # trim project-specific gates
peers-ctl start mything --max-ticks 20 --max-usd 5
Available modes: see `peers-ctl modes list`. Stack multiple with `--modes=audit,thorough`. Current built-in modes:
| Mode | What it does |
|---|---|
| `audit` | bug-hunt + 3-class test coverage + secrets + deps + API stability + regression + diff-size + skip/xfail justification |
| `thorough` | anti-convergence-theater hard gate: N=3 consecutive clean ticks + skeptic-pass + aggressive-honesty soft goals |
| `describe` | iterative doc-writing mode: peers write SPEC.md/ARCHITECTURE.md/DESIGN.md until N consecutive non-substantive doc commits. Use BEFORE `audit` on a repo that lacks docs; not composable with audit modes |
| `implement` | end-to-end feature implementation from a markdown PLAN.md: frozen acceptance contract, blind review between peers, reviewer-only checkoffs, HONESTY_AUDIT + cleanliness gates (no TODO/FIXME/stubs/skipped tests at convergence). Standalone; see docs/MODES_IMPLEMENT.md |
Typical multi-mode runs:
peers-ctl new myapp --modes=audit,thorough
peers-ctl new myapp --modes=audit
peers-ctl new myapp --modes=describe # run 1
peers-ctl new myapp-audit --modes=audit,thorough # run 2
peers-ctl new myfeature --container --modes=implement --plan ./PLAN.md
Automatic hooks (opt-out flags):
- **`recon` pre-tick** (default on): the substrate scans the repo once before tick 1 and writes `.peers/recon.md` (detected languages, key docs, entry-point candidates, top-level tree). Free + fast: no LLM call. Eliminates the "blind tick 1" penalty. Opt out: `peers-ctl start <name> --without-recon`.
- **`codemap` pre-tick** (default on): the substrate builds a structural CODEMAP from the AST and writes `.peers/CODEMAP.yaml` (machine-readable: every public symbol, its `file:line` and signature) plus `.peers/codemap.md` (a compact, byte-capped digest peers read as context). Free + fast: no LLM call. Primes peers with the codebase's public-API shape before tick 1, on top of recon's file-level view. Opt out: `peers-ctl start <name> --no-codemap`.
- **`auto-skeptic` post-convergence** (default on): when `consecutive_clean_ticks >= N` would fire `convergence-reached`, the orchestrator runs ONE extra tick with a critical re-audit prompt. If the skeptic tick stays clean, the run is really terminal. If it surfaces a new blocking bug, the counter resets and the loop continues. Opt out: `peers-ctl start <name> --without-post-convergence-skeptic`.
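The exact CODEMAP schema is not spelled out here beyond "every public symbol, its `file:line` and signature". To illustrate the idea only, here is a minimal, Python-only sketch built on the standard `ast` module. The function name `build_codemap`, the dict keys, and the "leading underscore means private" rule are assumptions for illustration, and it only inspects top-level definitions.

```python
import ast


def build_codemap(source: str, filename: str) -> list[dict]:
    """Collect public top-level functions/classes with file:line and a signature string."""
    tree = ast.parse(source, filename=filename)
    entries = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
            # ast.unparse renders the parameter list, annotations and defaults included.
            signature = f"{node.name}({ast.unparse(node.args)})"
            if node.returns is not None:
                signature += f" -> {ast.unparse(node.returns)}"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
            bases = ", ".join(ast.unparse(base) for base in node.bases)
            signature = f"class {node.name}({bases})" if bases else f"class {node.name}"
        else:
            continue
        if node.name.startswith("_"):
            continue  # leading underscore = private by Python convention (assumed rule)
        entries.append({
            "symbol": node.name,
            "kind": kind,
            "location": f"{filename}:{node.lineno}",
            "signature": signature,
        })
    return entries


SRC = '''
def parse(text: str, strict: bool = False) -> dict:
    return {}

def _helper():
    pass

class Lexer(Base):
    pass
'''

for entry in build_codemap(SRC, "pkg/api.py"):
    print(entry["location"], entry["signature"])
```

Because only the AST is parsed (nothing is imported or executed), the sketch works on code whose dependencies are not installed, which is in the spirit of the "free + fast, no LLM call" pre-tick.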
`peers-ctl new`:
- creates the directory if missing (refuses to scaffold into a non-empty dir unless `--force`); a bare name (no `/`) lands under `$PEERS_PROJECTS_ROOT`, default `~/c0de/peers-c0de/<name>`. A path with `/` is taken verbatim;
- `git init` + initial scaffold commit;
- ensures a top-level `README.md` exists, even when `--force` is used against an existing Git repo;
- copies the `--spec` argument to `SPEC.md` (existing file paths are read; path-looking missing values such as `./typo.md` are rejected);
- runs `peers init` (which writes `.peers/`, tags `peers-baseline`, commits `.gitignore`, and creates `.peers/log/runs.jsonl`);
- with `--modes=audit`, installs six audit check scripts and an audit-ready `goals.yaml`; use `--lang=js`, `--lang=rust`, or `--lang=go` for stack-specific check entrypoints;
- registers the project with `peers-ctl` and creates the controller log under the peers-ctl config directory.
To use a different projects root (e.g. on a dedicated disk), run `export PEERS_PROJECTS_ROOT=/work/peers/` once; bare names then land there. `peers-ctl doctor` prints the active root.
cd /path/to/your-target-project
peers init # writes .peers/ + commits .gitignore
$EDITOR .peers/goals.yaml # delete `placeholder-replace-me`, write real gates
python3 - <<'PY'
import hashlib, pathlib
p = pathlib.Path(".peers")
(p / "goals.sha256").write_text(hashlib.sha256((p / "goals.yaml").read_bytes()).hexdigest() + "\n")
PY
$EDITOR .peers/config.yaml # only if codex needs a custom argv path
peers info # sanity-check: peers, goals, budget, health
peers-ctl add /path/to/your-target-project --name mything
peers-ctl doctor # confirms tooling + per-project config
peers-ctl start mything --max-ticks 20 --max-usd 5
Modes are baked into `.peers/goals.yaml` at scaffold time. To re-run the SAME project with a DIFFERENT mode set (e.g. you ran `audit` first and now want `audit,thorough` on top):

Variant 1 (re-scaffold in place with `--force`):
peers-ctl new mything /path/to/your-project \
--modes=audit,thorough --force
peers-ctl start mything --container --max-ticks 30

Variant 2 (audit a separate worktree):
git -C /path/to/your-project worktree add \
/path/to/your-project-thorough HEAD
peers-ctl new mything-thorough /path/to/your-project-thorough \
--container --modes=audit,thorough
peers-ctl start --container mything-thorough
Variant 2 is the recommended pattern for iterative audits. Each run audits a worktree clone; fixes are brought back with a `--no-ff` merge after review. The worktree pattern keeps your existing audit history (`.peers/state.json`, `.peers/log/runs.jsonl`) intact.
peers-ctl status mything # snapshot
peers-ctl dashboard # all registered projects at once
peers-ctl dashboard --live # continuous redraw with alerts/events
peers-ctl dashboard --project mything # drilldown: recent runs + bugs
peers-ctl tail mything # live tail (Ctrl-C to detach)
tail -f /path/to/your-target-project/.peers/log/runs.jsonl # rich per-tick audit
peers -C /path/to/your-target-project replay 3 # inspect tick 3
peers-ctl stop mything # graceful SIGTERM → 10s → SIGKILL
peers -C /path/to/your-target-project report # writes .peers/REPORT.md
peers-ctl report mything # writes controller REPORT-mything.md
peers-ctl review mything # latest handoff self-review
CI guardrails are available as `.gitea/workflows/test.yml` plus `scripts/pre-push.sh`; install the local hook with `make hooks-install`.
The controller is stateless; the project's own `.peers/state.json` and `runs.jsonl` are the durable record. If the host reboots mid-run, `peers-ctl list` will mark the project `crashed`; you can `start` it again and the loop resumes from the saved iteration.
Project states shown by `peers-ctl list`:

| State | Meaning |
|---|---|
| `fresh` | scaffolded by `peers-ctl new`/`add` but never started |
| `running` | active loop, container/PID alive |
| `stopped` | exited cleanly: wrote `.peers/last-stop-reason.txt` with a `complete`, `max_ticks`, `max_iterations`, or `budget:*` reason. A run that reached `convergence-reached` is `stopped`, not `crashed`. |
| `crashed` | process died without a sentinel: segfault, OOM, halt-pattern, goal-mutation, host reboot mid-run |
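A minimal sketch of how a reconcile step could map process liveness plus the stop sentinel onto the states above. The function names are hypothetical, and the exact spelling of `budget:*` reasons (e.g. `budget:max_usd`) is an assumption; this README only describes the `complete <ts>` shape.

```python
from typing import Optional

# Clean stop reasons from the table above; "budget:*" reasons are matched by prefix.
CLEAN_REASONS = {"complete", "max_ticks", "max_iterations"}


def is_clean_stop(reason_text: Optional[str]) -> bool:
    """True if last-stop-reason.txt names a clean exit; the first token is the reason."""
    if not reason_text or not reason_text.split():
        return False
    reason = reason_text.split()[0]
    return reason in CLEAN_REASONS or reason.startswith("budget:")


def reconcile_state(ever_started: bool, process_alive: bool,
                    stop_reason: Optional[str]) -> str:
    """Derive the displayed project state from liveness plus the stop sentinel."""
    if not ever_started:
        return "fresh"
    if process_alive:
        return "running"
    if is_clean_stop(stop_reason):
        return "stopped"
    # Dead process without a clean sentinel: segfault, OOM, halt-pattern, reboot, ...
    return "crashed"


print(reconcile_state(True, False, "complete 2025-01-01T12:00:00Z"))  # stopped
print(reconcile_state(True, False, None))                             # crashed
```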
A mode is a reusable bundle of audit goals + check scripts that `peers-ctl new --modes=…` lays down in `.peers/`. Modes are stackable (comma-separated list), except `describe`, which is mutually exclusive with audit/security modes (it writes docs instead of auditing code).
**`audit`.** Hard gates: `self-review-on-handoff`, `tests-pass`, `tests-cover-happy-edge-sad`, **`tests-no-unjustified-skip-or-fail`** (peers must justify every `@pytest.mark.skip`/`xfail`), `lint-clean`, `type-clean`, `bug-hunt-clean`, `tdd-reproduces-bug`, `no-secrets-committed`, `deps-justified`, `api-stable`, `no-prior-regression`, `diff-size-per-resolve`. Soft goals: `bug-hunt-round-1-deep`, `bug-hunt-round-2-cross-review`, `tests-3-class-review`.

Use it always. Other modes assume `audit`'s hard gates are active and tighten what "clean" means.
**`thorough`.** Adds:
- `convergence-reached` (hard, N=3 default): N consecutive clean ticks without new crit/high/med bug reports; the substrate refuses to declare success without N proofs of stillness.
- `all-peers-healthy` (hard): refuses to declare success while any peer is in `unavailable` state (halt-pattern hit).
- `skeptic-pass` (soft, both peers, interval 1): every tick re-audits with extra suspicion; refuses to pass without documenting 5+ failure modes excluded per file/module.
- `aggressive-honesty` (soft, both peers, interval 3): per src top-level path, 3+ failure modes checked, 2+ security categories, 1 test-coverage gap explicitly named.
**`thorough` alone (without `audit`) is incomplete:** `convergence-reached` depends on `bug-hunt-clean` (from `audit`) to know what "clean" means. Always stack with audit: `--modes=audit,thorough`.
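To make the `convergence-reached` + auto-skeptic interplay concrete, here is a small state-machine sketch under stated assumptions: `ConvergenceTracker` and its return strings are hypothetical names, and each tick is reduced to a single count of new crit/high/med bug reports.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceTracker:
    n_required: int = 3          # N consecutive clean ticks (thorough default)
    skeptic_enabled: bool = True  # post-convergence auto-skeptic (default on)
    consecutive_clean: int = 0
    awaiting_skeptic: bool = False
    converged: bool = False

    def record_tick(self, new_blocking_bugs: int) -> str:
        """Feed one tick's count of new crit/high/med bug reports; return the loop decision."""
        if self.converged:
            return "converged"
        if new_blocking_bugs > 0:
            # Any new blocking bug, including one found by the skeptic tick, resets the streak.
            self.consecutive_clean = 0
            self.awaiting_skeptic = False
            return "continue"
        if self.awaiting_skeptic:
            # The extra skeptic tick stayed clean: the run is really terminal.
            self.converged = True
            return "converged"
        self.consecutive_clean += 1
        if self.consecutive_clean >= self.n_required:
            if self.skeptic_enabled:
                self.awaiting_skeptic = True
                return "run-skeptic-tick"
            self.converged = True
            return "converged"
        return "continue"


tracker = ConvergenceTracker()
for bugs in [0, 0, 0, 0]:
    print(tracker.record_tick(bugs))  # continue, continue, run-skeptic-tick, converged
```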
**`describe`.** Peers WRITE the project's spec docs (SPEC.md + ARCHITECTURE.md + DESIGN.md) iteratively until N=2 consecutive non-substantive doc commits. Hard gates:
- `description-files-present`: all 3 files exist, ≥500 chars each.
- `description-sections-present`: SPEC has `## Threat Model` + `## Invariants` + `## API`; ARCH has `## Components` + `## Data Flow`; DESIGN has `## Decisions` + `## Tradeoffs`; each section body ≥50 chars.
- `description-converged`: the last N commits to the 3 files are non-substantive (no new `##` section, <100 lines added, <50% deletion).
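The three `description-converged` criteria can be checked mechanically. This sketch uses `difflib` for line counts and interprets "<50% deletion" as "fewer than half of the previous version's lines deleted", which is an assumption; the function names are hypothetical.

```python
import difflib


def is_non_substantive(old_text: str, new_text: str,
                       max_added: int = 100, max_delete_ratio: float = 0.5) -> bool:
    """Apply the three description-converged criteria to one doc commit (old -> new)."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()

    # 1. No new "## " section heading.
    old_heads = {line.strip() for line in old_lines if line.startswith("## ")}
    new_heads = {line.strip() for line in new_lines if line.startswith("## ")}
    if new_heads - old_heads:
        return False

    # Count added/deleted lines from a line-level diff.
    added = deleted = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1

    # 2. Fewer than 100 lines added.
    if added >= max_added:
        return False
    # 3. Less than 50% of the previous version deleted (assumed interpretation).
    if old_lines and deleted / len(old_lines) >= max_delete_ratio:
        return False
    return True


def description_converged(versions: list[str], n: int = 2) -> bool:
    """versions: successive contents of one doc file, oldest first."""
    if len(versions) < n + 1:
        return False
    recent = versions[-(n + 1):]
    return all(is_non_substantive(a, b) for a, b in zip(recent, recent[1:]))
```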
Not composable with audit modes: `describe` writes, `audit` attacks. Run `--modes=describe` FIRST on a repo that lacks docs, then cherry-pick the produced files into a follow-up `--modes=audit,…` run.
**`implement`.** End-to-end feature implementation from a markdown PLAN.md. Standalone: not composable with audit/thorough/describe. See docs/MODES_IMPLEMENT.md for the full operator reference: PLAN.md schema, frozen acceptance contracts, reviewer-only checkoffs, escape valves (`[PARTIAL]` / `[BLOCKED]` / `peers-ctl amend` / `peers-ctl ack-block`).
| Project type | Recommended modes |
|---|---|
| First touch on undocumented repo | --modes=describe (alone, run-1) then --modes=audit,thorough (run-2) |
| Existing Python lib / CLI tool | audit,thorough |
| Implement a planned feature | --modes=implement --plan ./PLAN.md |
`peers-ctl modes list` always shows the current built-in set.
Two CLIs:
- `peers`: the inner driver. Runs the loop INSIDE one repo.
- `peers-ctl`: the outer controller. Registers + supervises one or more peers projects from outside; spawns `peers run` (host or container) and tracks PID/container liveness.
peers-ctl modes list # available modes
peers-ctl new <name> [path] --modes=… # scaffold + register
peers-ctl add <path> --name <n> # register an EXISTING .peers/
peers-ctl start [<name>] --container # start (--container = podman)
peers-ctl status [<name>] # one or all
peers-ctl stop [<name>] [--grace-s 10] # SIGTERM → wait → SIGKILL
peers-ctl remove <name> # unregister (does NOT delete .peers/)
peers-ctl list # all projects + state
peers-ctl dashboard # rollup across all projects
peers-ctl dashboard --live --refresh-s 1 # live rollup with alerts/events
peers-ctl dashboard --project <name> # recent runs + bug drilldown
peers-ctl tail [<name>] # follow controller log
peers-ctl logs <name> [-n 100] # print last N lines
peers-ctl report [<name>] # write controller REPORT-<n>.md
peers-ctl review <name> # latest handoff's self-review block
peers-ctl doctor # pre-flight: peers + git + peer CLIs + image
peers-ctl prune <name> # delete old per-project log files
peers -C /path/to/target init # write .peers/
peers -C /path/to/target run # start the loop in current shell
peers -C /path/to/target run --max-ticks 5 # cap ticks
peers -C /path/to/target run --max-usd 1 # cap budget (API-key billing only)
peers -C /path/to/target status # iteration / next peer / lock
peers -C /path/to/target info # config + goals snapshot
peers -C /path/to/target verify # one-shot goal evaluation
peers -C /path/to/target report # write .peers/REPORT.md
peers -C /path/to/target replay <iter> # reconstruct any past tick
peers -C /path/to/target tick --after claude # hooks-driver: trigger after a peer
peers -C /path/to/target watch # follow runs.jsonl
peers-ctl start <name> --without-recon
peers-ctl start <name> --no-codemap
peers-ctl start <name> --without-post-convergence-skeptic
peers-ctl start <name> --max-ticks 50 --max-usd 1
`peers run --help` and `peers-ctl start --help-man` show the full flag set with descriptions.
Rootless podman's default networking needs the `tun` kernel module. Bypass with host networking:
PEERS_CTL_PODMAN_NETWORK=host peers-ctl start --container <name>
To make this permanent: `echo 'export PEERS_CTL_PODMAN_NETWORK=host' >> ~/.bashrc`, then `source ~/.bashrc`. Alternatively, load the module with `sudo modprobe tun` (persist via `/etc/modules-load.d/tun.conf`).
The orchestrator writes `.peers/last-stop-reason.txt`, and `reconcile` maps clean reasons to `stopped`. If you still see `crashed` after convergence:
- `cat .peers/last-stop-reason.txt` should contain `complete <ts>`.
- Run `make build` to ensure the container image matches the host code.
- `process-fail` after ~4 min usually means the peer CLI returned a 5xx (Anthropic Overloaded, Codex rate limit) and the idle timeout kicked in. The run produced no commit. The next tick retries with the OTHER peer; the problematic peer auto-recovers if the rate limit was transient.
- `idle-timeout` after exactly `health.idle_timeout_s` (default 900 s) means the peer produced no stdout output for longer than the silence threshold. Increase `idle_timeout_s` in `.peers/config.yaml` for heavy DA mode runs (the peer spends more time thinking before each commit).
A halt-class pattern matched (`authentication failed`, `quota exhausted`, `invalid API key`, `usage limit`, per `templates/config.yaml`). Operator action required:
- Re-login or top up the OAuth account.
- Restart: `peers-ctl start <name> --container`.
- The loop resumes from the saved iteration.

This is intentional: the substrate refuses to silently degrade peers on operator-action failures.
`fresh` means the project was registered but NEVER started. After the first successful `peers-ctl start`, the state moves to `running`, then to `stopped`/`crashed` on exit. If you intended to start it: `peers-ctl start <name> --container`.
If codex (or any other peer CLI) isn't on the host but is available in the `peers:dev` image, run the loop inside the container:
make build # one-time main image
make proxy-build # egress sidecar
make auth-proxy-build # Claude OAuth sidecar
peers-ctl doctor # confirms podman + image exist
peers-ctl start mything --container --max-ticks 20 --max-usd 5
This spawns `podman run -d --rm --name ... --userns=keep-id ... peers:dev run …` and tracks the running container by name via `podman ps`. The displayed PID is only the host-side `podman logs -f` streamer. `peers-ctl stop --grace-s N` uses `podman stop -t N`, then reaps the log streamer.
Container mode bind-mounts the target repo, `~/.claude`, `~/.codex`, and an optional read-only `~/.gitconfig`. When `~/.claude.json` exists, it is mounted into the per-project `peers-auth-proxy_<name>` sidecar instead of the workspace container; the workspace talks to `ANTHROPIC_BASE_URL=http://127.0.0.1:8080`.
Before launch, `peers-ctl` compares the host package version with `peers --version` inside the image: minor/patch drift warns, major drift refuses to start until you rebuild (`make build`).
Override the image name with `PEERS_CTL_IMAGE=name:tag` if you've tagged your build differently.
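The drift rule above reduces to a comparison of the major component. A sketch, assuming both sides report plain `X.Y.Z` versions (possibly prefixed, e.g. `peers 1.4.2` or `v1.4.2`); pre-release suffixes are not handled, and `image_drift_policy` is a hypothetical name.

```python
def _parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'X.Y.Z' from e.g. 'peers 1.4.2' or 'v1.4.2'; missing parts count as 0."""
    token = text.strip().split()[-1].lstrip("v")
    parts = (token.split(".") + ["0", "0"])[:3]
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def image_drift_policy(host_version: str, image_version: str) -> str:
    """'ok' if identical, 'warn' on minor/patch drift, 'refuse' on major drift."""
    host, image = _parse_version(host_version), _parse_version(image_version)
    if host[0] != image[0]:
        return "refuse"  # major drift: rebuild with `make build` before starting
    return "ok" if host == image else "warn"


print(image_drift_policy("1.4.2", "peers 1.3.0"))  # warn
```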
pip install -e .[dev]
pytest # the full suite should pass
cd /path/to/your-project
peers init
$EDITOR .peers/goals.yaml # delete the placeholder, write your gates
python3 - <<'PY'
import hashlib, pathlib
p = pathlib.Path(".peers")
(p / "goals.sha256").write_text(hashlib.sha256((p / "goals.yaml").read_bytes()).hexdigest() + "\n")
PY
peers run --max-ticks 20
peers status
tail -f .peers/log/runs.jsonl # rich per-tick audit log
peers replay <iter> # reconstruct any iteration
`peers init` writes `.peers/` into the target, tags the current `HEAD` as `peers-baseline` (rollback anchor), snapshots the goals hash (`goals.sha256`), and adds `.peers/` to the target's `.gitignore`.
If you edit `.peers/goals.yaml` manually before starting a run, refresh `goals.sha256`; the loop intentionally halts on unacknowledged goal changes or if `goals.yaml` disappears mid-run.
peers init --driver=hooks # scaffold Stop-hook snippets
peers init --driver=hooks --install # ALSO merge into your host config (with backup)
peers tmux up # sessions driver: tmux up/down/attach
`--driver=hooks` drops ready-to-paste fragments in `.peers/hooks/` for your `~/.claude/settings.json` and `~/.codex/config.toml`.

`--install` (only valid with `--driver=hooks`) goes one step further: it merges the Stop-hook entry directly into your host configs and writes timestamped backups (`settings.json.bak.peers-<ts>`, `config.toml.bak.peers-<ts>`). Behavior:
- **Idempotent:** re-running prints `noop` and does not duplicate entries. Each entry is tagged with `# peers:<absolute-target-path>` so the installer recognises its own work.
- **Drift-aware:** if the target path changed (e.g. the project moved), the existing entry is rewritten in place and the old file is backed up.
- **Conservative on TOML:** if your `~/.codex/config.toml` already has a non-peers `[hooks]` section with an `on_stop`, the installer refuses to touch it and prints a notice (there is no general TOML merge logic in the stdlib, and peers will not clobber a custom config).
- **Independent failure:** patching claude and patching codex are independent. Whichever side succeeded is reported on stdout; the other is reported on stderr with the path of the snippet you can merge manually.
Smoke-test after install:
peers status # nothing yet (no run)
peers tick # one manual tick — should run cleanly
`peers-ctl` is a host-side controller that supervises many peers loops without a daemon. Each project is a detached background process; the controller stores PIDs (with a `/proc`-based starttime fingerprint to guard against PID recycling) under `~/.config/peers-ctl/`.
peers-ctl doctor # pre-flight: peers/git/peer-CLIs + per-project config sanity
peers-ctl add /path/to/project-a --name a
peers-ctl add /path/to/project-b --name b
peers-ctl list
peers-ctl start a --max-ticks 20 --max-usd 3
peers-ctl status a
peers-ctl tail a # follow log via tail -f
peers-ctl report a # write Markdown controller report
peers-ctl review a # show latest handoff self-review
peers-ctl stop a # graceful: SIGTERM -> 10s grace -> SIGKILL; state.json persisted
peers-ctl prune # delete old log files
`peers-ctl report` writes a clean Markdown summary to `~/.config/peers-ctl/REPORT.md` (or `REPORT-<name>.md` when scoped to one project). The report includes controller log paths, per-project tick counts, blocking bug counts, last activity, and README status, so a handoff can spot missing operator docs before the next run.
`peers-ctl dashboard` is the fast terminal view: state, ticks, open hard/soft goals, blocking bug count, running container name, and last tick timestamp for every registered project. Add `--live` for a periodic redraw that also shows alert state and the newest decoded Claude session event when available. Add `--project <name>` for a single-project drilldown with recent runs and bug reports; combine it with `--live` to redraw that detail view.
Example `peers-ctl doctor` output:
peers-ctl doctor — 3 project(s) registered, config dir ~/.config/peers-ctl
[ok] snake ~/code/snake
2 peer(s), 5 goal(s)
[ok] cpu-emu /tmp/peers-dogfood-r2/cpu-emu
2 peer(s), 8 goal(s)
[FAIL] freshproject ~/code/freshproject
missing ~/code/freshproject/.peers/config.yaml
Warnings:
- `codex` is not on PATH. If any project uses it, either add it to PATH
or set the full path in that project's .peers/config.yaml.
`doctor` surfaces three classes of problem up front: missing tooling, missing or unparseable per-project config, and per-project ambiguity (unknown peer name, no goals, etc.). Use it before kicking off a long autonomous run.
`config.yaml` accepts an ordered `peers:` list. The substrate is neutral about names; pick what you want.
peers:
- name: claude
tool: claude
model: opus # optional; omit to use CLI default
reasoning: high # claude: low|medium|high|xhigh|max
argv: ["claude", "-p", "--dangerously-skip-permissions", "{PROMPT}"]
prompt_mode: argv-substitute
- name: codex
tool: codex
model: gpt-5.1-codex-max
reasoning: xhigh # codex: minimal|low|medium|high|xhigh
provider: openai # openai|openrouter
argv: ["codex", "exec", "{PROMPT}"]
prompt_mode: argv-substitute
- name: claude-2
tool: claude
argv: ["claude", "-p", "--dangerously-skip-permissions", "{PROMPT}"]
prompt_mode: argv-substitute
The legacy `tools: {claude: …, codex: …}` mapping is still loaded for back-compat and auto-promoted to the new shape. `model`, `reasoning`, and `provider` are optional convenience fields; explicit `argv` switches still win. To scaffold them without editing YAML:
peers-ctl new myapp --modes=audit \
--peer-model claude=opus \
--peer-provider codex=openrouter \
--peer-model codex=~openai/gpt-latest \
--peer-reasoning codex=xhigh
For OpenRouter, export `OPENROUTER_API_KEY` before `peers run`, `peers tick`, `peers tmux up`, or `peers-ctl start`; these commands fail early if the key is missing. Container mode passes the key name through and opens only `openrouter.ai` in the egress proxy allow-list for projects that opt in.
`opencode` is a first-class tool alongside `claude` and `codex`. Run it with `--format json` so the substrate gets the same structured channel it uses for the others: token + USD accounting (from `step-finish` events) and echo-immune auth/quota halt detection (from `error` events):
peers:
- name: opencode
tool: opencode
model: ollama/qwen2.5 # opencode's <provider>/<model> (NOT a separate provider:)
reasoning: high # → --variant high
argv: ["opencode", "run", "--format", "json", "--dangerously-skip-permissions", "{PROMPT}"]
prompt_mode: argv-substitute
opencode is also the simplest path to local models. It is a universal gateway: configure the backend once in opencode's own config (`opencode providers`, or an `opencode.json` custom provider), whether ollama, vllm, llama.cpp, LM Studio, or any OpenAI-compatible `/v1` endpoint, then point a peer's `model` at `<provider>/<model>`:
model: ollama/qwen2.5 # local via ollama
model: openai-compatible/<name> # local vllm / llama.cpp server
model: anthropic/claude-... # cloud, routed through opencode
The substrate needs no local-model-specific config; opencode resolves the provider. Notes:
- `provider:` is not used for opencode; encode the provider in `model` (`provider/model`). Setting `provider:` on an opencode peer is rejected.
- Billing for opencode is treated as `warn`, never a hard `max_usd` kill (local = free, opencode-hosted = subscription, BYOK cloud = metered; the tool name alone can't tell which, so the conservative default applies).
- `codex` can also reach local models, but only `ollama`/`lmstudio` via `codex exec --oss --local-provider …`, or a custom provider that speaks the OpenAI Responses API (`wire_api=responses`). codex dropped chat-API support, so chat-only servers (llama.cpp, vanilla ollama OpenAI-compat) go through opencode instead.
Soft goals get one of these `reviewer:` modes:
- `other`: any non-active peer can submit a review on their turn.
- `both`: every peer must submit `consensus_needed` `pass:true` reviews.
- `alternating`: review duty rotates one slot per recorded review.
- `quorum`: together with `quorum: "N/M"`, passes when ≥N of the most recent M reviews were `pass:true`.
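A sketch of the `quorum`, `both`, and `alternating` rules as plain functions over recorded `pass:true`/`false` verdicts. The function names and input shapes are assumptions; the substrate tracks reviews in its own state.

```python
def quorum_passes(verdicts: list[bool], quorum: str) -> bool:
    """verdicts: pass:true/false reviews, oldest first; quorum: "N/M"."""
    n_text, m_text = quorum.split("/")
    n, m = int(n_text), int(m_text)
    if not 0 < n <= m:
        raise ValueError(f"invalid quorum {quorum!r}")
    # Only the most recent M reviews count; True sums as 1.
    return sum(verdicts[-m:]) >= n


def both_passes(verdicts_by_peer: dict[str, list[bool]], peers: list[str],
                consensus_needed: int) -> bool:
    """Every peer must have submitted at least consensus_needed pass:true reviews."""
    return all(sum(verdicts_by_peer.get(p, [])) >= consensus_needed for p in peers)


def alternating_reviewer(peers: list[str], recorded_reviews: int) -> str:
    """Review duty rotates one slot per recorded review."""
    return peers[recorded_reviews % len(peers)]
```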
make build
make init-target TARGET=/path/to/your-target
make run TARGET=/path/to/your-target
make status TARGET=/path/to/your-target
On some hosts the default `pasta` network backend fails with `/dev/net/tun: No such device`; `make build` therefore uses `BUILD_NETWORK=host` by default. Use `make run NETWORK=host TARGET=...` to bypass runtime networking issues too. Plain podman works without the Makefile:
podman build --network=host -f Containerfile -t peers:dev .
podman run --rm -it --userns=keep-id --cap-drop=ALL \
--security-opt=no-new-privileges \
-v $PWD:/work \
-v $HOME/.claude:~/.claude \
-v $HOME/.codex:~/.codex \
peers:dev run
`podman compose` works too (see `compose.yaml`), but its `docker-compose` provider needs the podman daemon socket.
Host-side requirements: `podman`, `git`, `python3`. The container brings its own Node.js and the Claude/Codex CLIs.
The `peers-ctl` flow is the recommended way to run unattended:
- **PID-recycle defence.** Each start records the process's kernel-issued starttime via `/proc/<pid>/stat`; `stop` verifies it matches before signalling, so a recycled PID owned by an unrelated process is never killed.
- **Graceful stop.** `peers-ctl stop` sends SIGTERM, which routes inside the loop into the substrate's KeyboardInterrupt path (state persisted, `run.lock` released); SIGKILL follows only if the grace period runs out.
- **Lock status clarity.** `run.lock` is intentionally left on disk after unlock so all contenders use the same inode; `peers status` probes `flock` and distinguishes an active lock from a stale file.
- **Pre-flight check.** `peers-ctl doctor` flags missing tooling and per-project misconfiguration in one shot: no surprises 20 minutes into a run.
- **Crash detection.** `peers-ctl reconcile` (run automatically by `list`/`status`/`start`) sees that a recorded PID is dead, marks the project `crashed`, and clears the PID so a fresh `start` is unambiguous.
- **No daemon.** Each project's loop is a setsid'd background process. `peers-ctl` is a stateless CLI; the registry on disk is the source of truth, accessed under `fcntl.flock` so concurrent invocations serialise their mutations.
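The PID-recycle defence relies on field 22 (`starttime`) of `/proc/<pid>/stat`, which the kernel fixes when a process starts, so a recycled PID carries a different value. A Linux-only sketch with hypothetical helper names; a tiny race between the check and `os.kill` remains.

```python
import os
import signal
from typing import Optional


def proc_starttime(pid: int) -> Optional[int]:
    """Field 22 of /proc/<pid>/stat (start time in clock ticks since boot), or None if gone."""
    try:
        with open(f"/proc/{pid}/stat", "rb") as fh:
            raw = fh.read().decode("utf-8", "replace")
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Field 2 (comm) is parenthesised and may contain spaces, so split after the LAST ')'.
    fields = raw[raw.rindex(")") + 1:].split()
    return int(fields[19])  # fields[0] is field 3 (state), so field 22 is fields[19]


def signal_if_same_process(pid: int, fingerprint: int, sig: int = signal.SIGTERM) -> bool:
    """Signal only if pid still has the recorded start time; a recycled PID is left alone."""
    if proc_starttime(pid) != fingerprint:
        return False  # process is gone, or the PID now belongs to someone else
    os.kill(pid, sig)
    return True
```

Usage: record `proc_starttime(child_pid)` right after spawning, persist it next to the PID, and pass it back to `signal_if_same_process` at stop time.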
The substrate's health model is output-driven: a peer is "stuck" when its child process has written nothing to stdout/stderr for `idle_timeout_s` seconds. This works well for chatty peers (codex streams progress by default), but claude in `-p` (print) mode is silent until the response is ready. A claude tick that sets up a non-trivial project from scratch can take 5–20+ minutes of silent thought before any output appears.
Rule of thumb:
| Task scale | idle_timeout_s |
|---|---|
| Small fixes / single-file edits | 600 (10 min) |
| Multi-file feature work | 1800 (30 min) |
| From-scratch project scaffolding | 3600 (60 min) |
| Heavy refactors of large codebases | 5400 (90 min) |
If you see `runs.jsonl` entries with `classification: idle-timeout`, your value is too low. Edit `.peers/config.yaml`:
health:
idle_timeout_s: 3600
`absolute_max_runtime_s` is a separate paranoid ceiling; set it larger than `idle_timeout_s` (e.g. 2× to 4×).
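The output-driven health model can be sketched with `selectors` over the child's stdout/stderr pipes: any byte resets the silence clock, and a separate wall-clock ceiling plays the role of `absolute_max_runtime_s`. POSIX-only; the function name and the returned classification strings are illustrative, not the substrate's HealthGuard.

```python
import os
import selectors
import subprocess
import sys
import time


def run_with_idle_timeout(argv: list[str], idle_timeout_s: float,
                          absolute_max_runtime_s: float) -> tuple[str, bytes]:
    """Run argv; kill it if silent for idle_timeout_s or alive past absolute_max_runtime_s."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    sel.register(proc.stderr, selectors.EVENT_READ)
    output = bytearray()
    start = last_output = time.monotonic()
    open_streams = 2
    try:
        while open_streams:
            now = time.monotonic()
            if now - start > absolute_max_runtime_s:
                proc.kill()
                proc.wait()
                return "max-runtime", bytes(output)
            if now - last_output > idle_timeout_s:
                proc.kill()
                proc.wait()
                return "idle-timeout", bytes(output)
            for key, _ in sel.select(timeout=0.1):
                chunk = os.read(key.fd, 65536)
                if not chunk:  # EOF on this stream
                    sel.unregister(key.fileobj)
                    open_streams -= 1
                    continue
                output.extend(chunk)
                last_output = time.monotonic()  # any stdout/stderr byte resets the silence clock
        returncode = proc.wait()
    finally:
        sel.close()
        proc.stdout.close()
        proc.stderr.close()
    return ("success" if returncode == 0 else "process-fail"), bytes(output)


print(run_with_idle_timeout([sys.executable, "-c", "print('hi')"], 5, 30))  # ('success', b'hi\n')
```

The sketch shows why silent `claude -p` ticks trip the timeout: nothing arrives on either pipe, so `last_output` never advances until the final answer is printed.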
`claude -p` in its default text-output mode is silent about token usage, so `budget.max_usd` and `budget.max_tokens` are effectively off: the substrate sees `(tokens, usd) = (0, 0)` after every tick.

Fix: switch claude to JSON output. The substrate auto-detects the envelope and pulls `usage.input_tokens` + `cache_creation` + `cache_read` + `output_tokens`, plus `total_cost_usd`.
Edit `.peers/config.yaml` once:
peers:
- name: claude
tool: claude
argv: ["claude", "-p", "--dangerously-skip-permissions",
"--output-format", "json", "{PROMPT}"]
prompt_mode: argv-substitute
For incremental output (so a long tick is not silent and `idle_timeout_s` sees progress), use `stream-json`:
argv: ["claude", "-p", "--dangerously-skip-permissions",
"--output-format", "stream-json", "--verbose", "{PROMPT}"]
`claude` (Claude Code) and `codex` (ChatGPT-bundled) authenticate via OAuth against a flat subscription. Their `total_cost_usd` field reports the API-equivalent price; the user pays $0 incrementally. A hard budget cap is meaningless there: it kills a run that is already paid for.
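For reference, pulling `(tokens, usd)` out of claude's JSON output might look like the sketch below. The usage key names (`cache_creation_input_tokens`, `cache_read_input_tokens`) and the `"type": "result"` event in `stream-json` output are assumptions based on the Anthropic usage object, not something this README specifies; the function names are hypothetical.

```python
import json


def usage_from_result(envelope: dict) -> tuple[int, float]:
    """Sum the usage counters and read total_cost_usd from one result object."""
    usage = envelope.get("usage") or {}
    tokens = sum(int(usage.get(key) or 0) for key in (
        "input_tokens",
        "cache_creation_input_tokens",
        "cache_read_input_tokens",
        "output_tokens",
    ))
    return tokens, float(envelope.get("total_cost_usd") or 0.0)


def usage_from_claude_stdout(stdout: str) -> tuple[int, float]:
    """Handle --output-format json (one object) and stream-json (one object per line)."""
    text = stdout.strip()
    try:
        obj = json.loads(text)  # json mode: a single envelope
    except json.JSONDecodeError:
        obj = None
    if isinstance(obj, dict):
        return usage_from_result(obj)
    # stream-json mode: scan from the end for the final "result" event.
    for line in reversed(text.splitlines()):
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and event.get("type") == "result":
            return usage_from_result(event)
    return 0, 0.0  # plain-text output: nothing to account for
```

The last line is exactly the `(0, 0)` situation described for text mode: without a JSON envelope there is nothing to sum.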
`max_usd_mode` controls the policy:

| mode | behavior |
|---|---|
| `auto` (default) | inspect `~/.claude/.credentials.json` + `~/.codex/auth.json` (`auth_mode`). All peers OAuth → `warn`; any peer using an API key → `hard`. |
| `hard` | exit on cap (pre-Phase-3i behavior). Use this if you set `ANTHROPIC_API_KEY` / `OPENAI_API_KEY`. |
| `warn` | log a one-time warning at the threshold; do NOT exit. |
| `off` | ignore `max_usd` entirely. |
`peers info` shows the resolved mode and the reason it picked, e.g.:
budget: iterations≤20, runtime≤10800s, USD≤$25.0
max_usd_mode=warn (auto: all peers OAuth-billed)
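The `auto` policy from the table boils down to "any API-key-billed peer means `hard`". A sketch that takes each peer's billing kind as input (how that is detected from the credential files is out of scope and assumed); `resolve_max_usd_mode` is a hypothetical name. Peers with unknown billing (e.g. opencode) do not force `hard`, mirroring the opencode billing note.

```python
VALID_MODES = {"auto", "hard", "warn", "off"}


def resolve_max_usd_mode(configured: str, peer_billing: dict[str, str]) -> tuple[str, str]:
    """peer_billing maps peer name -> 'oauth' | 'api-key' | 'unknown' (assumed detection result)."""
    if configured not in VALID_MODES:
        raise ValueError(f"unknown max_usd_mode {configured!r}")
    if configured != "auto":
        return configured, "explicit"
    api_key_peers = sorted(name for name, kind in peer_billing.items() if kind == "api-key")
    if api_key_peers:
        return "hard", "auto: API-key billing for " + ", ".join(api_key_peers)
    if all(kind == "oauth" for kind in peer_billing.values()):
        return "warn", "auto: all peers OAuth-billed"
    return "warn", "auto: no API-key-billed peers"


print(resolve_max_usd_mode("auto", {"claude": "oauth", "codex": "oauth"}))
# ('warn', 'auto: all peers OAuth-billed')
```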
Every `peers init` ships five default goals plus the intentional `placeholder-replace-me` hard fail. The default set forces self-review and mutual bug-hunting before claiming convergence:

| Gate | Type | Pass when |
|---|---|---|
| `self-review-on-handoff` | hard | every handoff commit has `## Self-Review` and `Self-Review: pass` |
| `bug-hunt-clean` | hard | zero unresolved bugs at severity `crit`/`high`/`med` |
| `bug-hunt-round-1` | soft (`consensus_needed: 2`) | each peer says "round 1 done" |
| `bug-hunt-round-2` | soft (`consensus_needed: 2`) | each peer says "round 2 done" after round-1 fixes landed |
| `test-coverage-3-class` | soft (`consensus_needed: 2`) | each peer reviewed the other's tests for happy/edge/sad coverage |
A peer files a bug as a standalone commit:
BUG-007: null deref in parser
## Bug-Report
{"id":"BUG-007","severity":"high","fix_by":"codex",
"location":"src/parser.py:42",
"description":"Crashes on empty input; expected: return None."}
Peer: claude
Bug-Report: BUG-007
The `fix_by` peer resolves it with another commit:
Resolve BUG-007
## Bug-Resolution
{"resolves":"BUG-007","status":"fixed","note":"guarded with if not s: return"}
Peer: codex
Bug-Resolves: BUG-007
Inspect anytime:
python3 -m peers.bug_hunt summary # human rollup
python3 -m peers.bug_hunt gate /path/to/repo # exit 0 iff clean
peers verify # re-runs every hard gate, includes bug-hunt-clean
Severity ladder: `crit` (data loss / RCE) > `high` (broken feature) > `med` (degraded UX) > `low` (nit) > `info` (note). Only the top three block completion. A `wontfix` resolution keeps the bug in the counter; use it only with the other peer's agreement.
The full protocol (when to file vs. fix, severity guidance, what NOT to bug-report) ships in the per-tick prompt as `BUG_HUNT_BLOCK`; peers see it on every turn.
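Because bugs live in commit messages, the `bug-hunt-clean` gate amounts to a ledger over those messages. A sketch with hypothetical names that parses the `## Bug-Report` / `## Bug-Resolution` JSON bodies in the format shown earlier; treating every non-`wontfix` resolution status as closing the bug is an assumption.

```python
import json

BLOCKING = {"crit", "high", "med"}  # only the top three severities block completion


def _json_after(message: str, header: str):
    """Decode the first JSON object that follows `header` in a commit message, if any."""
    start = message.find(header)
    if start == -1:
        return None
    brace = message.find("{", start)
    if brace == -1:
        return None
    obj, _end = json.JSONDecoder().raw_decode(message, brace)  # tolerates multi-line JSON
    return obj


def open_blocking_bugs(messages: list[str]) -> list[str]:
    """IDs of crit/high/med bugs with no closing resolution; wontfix keeps a bug open."""
    reports, closed = {}, set()
    for msg in messages:
        report = _json_after(msg, "## Bug-Report")
        if report:
            reports[report["id"]] = report
        resolution = _json_after(msg, "## Bug-Resolution")
        if resolution and resolution.get("status") != "wontfix":
            closed.add(resolution["resolves"])
    return sorted(bug_id for bug_id, r in reports.items()
                  if r["severity"] in BLOCKING and bug_id not in closed)


def gate_exit_code(messages: list[str]) -> int:
    """0 iff clean, mirroring the `gate` subcommand's documented exit behaviour."""
    return 0 if not open_blocking_bugs(messages) else 1
```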
When a peer process exits with `classification: "api-error"`, the `runs.jsonl` entry includes:
"matched_error_pattern": "Authentication failed",
"matched_error_snippet": "Authentication failed: token expired ..."
so you can see which `health.error_patterns` regex fired without grepping the raw container log. Any non-success tick also records `stderr_tail` and `stdout_tail`; soft-review ticks include `soft_reviews_seen`, `soft_reviews_ingested`, and `soft_reviews_rejected`.
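A sketch of how a halt-pattern match could produce the `matched_error_pattern` / `matched_error_snippet` pair, including count and size limits checked before compilation. The limit values, case-insensitive matching, and function names are assumptions.

```python
import re
from typing import Optional

MAX_PATTERNS = 64        # hypothetical limit on health.error_patterns entries
MAX_PATTERN_CHARS = 512  # hypothetical per-pattern size limit


def compile_error_patterns(patterns: list[str]) -> list[re.Pattern]:
    """Validate count and size before compiling anything (case-insensitive, an assumption)."""
    if len(patterns) > MAX_PATTERNS:
        raise ValueError(f"too many error patterns: {len(patterns)} > {MAX_PATTERNS}")
    for p in patterns:
        if len(p) > MAX_PATTERN_CHARS:
            raise ValueError(f"error pattern longer than {MAX_PATTERN_CHARS} chars")
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def match_error(output: str, compiled: list[re.Pattern],
                max_snippet: int = 120) -> Optional[dict]:
    """Return runs.jsonl-style fields for the first pattern that matches, else None."""
    for pattern in compiled:
        m = pattern.search(output)
        if m:
            line_end = output.find("\n", m.start())
            if line_end == -1:
                line_end = len(output)
            # Snippet: from the match to the end of its line, capped at max_snippet chars.
            snippet = output[m.start():min(line_end, m.start() + max_snippet)]
            return {"matched_error_pattern": pattern.pattern,
                    "matched_error_snippet": snippet}
    return None
```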
The substrate's handoff detection reads git commits, not claude's stdout content, so switching claude to JSON output is safe: only your per-tick `runs.jsonl` console snippet becomes JSON instead of plain text. `peers report` summarizes that for you.

codex emits its own `tokens used` line by default; no config change is needed there.
After `peers run` completes (or on any later checkout of the finished project), you can re-run every hard goal against the current files without spinning up any peer process:
peers verify # exits 0 iff every gate passes; writes .peers/VERIFY.md
Use it to:
- confirm `tests-pass`, `ruff-clean`, `smoke-import` (and whatever else is in `goals.yaml`) on a different machine;
- validate that a hand-edit didn't break a gate;
- smoke-test a UI build with `verify.commands`:
verify:
timeout_s: 60
commands:
- name: cli-help
cmd: "PYTHONPATH=src python -m mything --help"
- name: ui-screenshot
cmd: "xvfb-run -a python tools/screenshot.py out.png"
timeout_s: 30
`peers verify` uses `goals.timeout_s` for hard goals unless `verify.timeout_s` overrides it. For `verify.commands`, exit code 0 = pass; non-zero or timeout = fail. The combined hard-goals + `verify.commands` result is rendered as a Markdown table at `.peers/VERIFY.md`.
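A sketch of the `verify.commands` semantics (exit 0 = pass, non-zero or timeout = fail) with a Markdown rendering in the spirit of `.peers/VERIFY.md`. The table columns, the 60 s fallback timeout, and the function names are assumptions; commands run through the shell because entries like `PYTHONPATH=src python -m mything --help` rely on shell syntax, and each runs in its own session so a timeout kills the whole process group (POSIX-only).

```python
import os
import signal
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def run_verify_command(name: str, cmd: str, timeout_s: float) -> CheckResult:
    """Exit code 0 = pass; non-zero or timeout = fail."""
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL, start_new_session=True)
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # kill the shell and anything it spawned
        proc.wait()
        return CheckResult(name, False, f"timeout after {timeout_s}s")
    return CheckResult(name, returncode == 0, f"exit {returncode}")


def run_verify(verify_cfg: dict, fallback_timeout_s: float = 60) -> list[CheckResult]:
    """verify_cfg mirrors the YAML: optional timeout_s plus commands[name, cmd, timeout_s?]."""
    default_timeout = verify_cfg.get("timeout_s", fallback_timeout_s)
    return [run_verify_command(c["name"], c["cmd"], c.get("timeout_s", default_timeout))
            for c in verify_cfg.get("commands", [])]


def render_verify_md(results: list[CheckResult]) -> str:
    rows = ["| Check | Result | Detail |", "|---|---|---|"]
    rows += [f"| {r.name} | {'pass' if r.passed else 'FAIL'} | {r.detail} |" for r in results]
    return "\n".join(rows) + "\n"
```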
- **State durability.** `state.json` is written atomically (tmp + fsync + rename, with a parent-directory fsync), and the v1 → v2 schema migration writes a `state.json.pre-migration` backup once.
- **Self-review on handoff.** The `self-review-on-handoff` hard gate ships with every `peers init`. Every handoff commit must include a `## Self-Review` body section and a `Self-Review: pass` trailer. The default gate runs the trusted package checker, not a mutable project-local copy.
- **Anti-cheating hard-block.** A turn that modifies only test files is reverted (`git revert --no-commit` + commit), success is demoted to fail, the peer keeps the turn, and the warning lands in the next prompt. Two reverts in a row mark the peer `degraded`.
- **Sandboxed `pass_when` DSL.** `regex(...)` and `json('path')` are available; `json()` is restricted to relative paths inside the target repo, refuses symlinks/hardlinks via the safe readers, and has a 2 MiB read cap. `stdout`/`stderr` exposed to the DSL are capped at 1 MiB, string literals and regex patterns are bounded, and `regex()` has a timeout.
- **Goal-mutation lock.** The sha256 of `goals.yaml` is verified before every tick using no-follow reads; in-loop changes halt the loop with a clear reason, and deletion of `goals.yaml` is treated as mutation.
- **Control-plane file hardening.** State, logs, reports, verify output, controller registry files, and controller logs refuse symlinks, non-regular files, and hardlinks. Log appends open the parent directory with no-follow semantics to block late parent-symlink swaps. State, goals, project config, and controller registry reads are size-capped before JSON/YAML parsing; `health.error_patterns` also has count and per-pattern size limits before regex compilation.
- **PID-recycle defence.** `peers-ctl` records each loop's `/proc/<pid>/stat` starttime and refuses to signal a PID whose fingerprint no longer matches.
- **File-channel race safety.** Hybrid-comm `send()` uses temp-file + atomic link publication so consumers never see partial messages, and two concurrent senders never collide on the same NNNN.
- **Audit trail.** `runs.jsonl` records `soft_fail_reason`, tokens & USD per tick, head_before/after, peer_state_after, warnings_emitted, and the `truncated` flag from HealthGuard. `peers init` creates the file up front, and `peers-ctl add/new` creates the controller-side log up front, so there is always a stable place to write or inspect run evidence.
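Two of the durability techniques above in miniature: an atomic JSON write (tmp + fsync + rename + parent-directory fsync) and race-safe message publication via `os.link`, which fails with `FileExistsError` if another sender already claimed the `NNNN` slot. A POSIX-only sketch; the `NNNN.msg` file naming and the function names are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """tmp + fsync + rename, then fsync the parent directory so the rename itself is durable."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    dir_fd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def publish_message(outbox: Path, body: str) -> Path:
    """Write a complete temp file, then hard-link it to the next free NNNN.msg name."""
    fd, tmp_name = tempfile.mkstemp(dir=outbox, prefix=".msg-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(body)
            fh.flush()
            os.fsync(fh.fileno())
        taken = [int(p.stem) for p in outbox.glob("[0-9][0-9][0-9][0-9].msg")]
        n = max(taken, default=0) + 1
        while True:
            target = outbox / f"{n:04d}.msg"
            try:
                os.link(tmp_name, target)  # fails if another sender already published n
                return target
            except FileExistsError:
                n += 1
    finally:
        os.unlink(tmp_name)  # the published name keeps the data alive
```

Consumers only ever see fully written files under the `NNNN.msg` names, because the content is complete before the link appears; the dot-prefixed temp files are simply ignored.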
src/
├── peers/ # the substrate
│ ├── cli.py # peers init / run / status / tick / replay / watch / tmux
│ ├── driver_orchestrator.py # public facade
│ ├── _driver_orchestrator_impl.py # thin runtime coordinator
│ ├── driver_*.py # decomposed lifecycle / observability / health hooks
│ ├── state_store.py # schema v2 + v1 migration
│ ├── turn_manager.py # round-robin over n peers
│ ├── goal_engine.py
│ ├── goals.py # YAML + pass_when DSL
│ ├── peer_spec.py # PeerSpec + load_peer_specs
│ ├── comm_layer.py # GitCommLayer + HybridCommLayer
│ ├── health_guard.py # streaming reader + idle-timeout + truncation
│ ├── prompt_builder.py
│ └── templates/
├── peers_ctl/ # the controller
│ ├── cli.py # add / remove / list / start / stop / status / review / logs / tail / prune
│ ├── store.py # registry on disk, fcntl-locked
│ └── runner.py # detached spawn + PID-recycle defence
└── auth_proxy/ # OAuth sidecar server
tests/
├── unit/ # unit tests
└── integration/ # smoke + adversarial peer fixtures
- docs/HOWTO-audit-and-fix.md: end-to-end recipe to audit + fix an existing application
- docs/MODES_IMPLEMENT.md: `implement` mode operator reference
- docs/SECURITY.md: threat model + per-layer mitigations