This is a submission for the Hermes Agent Challenge
I shipped a multi-tenant Flutter SaaS — a futsal-pitch booking app with a consumer app, an operator admin console, QR check-in, walk-in POS, and a deployed Firebase backend — in one overnight sitting. The git timeline runs from 19:38 to 03:09. I never typed app code.
I drove four (then five) Claude Code agents in parallel, each in its own pane, each owning one directory of a shared monorepo. The conductor was **Hermes Agent** (Nous Research, MIT). The visible substrate was Herdr.
This is the Write category. So this is not a product tour. It is the orchestration teardown: the patterns that let N agents grind one repo at once without colliding, how they signal "done" without anyone polling, and how a fresh build lands on a phone seconds after it compiles. Every claim below traces to a commit, a file on disk, or a Hermes skill. Pointers at the bottom.
The basic architecture diagram is given below:
Four panes, one repo, one conductor. No agent touches another's directory.
Then comes the magic of the marker-file pingback loop:
Concurrent commits, from `docs/timeline.md` (the wiki pane logs every commit, which agent, and why):
## 2026-05-30 02:43–02:44 — Email/Password + PRD-007/008 backend callables
AGENT: backend claude (aa54dc9, 10dc2c6) + frontend claude (fc20bf2).
## 2026-05-30 00:21–00:28 — FirebaseApi wiring + multi-tenant data model
AGENT: frontend claude (10beb79, 236fe01) + backend claude (dacb833)
— landed concurrently with this docs upgrade.
The watcher catching an APK build and a "done" ping in the same window:
[03:06:10] APK INSTALL: app-arm64-v8a-release.apk -> Success
[03:07:10] frontend-done: auth errors now show the real code, not generic text...
APK rebuilt: /tmp/futsal-frontend-apk; cold-launch->/signin verified.
github.com/morsheded/futsal-booking
github.com/NousResearch/hermes-agent
assets/futsal-watcher-redacted.sh
Orchestration: Hermes Agent (conductor) · Herdr (visible terminal substrate) · Claude Code CLI ×5 (the workers). App: Flutter 3 / Dart 3 · Riverpod · go_router · Firebase (Auth + Firestore + Functions v2 `asia-south1` node22 + Hosting + Storage). Delivery: Tailscale (wireless `adb`) · `mobile_scanner` (web QR). Memory: self-hosted Honcho (`localhost:8000`, Ollama-backed, Colima + Docker).
This is the whole mental model. Hermes does not implement features. Hermes writes a goal-file per pane, dispatches it into a Claude Code instance, arms a watcher, reads the result marker, and reseeds the next goal. The Claude panes do every line of Dart and TypeScript.
A real goal-file (`/tmp/goal-frontend.txt`, trimmed):
GOAL: ship PRD-006 (auth routing fix + email/password) on the CLIENT FLUTTER APP only.
Your turf is `app/` and `packages/gameon_core/`.
READ FIRST:
- docs/prd/PRD-006-auth-routing-fix-and-dual-login.md (your spec)
- app/CLAUDE.md (rule 0: never `git add -A`)
PRECONDITION: backend claude is enabling Email/Password provider in Firebase Auth in
parallel. If sign-in fails with `operation-not-allowed`, the provider isn't on yet —
wait, retry. Do NOT mock; this is real Firebase.
WHEN DONE (truly idle, not between subtasks):
echo "<one-line: what shipped + recommended next>" > /tmp/futsal-frontend-done
The goal-file is self-contained: identity, turf, what to read, the cross-pane precondition, and the exit contract. Hermes is the only thing holding global state. Each pane holds only its lane.
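To make the five parts concrete, here is a minimal sketch of a generator for such a file. The function name `write_goal_file`, its argument order, and the `GOAL_DIR` override are assumptions for illustration; in the run described here Hermes writes each goal-file itself, and the real files carry more context than this template does.

```shell
# Hypothetical helper: render a goal-file with the five parts listed above.
# Usage: write_goal_file <pane> <goal> <turf> <spec-path> <precondition>
write_goal_file() {
  local pane="$1" goal="$2" turf="$3" spec="$4" precondition="$5"
  local out="${GOAL_DIR:-/tmp}/goal-${pane}.txt"   # GOAL_DIR is an assumed override
  cat > "$out" <<EOF
GOAL: ${goal}
Your turf is ${turf}.
READ FIRST:
- ${spec} (your spec)
PRECONDITION: ${precondition}
WHEN DONE (truly idle, not between subtasks):
echo "<one-line: what shipped + recommended next>" > /tmp/futsal-${pane}-done
EOF
  printf '%s\n' "$out"   # print the path so the caller can dispatch it
}
```

The exit contract is derived from the pane name, so the marker path in the goal-file and the path the watcher checks cannot drift apart.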
Unlike `tmux`, Herdr is purpose-built for agents. Every pane is addressable by ID. Every pane reports a status (idle / working / blocked) into the status bar. Hermes pipes commands in from outside, and I watch the whole thing in real time in my own attached terminal. The `herdr-cli` skill is how Hermes drives it:
herdr pane run $PANE "cd /path/to/project && claude --dangerously-skip-permissions"
sleep 5 # let the TUI draw
herdr agent send $PANE "$(cat /tmp/goal-frontend.txt)"
herdr pane send-keys $PANE Enter # agent send does NOT submit — Enter is mandatory
That missing `Enter` is the #1 "I sent it but Claude isn't doing anything" bug. It is in the skill in bold because it bit us twice.
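One way to make the Enter impossible to forget is to wrap the three steps in a single function. This is a sketch, not part of the `herdr-cli` skill: `dispatch_goal` and the `DRAW_DELAY` knob are hypothetical names, and the `herdr` subcommands are used exactly as they appear in the snippet above.

```shell
# Hypothetical wrapper around the three-step dispatch shown above.
# Usage: dispatch_goal <pane-id> <project-dir> <goal-file>
dispatch_goal() {
  local pane="$1" project_dir="$2" goal_file="$3"
  if [ ! -r "$goal_file" ]; then
    echo "goal-file not readable: $goal_file" >&2
    return 1
  fi
  herdr pane run "$pane" "cd $project_dir && claude --dangerously-skip-permissions"
  sleep "${DRAW_DELAY:-5}"                 # let the TUI draw (5s in the original)
  herdr agent send "$pane" "$(cat "$goal_file")"
  herdr pane send-keys "$pane" Enter       # agent send does not submit on its own
}

# Example (not executed here):
#   dispatch_goal "$PANE" /path/to/project /tmp/goal-frontend.txt
```

Checking that the goal-file is readable before launching anything avoids starting a Claude instance that then receives an empty prompt.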
Each pane owns a directory. That is the entire contract:
| Pane | Owns |
|---|---|
| frontend | `app/`, `pubspec.yaml`, `pubspec.lock` |
| backend | `backend/` |
| wiki | `docs/`, `wiki/` (maintenance loop) |
| admin | `apps/admin/`, `packages/gameon_core/` |
| e2e (added when Playwright landed) | `e2e/`, `.github/workflows/e2e.yml` |
The hard rule, repeated verbatim as rule #0 in every pane's `CLAUDE.md` (this is from `app/CLAUDE.md`):
0. NEVER `git add -A` or `git add .` from repo root. This is a multi-agent monorepo
with 3 other Claudes (backend, wiki, admin) writing to `main` in parallel + an
auto-commit hook. Scope every commit to YOUR files only: `git add app/ pubspec.yaml
pubspec.lock`. Before commit run `git status --short` and verify nothing outside your
turf is staged. If you see foreign paths, that's another agent's WIP — leave it alone.
Why it matters: if agent A runs `git add -A` while agent B has uncommitted WIP in `wiki/`, A sweeps B's files into A's commit. Now history says `feat(backend): X` but the diff contains wiki edits, and B comes back to a "clean" tree and gets confused. Stage explicit paths, never `-A`, and the entire class of collision bugs disappears. The `multi-claude-monorepo-discipline` skill is one page and that one rule is the whole cost of parallelism.
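The "verify nothing outside your turf is staged" step can also be checked mechanically. The sketch below is an assumption about how one might do it, not something taken from the skill: `foreign_paths` is a hypothetical function that reads staged paths on stdin and reports any that fall outside the prefixes it is given.

```shell
# Hypothetical turf check: print staged paths that are outside the given
# prefixes and return non-zero if at least one foreign path was found.
foreign_paths() {
  local path turf owned status=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    owned=no
    for turf in "$@"; do
      case "$path" in
        "$turf"*) owned=yes ;;   # literal prefix match, so app/ never matches apps/
      esac
    done
    if [ "$owned" = no ]; then
      printf '%s\n' "$path"
      status=1
    fi
  done
  return "$status"
}

# Example for the frontend pane (not executed here):
#   git diff --cached --name-only | foreign_paths app/ pubspec.yaml pubspec.lock \
#     || echo "foreign paths staged: leave them alone, commit only your turf"
```

Because the match is a literal prefix including the trailing slash, `app/` and `apps/admin/` stay distinct, which matters in a repo that has both.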
This is the signal layer. Reading a pane's TUI is noisy — ANSI redraws, partial output. So agents don't get polled. Each agent writes one line to its own marker file the moment it goes idle:
echo "<what shipped + recommended next>" > /tmp/futsal-frontend-done
Hermes arms a background watcher that does nothing until the file exists:
while [ ! -f /tmp/futsal-frontend-done ]; do sleep 20; done
MSG=$(cat /tmp/futsal-frontend-done)
rm -f /tmp/futsal-frontend-done # delete immediately, so the next task can re-ping
echo "FRONTEND CLAUDE IDLE: $MSG"
When the marker appears the loop exits, `notify_on_complete=true` fires the stdout back into the parent Hermes chat as a system message, and Hermes relays the one-liner to me. The signal is exactly what the agent decided was done, because the agent authored it. No pane scraping and no context churn: the only polling is a cheap file-existence check in a background shell. One unified watcher (`/tmp/futsal-watcher.sh`) loops over all five markers plus the APK marker:
for f in frontend backend admin e2e wiki; do
MF=/tmp/futsal-$f-done
if [ -f $MF ]; then
MSG=$(cat $MF); rm -f $MF
echo "[$(date +%H:%M:%S)] $f-done: $MSG" | tee -a $LOG
fi
done
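The `cat` then `rm` pair leaves a small window: a ping written between the two commands would be deleted unread, and a plain `echo >` can in principle be observed half-written. A possible hardening, sketched here as an assumption rather than as what the real watcher does, is to rename on both sides, since a rename within one filesystem is atomic. `write_marker` and `claim_marker` are hypothetical names.

```shell
# Writer side (agent): build the message in a temp file, then rename into place.
write_marker() {
  local marker="$1" msg="$2" tmp
  tmp="${marker}.tmp.$$"
  printf '%s\n' "$msg" > "$tmp" && mv -f "$tmp" "$marker"
}

# Reader side (watcher): claim the marker by renaming it, then read the claimed copy.
# Prints the message and returns 0, or returns 1 if no marker was waiting.
claim_marker() {
  local marker="$1" claimed
  claimed="${marker}.claimed.$$"
  mv "$marker" "$claimed" 2>/dev/null || return 1
  cat "$claimed"
  rm -f "$claimed"
}

# Example inside the watcher loop (not executed here):
#   if MSG=$(claim_marker "/tmp/futsal-$f-done"); then echo "$f-done: $MSG"; fi
```

The temp and claimed files carry a suffix, so they never match the `/tmp/futsal-$f-done` names the loop checks.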
My Samsung S24 Ultra lives on my tailnet at `100.101.80.104`, with `adb` pinned to port `5555` once over USB (`adb tcpip 5555`). When the frontend pane finishes a build it writes the `.apk` path to a marker, and the same watcher installs it:
if [ -f /tmp/futsal-frontend-apk ]; then
APK=$(cat /tmp/futsal-frontend-apk); rm -f /tmp/futsal-frontend-apk
adb connect 100.101.80.104:5555 >/dev/null 2>&1
adb -s 100.101.80.104:5555 install -r "$APK"
fi
Build finishes → phone has it in seconds. No USB cable, no Telegram (the bot caps uploads at 50MB, and a debug APK is ~144MB anyway; the release split-per-abi build is ~18MB and fits, but wireless `adb` still beats the chat). The log shows the full loop:
✓ Built build/app/outputs/flutter-apk/app-arm64-v8a-release.apk (18.2MB)
already connected to 100.101.80.104:5555
[20:26:57] installing app-arm64-v8a-release.apk to 100.101.80.104:5555...
Success
[20:27:13] ✓ batch #1 installed on phone (17M)
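The install step can be made a little more defensive without changing its shape. The following is a sketch under assumptions: `install_apk_from_marker` and the `DEVICE` override are hypothetical, while the `adb connect` and `adb -s ... install -r` calls are the same ones the watcher snippet uses.

```shell
# Hypothetical guarded version of the APK step in the watcher.
# Usage: install_apk_from_marker [marker-file]
install_apk_from_marker() {
  local marker="${1:-/tmp/futsal-frontend-apk}" device="${DEVICE:-100.101.80.104:5555}" apk
  [ -f "$marker" ] || return 0            # nothing to install this round
  apk=$(cat "$marker"); rm -f "$marker"   # consume the marker so the next build can re-ping
  if [ ! -f "$apk" ]; then
    echo "APK marker pointed at a missing file: $apk" >&2
    return 1
  fi
  adb connect "$device" >/dev/null 2>&1   # harmless if already connected
  adb -s "$device" install -r "$apk"      # -r replaces the existing install
}
```

Checking that the APK path exists before calling `adb` turns a stale or mistyped marker into a clear log line instead of an opaque install failure.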
Nothing gets built without a written intent. Every feature has a numbered PRD in `docs/prd/` (PRD-001 through PRD-009 shipped this run). Hermes writes the PRD before dispatching any pane, and the PRD is the first thing the goal-file tells the agent to read. PRDs reference prior PRDs and ADRs, so the system stays coherent without anyone holding the design in their head. The `docs/` tree is the authoritative spec; `wiki/` is the auto-derived "what currently exists" view the wiki pane regenerates from git. The point, from the `venture-prd-workflow` skill: the project must be rebuildable from docs/ alone, even after a full stack swap.
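"PRD before dispatch" can be enforced with a small gate in front of the dispatcher. This is only a sketch of the idea: `find_prd` and the `DOCS_ROOT` override are hypothetical, and the file-name pattern is inferred from the one PRD path quoted in the goal-file above.

```shell
# Hypothetical gate: succeed (and print the path) only if the numbered PRD
# exists on disk. Assumes files are named <root>/prd/<ID>-<slug>.md.
find_prd() {
  local id="$1" root="${DOCS_ROOT:-docs}" f
  for f in "$root"/prd/"$id"-*.md; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "no PRD on disk for $id; write it before dispatching" >&2
  return 1
}

# Example (not executed here):
#   spec=$(find_prd PRD-006) || exit 1   # refuse to dispatch without a spec
```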
This is the pattern almost nobody is running yet. Long-lived Claude panes degrade as their context window fills. My rule: after a major feature ships, when a pane crosses ~300k tokens, force the `mattpocock/handoff` skill: compact the conversation into `/tmp/futsal-<pane>-handoff.md`, then `/exit`. Next time that pane is needed, relaunch a fresh Claude and seed it with the handoff doc.
The backend pane hit the threshold right after shipping the PRD-006/007/008 callables. Its handoff doc (`/tmp/futsal-backend-handoff.md`, 64 lines) is a clean state transfer: current deployed state, source-of-truth file pointers, flagged decisions, gotchas, and the exact rule that protects it:
## Commit discipline (HARD RULE — see app/CLAUDE.md rule #0)
`main` is shared by 4 Claudes + an auto-commit hook. NEVER `git add -A`/`.`/`**`.
Stage explicit `backend/` paths only... If a collision happens anyway, don't rewrite
shared history — note it in /tmp/futsal-backend-done. (Last time `git add -A` swept the
wiki Claude's WIP into a backend commit — don't repeat.)
A fresh pane seeded with this doc has the facts without the drift of a 300k-token conversation. Compaction-induced hallucination on long autonomous runs basically stops.
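The handoff trigger itself is a two-condition rule, and it is small enough to write down. In this sketch the function names and the `HANDOFF_THRESHOLD` variable are hypothetical, and the token count is assumed to be supplied by the caller; how it is measured is not covered here.

```shell
# ~300k tokens, the threshold quoted above; overridable for experiments.
HANDOFF_THRESHOLD="${HANDOFF_THRESHOLD:-300000}"

# Returns 0 when a pane should be compacted and recycled.
#   $1 = token count for the pane (assumed to be known by the caller)
#   $2 = "yes" if a major feature just shipped
should_handoff() {
  local tokens="$1" shipped="$2"
  [ "$shipped" = yes ] && [ "$tokens" -ge "$HANDOFF_THRESHOLD" ]
}

# Path convention quoted above: /tmp/futsal-<pane>-handoff.md
handoff_path() {
  printf '/tmp/futsal-%s-handoff.md\n' "$1"
}
```

Requiring both conditions keeps a pane from being recycled in the middle of a feature just because it crossed the token line.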
Best part: I have the rule catching a real collision, captured live in an artifact. The admin pane went to commit and found the e2e pane's WIP already staged in the shared index. It didn't sweep it. It reported it in its own done-marker (`/tmp/futsal-watcher.log`, `03:05:17`):
[03:05:17] admin-done: ... commit 832f9cf scoped to apps/admin/ + packages/gameon_core/ ONLY.
DEVIATIONS / NOTES:
- MULTI-AGENT: the e2e claude's staged e2e/ files were already in the shared index
when I went to commit; I git-restored them out so my commit stayed turf-only
(apps/admin + packages/gameon_core). e2e/ left untouched for the e2e claude.
The agent noticed foreign paths, `git restore`'d them out, committed only its turf, and narrated the whole thing so the conductor knew. That is the discipline rule working as a self-healing mechanism: not a guardrail that blocks, but a norm the agent reasons about out loud.
(The repo's history also records the other outcome: `dacb833` in the timeline notes a wide `git add` that swept the wiki pane's WIP, with a "history is not rewritten, noted here" entry. The rule postdates that commit. The collision happened once, got documented, became rule #0, and then prevented its own recurrence at `03:05:17`. That arc is the actual lesson.)
Last layer. The same Mac runs three Hermes profiles: `default` (the operator that ran this build), `nexus` (homelab IT), and `donna` (Obsidian second-brain). Each is a full Hermes instance with its own skills, plugins, cron, and memory, and they share one self-hosted Honcho memory layer (`localhost:8000`, workspace `hermes`, Ollama-backed via Colima + Docker). So the orchestration nests: Hermes drives a fleet of Claude panes for coding, and Hermes itself runs as a fleet of profiles for everything else. Same multi-agent shape, one level up.
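The speed didn't come from a better model. It came from structure around the model:
- One directory per pane, and never `git add -A`.
- Marker files for "done" signals instead of scraping panes.
- A watcher that pushes each fresh APK to the phone over wireless `adb`.
- A PRD written before any pane is dispatched.
- A handoff doc and a fresh pane once context crosses ~300k tokens.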
None of this is model-specific or even Flutter-specific. It is a coordination protocol. Hermes is open source (NousResearch/hermes-agent, MIT) and so is the discipline. Steal the patterns.