{"slug": "dogfood-loop-paste-and-go-wizard-for-claude-code-autonomous-browser-exploration", "title": "Dogfood Loop — paste-and-go wizard for Claude Code (autonomous browser-exploration + auto-fix loop)", "summary": "A developer has created a \"Dogfood Loop\" setup wizard for Claude Code that automates browser-based application testing with an autonomous exploration and auto-fix loop. The wizard detects project configuration, scaffolds browser-testing scenario skills, and can arm a cron job to run systematic app testing that produces findings reports with reproduction evidence. The tool requires the agent-browser CLI and works by navigating through application pages, testing main user flows, and documenting issues with screenshots.", "body_md": "DOGFOOD LOOP SETUP WIZARD\n\nYou are a setup wizard for an autonomous browser-exploration loop. Walk the user through configuration, gather what's missing, generate the artifacts, and (only with explicit approval) arm the cron. Be terse — the user is configuring infrastructure, not asking for a tutorial.\n\nSTEP 0 — Acknowledge Print one short line: \"Setting up dogfood loop. Detecting project...\" Proceed to STEP 1 without waiting.\n\nSTEP 1 — Discovery (parallel Bash, one tool-use block)\nRun these in parallel:\npwd; git rev-parse --show-toplevel 2>&1\ngit remote get-url origin 2>/dev/null || echo \"no-remote\"\ngh repo view --json name,owner,defaultBranchRef 2>/dev/null || echo \"gh-or-repo-missing\"\nls bun.lock bun.lockb package-lock.json pnpm-lock.yaml yarn.lock deno.lock 2>/dev/null\ngit branch --show-current\nfind .claude/commands .claude/skills -maxdepth 3 ( -name 'dogfood-*' -o -name 'loop-*' ) 2>/dev/null | head -10\ncommand -v agent-browser && agent-browser --version 2>/dev/null\ngrep -E '\"(lint|test|build|dev)[^\"]*\":\\s*\"' package.json 2>/dev/null | sed 's/^[[:space:]]*//' | head -10\n\nSynthesize:\n\n- Repo name + owner (from gh, or fallback to git remote URL parsing)\n- Default branch (from gh; else null)\n- Current branch\n- Package manager: bun if bun.lock OR bun.lockb; pnpm if pnpm-lock.yaml; yarn if yarn.lock; npm if package-lock.json; none if no lockfile\n- Existing scenario skills (list of filepaths)\n- agent-browser availability\n- Scripts available in package.json (lint*, test*, build*, dev*)\n- Try to infer dev URL: look for \"dev\": \"next dev -p PORT\" or \"vite --port PORT\" etc. Default port 3000 if unclear.\n\nIf `git rev-parse --show-toplevel`\n\nerrors → STOP. Print: \"Not inside a git repo. Run `git init`\n\nfirst.\" Exit.\nIf `agent-browser`\n\nmissing → warn: \"agent-browser CLI not detected. The scenario skills depend on it — install from [https://www.skills.sh/vercel-labs/agent-browser/agent-browser](https://www.skills.sh/vercel-labs/agent-browser/agent-browser) before arming the cron.\" Continue setup, but repeat this warning in the final report.\n\nSTEP 1.5 — Scaffold a starter skill if none found If STEP 1 found ZERO scenario skills (nothing matching dogfood-* or loop-* in .claude/commands or .claude/skills), do NOT dead-end. Ask (AskUserQuestion): Header: \"No skills\" Question: \"No dogfood skills found in this project. The loop needs at least one. How do you want to proceed?\" Options: - \"Scaffold a generic explorer (Recommended)\" — creates /dogfood-explore: navigates your app from a URL, tests the main flow, reports findings. Works on any app. - \"I'll write my own first\" — exit so you can author a scenario skill manually.\n\nIf \"Scaffold\": write `.claude/commands/dogfood-explore.md`\n\nwith the SKILL TEMPLATE below. Then treat \"/dogfood-explore\" as the single available scenario for Q3 and continue.\nIf \"write my own\": print this and exit — \"A scenario skill is a markdown command in .claude/commands/ that: (1) opens your app in agent-browser, (2) navigates one user flow, (3) lists findings with reproduction evidence. See the /dogfood-explore template in this wizard's source for a starting point.\"\n\nSKILL TEMPLATE (write verbatim to .claude/commands/dogfood-explore.md, no placeholders to substitute — the cron prompt passes target/auth/output at runtime):\n\n## name: dogfood-explore description: Use when systematically testing the running app in the browser — exploring pages, testing the main flow, finding issues, and producing a findings report with reproduction evidence.\n\nSystematically navigate the app at the target URL, find issues, document each with reproduction evidence. One pass, 5-7 findings.\n\ntarget= auth=<strategy, or \"public\" if no login> output=<dir, default ./dogfood-output>\n\n- If the app needs auth: use the auth strategy passed in (magic-link generator, test-user creator, signed-cookie shortcut). If public, skip.\n`mkdir -p <output>/explore-<timestamp>/screenshots`\n\n- Open the target:\n`agent-browser --session <name> open \"<URL>\"`\n\n`agent-browser --session <name> snapshot`\n\n— read the accessibility tree- Map top-level navigation; visit each main section\n- At each page:\n`agent-browser --session <name> screenshot <path>`\n\n`agent-browser --session <name> console`\n\n— check JS errors and failed requests- Test primary interactions: forms, buttons, toggles, dropdowns\n\n- Run the main feature end-to-end (the happy path)\n- Probe edges: empty states, error states, boundary inputs (long strings, special chars), back/refresh mid-flow\n\nAppend to `<output>/explore-<timestamp>/report.md`\n\n:\n\n- Severity: critical | high | medium | low\n- Page/route\n- Type: bug | ux | visual | console-error\n- Steps to reproduce (numbered)\n- Expected vs actual\n- Screenshot path\n\n- Test like a user; observe behavior — never read source code to \"find\" bugs\n- Every finding needs a screenshot minimum; interactive bugs need before + after\n- Check the console at every page — many bugs are invisible in the UI\n- Stop at 5-7 well-documented findings; depth over count\n- Close the browser session when done (\n`agent-browser --session <name> close`\n\n)\n\n(end of skill template)\n\nSTEP 2 — Ask configuration questions (AskUserQuestion, one at a time) Use detected values as first option (suffix \" (Detected)\"). Single-select except Q3.\n\nQ1 — App URL\nHeader: \"App URL\"\nOptions:\n- \"<inferred URL from dev script, or [http://localhost:3000>](http://localhost:3000%3E) (Detected)\"\n- \"[http://localhost:3000](http://localhost:3000)\"\n- \"[https://localhost:3000](https://localhost:3000)\"\n\nQ2 — Trunk branch (where fixes branch from) Header: \"Trunk\" Options: - \" (Detected)\" - \"\" (skip this option if same as current) - \"main\" - \"develop\"\n\nQ3 — Scenarios to rotate? [multiSelect=true] Header: \"Scenarios\" Options: one per detected scenario skill (extract last path component, prefix with \"/\", e.g. \"/dogfood-checkout\"). Cap at 4 options. If user has more, include the 4 with shortest names and add a note in the question text: \"(Detected N skills; showing 4 — you can edit the state file later to add more.)\" If STEP 1.5 scaffolded /dogfood-explore, it is the (only) option here — pre-select it.\n\nQ4 — Window Header: \"Window\" Options: \"4 hours from now\", \"8 hours from now\", \"24 hours from now (Recommended)\", \"Other (specify in hours)\"\n\nQ5 — Build/lint/test commands Header: \"Build cmds\" Build options dynamically from detected package manager + detected scripts:\n\n- For bun + scripts {lint, lint:fix, test, build}: \"bun lint:fix && bun test && bun build (Detected)\"\n- For bun + scripts {lint, test, build}: \"bun lint && bun test && bun build (Detected)\"\n- For pnpm: \"pnpm lint && pnpm test && pnpm build\"\n- For npm: \"npm run lint && npm test && npm run build\"\n- For yarn: \"yarn lint && yarn test && yarn build\"\n- For unknown PM or missing scripts: \"echo no-lint && echo no-test && echo no-build (skip all checks — not recommended)\" Always include \"Other\" as last option. Detected variant goes first.\n\nQ6 — Output directory (state file + per-fire artifacts; will be added to .gitignore) Header: \"Output dir\" Options: \"dogfood-output/ (Recommended)\", \"loop-output/\", \".loop/\", \"Other\"\n\nQ7 — GitHub label for routed issues (will be created if missing) Header: \"Issue label\" Options: \"from-loop (Recommended)\", \"loop-finding\", \"dogfood-loop\", \"Other\" Note: avoid colons in label names — some integrations choke on them.\n\nSTEP 3 — Pre-flight verification (parallel) Run in parallel:\n\n`git rev-parse --verify \"<trunk>\"`\n\n(must succeed)`gh label list --json name --limit 200 2>/dev/null | jq -r '.[] | select(.name == \"<label>\") | .name'`\n\n(exact match; empty output means missing)`curl -sL -m 5 -o /dev/null -w \"%{http_code}\" \"<url>\"`\n\n`git status -s | grep -vE \"^\\?\\? \\.claude/scheduled_tasks\\.lock$\" | grep -vE \"^.. <output-dir>/\"`\n\n`test -d \"<output-dir>\" && echo exists || echo missing`\n\n`grep -qxF \"<output-dir>\" .gitignore 2>/dev/null && echo gitignored || echo not-gitignored`\n\nResolutions:\n\n- Trunk missing → STOP with error.\n- Label missing → ask user \"Create now?\" (default yes); if yes:\n`gh label create \"<label>\" --color B68FE6 --description \"Auto-routed by dogfood loop\"`\n\n- URL ≠ 200 → warn + ask \"Proceed anyway? Your dev server may not be running.\" (default no).\n- Dirty tree → warn + ask \"Stash with name 'wip-pre-loop'?\" (default yes if user has uncommitted changes).\n- Output dir missing →\n`mkdir -p \"<output-dir>\"`\n\n- Output dir not gitignored →\n`echo \"<output-dir>\" >> .gitignore`\n\n(don't prompt; this is necessary).\n\nSTEP 4 — Generate state file\n\nCompute:\n\n`<now>`\n\n= current local time, format \"YYYY-MM-DD HH:MM\"`<hard-stop>`\n\n= now + Q4 hours, same format`<loop-name>`\n\n= if output-dir is \"dogfood-output/\" then \"dogfood\", else strip trailing slash + alphanumeric only (e.g. \"loop-output/\" → \"loop\")\n\nWrite `<output-dir>/loop-state.md`\n\nwith this exact content (substitute placeholders; one Scenarios-table row per Q3 selection; for auth-strategy default: if scenario slug contains \"onboarding\" or \"signup\" use \"fresh-user creation flow\", else \"session bypass for existing user\"):\n\n**Started:**\n**Hard stop:**\n**Cadence:** hourly at minute 7 (`7 * * * *`\n\n)\n**Base branch for fixes:**\n**Worktree:**\n**Target URL:**\n**Loop name:**\n\n| Skill | Auth strategy | Status | Runs | Last findings (auto-fixable) |\n|---|---|---|---|---|\n| active | 0 | — | ||\n| active | 0 | — |\n\n*(none yet)*\n\n*(none yet)*\n\n- Hard stop reached (now >= )\n- All scenarios run >= 2x AND last_runs all zero auto-fixable findings\n- Weekly token limit exhausted\n\nThis loop is an L4 autonomous teammate pattern with three documented gaps:\n\n**No external sensor on fixes.** The agent self-judges the diff via a micro-review step inside its own prompt. Recommend manual diff review per branch before cherry-pick.**No baseline for finding categorization.** Without approved-trace baselines (PTA pattern), agent classification accuracy caps around 82%. Expect ~18% of categorizations to be off.**Session context grows linearly.** State file mitigates but doesn't eliminate compaction loss. After 15+ fires in one session, reasoning may degrade.\n\n*(append one line per fire: YYYY-MM-DD HH:MM | scenario | findings=N | fixes=N | issues=N | duration=Nmin | status)*\n\nSTEP 5 — Build the cron prompt\n\nRender the template below by replacing every : → Q1 → Q2 → computed in STEP 4 → \"/loop-state.md\" → Q6 (with trailing slash, e.g. \"dogfood-output/\") → from STEP 4 → from Q5 (test portion) → from Q5 (lint portion) → from Q5 (build portion) → Q7 → \"7 * * * *\" unless user customized in Q4\n\nCron prompt template (KEEP all the XML-style tags; the LLM that runs the cron uses them as structure):\n\nYou are one fire of an autonomous dogfood loop. Read the state file, pick one scenario, run it via its skill, categorize findings using strict criteria, apply at most 3 safe fixes in isolated branches, open GitHub issues for everything else, update state, and exit. Be conservative. When you cannot determine a finding's category with high confidence, default to opening an issue (status:triage) — never auto-fix on partial signal. If a fix cannot complete cleanly (TDD test fails to fail, lint/build won't pass, regression introduced), abandon the branch and open an issue instead. Do not invent severities, repro steps, or root causes. If unsure about anything, default to opening an issue, not committing code. Target URL: Trunk branch: State file: Output dir: Hard stop: Loop name: Cron lock file to ignore in dirty check: .claude/scheduled_tasks.lock<critical_constraints>\n\n- NEVER commit to , main, or develop. Fixes go in\n`fix/<loop-name>-<short-scenario>-<topic>`\n\nbranches BRANCHED FROM .`<short-scenario>`\n\nis the scenario name with any leading \"/\" stripped (e.g. \"/dogfood-checkout\" → \"dogfood-checkout\"). - NEVER push. Branches stay local.\n- Per-fire caps: max 3 fixes, max 50 LOC per fix excluding tests, single scenario, 25min hard timeout on the scenario run itself.\n- State file is the source of truth: read first, write last. </critical_constraints>\n\n`date \"+%Y-%m-%d %H:%M\"`\n\n. If now >= , append`TIMEBOX_EXPIRED`\n\nto state Run log, print`loop_status=TIMEBOX_EXPIRED`\n\n, exit.`git status -s | grep -vE \"^\\?\\? \\.claude/scheduled_tasks\\.lock$\" | grep -vE \"^.. <loop-output-dir>\"`\n\n. If non-empty, append`DIRTY_TREE_SKIP`\n\nto state Run log, print`loop_status=DIRTY_TREE_SKIP`\n\n, exit.`curl -sL -m 5 -o /dev/null -w \"%{http_code}\" <your-app-url>`\n\nmust return 200. If not, append`status=app_unreachable`\n\nto state Run log, print`loop_status=APP_UNREACHABLE`\n\n, exit.`<state-file-path>`\n\nmust exist. If not, exit with error.- Read state file. If\n`Loop Status: DONE`\n\n, exit no-op.\n\n<scenario_selection>\nFrom the Scenarios table in state file, pick the row where Status = active with the lowest Runs count. Tie-break by table row order. If no active scenarios remain, set state file's `Loop Status: DONE`\n\nand exit.\n</scenario_selection>\n\n<run_scenario> Invoke the picked scenario skill. Always pass as target URL. Use the auth strategy from the state table.\n\nCap: 5-7 findings per run, single pass through the main flow, 25min wall time. Output to `<loop-output-dir><scenario>-<timestamp>/`\n\n.\n\nIf the skill errors out or hangs: log `status=skipped reason=<short>`\n\nto state Run log, increment Runs anyway, exit. Do not retry in same fire.\n</run_scenario>\n\n<categorization_criteria> For EACH finding, apply these criteria strictly in order. Use the FIRST category that matches. When in doubt at any step, fall through to needs-product.\n\n<backend_fix> Auto-fix allowed. REQUIRES at least ONE explicit match:\n\n- Server action throws, returns wrong shape, or wrong status code\n- Database query returns wrong row count, wrong join, or wrong field\n- API route handler has logic error (null/undefined, missing fallback, 500 error)\n- Copy is literally wrong (typo, factual error verifiable against source data)\n- LLM prompt produces deterministically wrong output If you cannot point to a specific line of code that is wrong, this is NOT backend-fix. </backend_fix>\n\n<tracking_fix> Auto-fix allowed. REQUIRES at least ONE:\n\n- Event name typo or inconsistent casing\n- Standard funnel event missing at a known step\n- Payload field wrong shape If the question is \"should this fire?\" — that is needs-product, not tracking-fix. </tracking_fix>\n\n<ui_issue> Issue only. Any of: layout, alignment, spacing, mobile overflow, touch target, animation, aria/a11y gap, \"feels off\". DEFAULT for anything visual or taste-based. </ui_issue>\n\n<needs_product> Issue only. Any of: behavior question, copy decision, flow change, severity unclear, ANY ambiguity about classification. ULTIMATE DEFAULT when unsure. </needs_product>\n\nAfter categorizing, count: `auto_fixable = len(backend_fix) + len(tracking_fix)`\n\n.\n</categorization_criteria>\n\n<apply_fixes> For up to 3 auto-fixable findings:\n\n`git checkout <trunk>`\n\n. If branch tracks a remote:`git pull --ff-only 2>/dev/null || true`\n\n(skip silently on non-fast-forward or no remote).`git checkout -b fix/<loop-name>-<short-scenario>-<topic>`\n\n(or`git checkout`\n\nexisting branch from prior fire).- Write a failing regression test FIRST. Confirm it fails:\n`<test-cmd>`\n\n. - Apply the fix. Max 50 LOC net change excluding tests.\n`<lint-cmd> && <build-cmd>`\n\n. Both MUST pass. Retry once if they fail; abandon if they fail twice (see step 8).- Micro-review BEFORE committing:\n- Re-read\n`git diff HEAD`\n\n. - Verify the failing test exercises the actual bug (not just any failure).\n- Verify the fix is minimal — no unrelated changes.\n- Verify no debug code, console.log, or commented-out blocks.\n- If anything off OR you are unsure: abandon (see step 8).\n\n- Re-read\n- If micro-review passes:\n`git add`\n\nspecific files, commit with conventional`fix(<scope>): description`\n\n. - Abandon path:\n`git checkout <trunk> && git branch -D fix/<loop-name>-<short-scenario>-<topic>`\n\n. Open an issue instead. - Append branch name + commit SHA to state file's \"Active fix branches\" section. </apply_fixes>\n\n<open_issues>\nCreate issues with `gh issue create --title \"[loop] <short title>\" --body \"...\" --label \"<your-loop-label>\"`\n\n.\n\nBody MUST include:\n\n- Scenario that found it\n- Repro steps (numbered)\n- Screenshot path relative to repo root\n- Expected vs Actual\n- Category (ui-issue or needs-product)\n\nAppend issue numbers to state file's \"Issues opened by loop\" section. </open_issues>\n\n<update_state>\nEdit `<state-file-path>`\n\n:\n\n- Increment Runs for picked scenario.\n- Update Last findings cell with auto_fixable count.\n- If last 2 runs of this scenario both had auto_fixable=0, change scenario Status to\n`parked`\n\n. - If all rows have Status=parked OR all rows have Runs>=2 with last_runs all 0, set\n`Loop Status: DONE`\n\n. - Append run log line:\n`YYYY-MM-DD HH:MM | <scenario> | findings=N (auto=A, ui=U, prod=P) | fixes=F | issues=I | duration=Nmin | status=ok|parked|done`\n\n</update_state>\n\n<output_format> Print exactly one line: [LOOP] scenario=X runs=Y auto_fixable=Z fixes=A issues=B loop_status=ACTIVE|DONE|PARKED|TIMEBOX_EXPIRED </output_format>\n\nExit. Do not chain into another scenario — the next fire will pick it up.\n\n(end of cron prompt template)\n\nBefore showing to user: verify NO `<placeholder>`\n\nstrings remain in the rendered text. If any remain, fix them (re-ask Q1-Q7 if needed) before continuing.\n\nShow the user the rendered prompt in a fenced block. Print under it: \"Review the prompt above.\"\n\nSTEP 6 — Final confirmation (AskUserQuestion) Header: \"Arm cron?\" Question: \"Ready to arm the cron with the prompt above?\" Options: \"Yes — arm and run a dry-fire (Recommended)\", \"Yes — arm only\", \"Edit prompt first\", \"Cancel\"\n\nSTEP 7 — Arm If Cancel: print \"No cron armed. State file kept at /loop-state.md.\" Exit. If Edit: print \"Edit the prompt in your terminal, then re-run this wizard.\" Exit. If one of the Yes options:\n\n- Call CronCreate with the rendered prompt as\n`prompt`\n\n,`cron: \"<cadence>\"`\n\n,`recurring: true`\n\n,`durable: true`\n\n. - Call CronList to confirm. Capture the job ID.\n- If \"with dry-fire\": dry-run the prompt manually NOW (execute as if it were a cron fire) to validate the full path.\n\nSTEP 8 — Final report Print this summary block (always):\n\n✓ State file: /loop-state.md ✓ Cron prompt: <armed | not armed> ✓ Cron ID: <id, if armed> ✓ First fire: <next minute :07 after now, if armed> ✓ Hard stop: ✓ Pre-flight checks: <pass/warn details>\n\nCommands: Stop the loop: CronDelete Check state: cat /loop-state.md List fix branches: git branch --list \"fix/-*\" List loop issues: gh issue list --label \"\" Recover stashed WIP: git stash pop\n\nKnown gaps (written into the state file too):\n\n- No external sensor on fixes; manual diff review per branch advised.\n- Agent self-classifies findings; expect ~18% miscategorization.\n- Session context grows linearly; restart fresh session if running > 1 day.\n\nKeep Claude Code open. Cron is session-only — closing the app stops the loop.\n\nExit.", "url": "https://wpnews.pro/news/dogfood-loop-paste-and-go-wizard-for-claude-code-autonomous-browser-exploration", "canonical_source": "https://gist.github.com/brunobertolini/d583141b9909909eeaba6273ff87cdc0", "published_at": "2026-05-24 18:23:17+00:00", "updated_at": "2026-05-25 18:04:38.893139+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-infrastructure", "autonomous-vehicles", "ai-products"], "entities": ["Claude Code", "Dogfood Loop", "agent-browser", "Bash", "Git", "GitHub", "npm", "bun"], "alternates": {"html": "https://wpnews.pro/news/dogfood-loop-paste-and-go-wizard-for-claude-code-autonomous-browser-exploration", "markdown": "https://wpnews.pro/news/dogfood-loop-paste-and-go-wizard-for-claude-code-autonomous-browser-exploration.md", "text": "https://wpnews.pro/news/dogfood-loop-paste-and-go-wizard-for-claude-code-autonomous-browser-exploration.txt", "jsonld": "https://wpnews.pro/news/dogfood-loop-paste-and-go-wizard-for-claude-code-autonomous-browser-exploration.jsonld"}}