{"slug": "i-made-claude-code-think-before-it-codes-then-i-gave-it-a-team", "title": "I Made Claude Code Think Before It Codes. Then I Gave It a Team.", "summary": "A developer upgraded Claude Code from a disciplined solo coder to a multi-agent architecture that operates like a software engineering team. The v2 system uses an orchestrator that never writes code, dispatching tasks to specialist subagents working in parallel on isolated git worktrees. The developer's role shifted from writing code and prompts to tuning workflows and supervising agents, enabling features to be shipped while they sleep.", "body_md": "A while back I wrote a post called *\"I Made Claude Code Think Before It Codes. Here's the Prompt.\"* The premise struck a nerve: the problem with AI coding assistants was never\n\nSo I gave it a process. I called it `/wizard`, and it forced one Claude into the habits of a disciplined senior engineer: read the codebase before writing a line, define \"done\" with acceptance criteria *before* implementing, write a failing test first, then the minimum code to pass it, then try to break what you just wrote. Same output, working code, but without the 2am \"why is this broken in production\" follow-up.\n\nThat was version one. One Claude, one discipline, one pull request at a time.\n\nThis post is about what happened when I stopped trying to make Claude a better *developer* and started making it a better *engineering team*, and about what that did to my own job. Because here is the part nobody warned me about. In v1, I stopped writing the code and stopped doing line-by-line reviews. In v2, the next layer fell away too: I stopped writing the deep technical prompts and the GitHub issues that fed the work. What's left is a different altitude entirely. I tune the workflows and supervise the agents that run inside them. I conduct.\n\nNew here? You don't need to have read the v1 post to follow this one, but it's the foundation everything below is built on. 
The original is linked at the bottom, along with the open-source repo where all of this lives.\n\nHere's the thing about a disciplined senior developer working alone: they're still working alone. They read carefully, test thoroughly, self-review, sequentially, one task at a time, blocking on every code-review round trip. v1 was a fantastic individual contributor, and an individual contributor has a ceiling: one pair of hands.\n\nA *real* senior architect doesn't sit in the editor all day. They decompose a problem into separable concerns, write the contract everyone builds against, and hand the backend, the UI, and the test coverage to people who work *at the same time*. They plan, dispatch, integrate, and keep the review pipeline moving, and almost never write the code themselves. That's v2: the mental model went from \"make Claude a senior developer\" to **\"make Claude a senior architect who runs a team,\"** and, for me, from running the team to *conducting* it. Concretely: a single main-thread **orchestrator** that never writes code itself, fanning work out to specialist **subagents**, each in its own isolated git worktree, building in parallel, driving many pull requests at once through an automated review gate.\n\nThe emotional payoff that genuinely surprised me the first time I watched it: I'd describe a feature, walk away, and come back to find an engineering team had been shipping while I slept.\n\nIf you used the original `/wizard`, here's the whole upgrade in one table:\n\n| | v1 (one disciplined developer) | v2 (an architect running a team) |\n|---|---|---|\n| From idea to work | You hand-write the ticket | An issue-maintainer agent turns a one-line idea into a structured issue or epic |\n| Who writes the code | One Claude does everything | An orchestrator that writes no code; specialist subagents implement |\n| Concurrency | One task at a time | A cohort of up to ten pull requests open and moving at once |\n| Build shape | Read, test, implement, review, sequentially | Architect designs the contract; backend and frontend build in parallel off it, using TDD; a QA specialist verifies |\n| Review gate | Monitor your code-review bot, fix findings, repeat | An independent reviewer that didn't build it; findings routed back to the specialist whose layer they live in, across all PRs at once |\n| Your role | Architect who stopped writing code and reviewing it line by line | Conductor who also stopped writing the prompts and the issues, and now tunes the workflow |\n\nEverything that made v1 work is *still here*, underneath. v2 doesn't replace the discipline. It **distributes** it across a team and runs that team in parallel. In fact, v1 is preserved verbatim as \"direct mode,\" and the team only spins up when the work is complex enough to be worth it. A one-line fix never pays the team tax. Let me walk through the pieces.\n\nThe very front of the pipeline is the part I underestimated longest. I used to write the tickets. A feature would occur to me in the shower, and the price of acting on it was turning a vague sentence into a well-formed issue: title, acceptance criteria, labels, a link to the parent epic. That quietly throttles everything downstream. A sloppy ticket produces sloppy output no matter how good the team is.\n\nSo that became an agent too. An **issue-maintainer** takes a one-line idea (\"let an admin turn on a guided walkthrough for new users\") and produces a *structured* issue: a clear title, explicit acceptance criteria, consistent labels, and the parent-to-sub-issue links that tie an epic to its pieces. I stopped formatting tickets the same way I stopped formatting code.\n\nThe point isn't that it saves me ten minutes of typing. The point is **consistency**. 
When every issue has the same shape (the same label vocabulary, acceptance criteria written the same way, the same epic-to-subtask structure), the rest of the machine runs on a clean, uniform source of truth: the orchestrator picks up any issue and immediately knows what \"done\" means, and the builders inherit acceptance criteria they can write a failing test against. Consistent issues are the rail the whole train rides on. Idea to issue is the first agent step, not something I do by hand before the agents start.\n\nThe single most important design decision in v2 is the line between the orchestrator and the workers, and where exactly it sits.\n\nThe **orchestrator** is the main thread: the Claude you actually talk to. It plans, dispatches subagents, monitors the pipeline, and integrates results. It does *not* open an editor on application code. The moment it does, it stops orchestrating, burning the context that ten parallel pull requests depend on and serializing work three specialists could have done at once. The **workers** are subagents: each gets a focused brief, does the implementation, runs the affected tests, and commits locally, then returns one result message: branch name, final commit SHA, what it touched.\n\nThe handoff boundary is exactly `git commit`. The subagent commits; the orchestrator does everything from `git push` onward: push, open the PR, run the review cycle. A commit is the two-phase-commit point between *local work* (fully reversible) and *external commitments* (CI fires, reviewers get notified, check-runs get recorded against that SHA). Splitting responsibility there buys three concrete things. **You verify the diff before you expose it**: a worktree cut at dispatch can go stale if siblings merge, and a quick fetch-and-rebase catches the phantom-deletion diff before it confuses anyone. 
**You get clean failure recovery**: a subagent that crashes mid-task has pushed nothing, so the orchestrator just salvages the working tree instead of cleaning up a half-built PR. And **you get a single monitoring owner**: exactly one entity knows the state of every in-flight PR, so it declares a PR ready exactly once and composes the title and description from cross-cutting context a subagent never has.\n\nHere's where it stops looking like one assistant and starts looking like a team with a roster. When the orchestrator gets a non-trivial piece of work, it doesn't dispatch \"a builder.\" By this point the issue-maintainer has already turned the raw idea into a structured GitHub issue with acceptance criteria, so the ensemble has something concrete to build against. From there it runs in a deliberate order:\n\n**1. The architect goes first, and writes no production code.** Its job is to design the subsystem, enumerate the invariants, run the concurrency analysis (what happens if this runs twice at once? what must stay true across every path that touches this data?), and produce two artifacts: a **failing-test spec** encoding the acceptance criteria (the ones the issue-maintainer wrote into the issue, now made executable), and a **data contract**, every field the UI and backend will exchange, with its type, range, and default. It's read-only; it designs, it does not build. That contract is the seam that keeps the team honest: every builder's output is checked against a concrete failing test the architect specified, not the builder's own loose reading of the brief.\n\n**2. Then the builders go, in parallel, off that one contract.** A backend specialist takes the services, models, and migrations; a frontend specialist takes the UI; a QA specialist authors the coverage. They run *simultaneously*. The frontend doesn't wait for the backend, because it already knows the exact shape of the data it'll receive. 
Each owns a **non-overlapping set of files**, so they never collide in the same tree. A genuinely single-domain change collapses to one builder, but splitting is the *default*, not the exception.\n\n**3. Then the critics verify, and crucially, they didn't build it.** This is generator/evaluator separation, and it matters: the agents that *wrote* the change are not the agents that *sign off*. The QA specialist comes back after the code is green, applies a mutation-testing mindset (don't assert \"it worked,\" assert the specific value and exact count that would break if the code mutated), and confirms the acceptance criteria are actually covered.\n\nAnd then there are the **domain-user lenses**, my favorite part. For each kind of user your product has, there's one adversarial critic whose job is to read the change through *that persona's eyes* and find where it breaks for them. Admin, end user, power user become an admin lens, an end-user lens, a power-user lens. Each runs two probes: **feature parity** (\"a capability was added for the admin, should the power user get an analogue?\") and **cross-actor leak** (\"will this admin-only feature surface on a screen the end user shares?\"). A lens that says \"not applicable\" has to have *run* both probes and reasoned them empty. It's a conclusion you earn, never a step you skip. Finally a **documentation librarian** verifies the docs actually got updated to match the code.\n\nOne hard rule ties the ensemble together: **the agents never talk to each other.** They run in isolated contexts and return exactly one result; the architect can't hand its spec to a builder, a builder can't hand its diff to QA. Every hand-off is *orchestrator-mediated*: it reads agent A's output, distills the part agent B needs, and bakes it into B's brief. 
This isn't a swarm of peers negotiating; it's a manager decomposing work, dispatching isolated specialists, and stitching their one-shot results into the next link of the chain.\n\nThe parallelism is nice, but the *quality* is the real win. Take a neutral example: you ask the team to **add an admin capability that enables a new onboarding walkthrough for users.** A single competent developer, even a disciplined one, builds exactly that, ships it, and it works. The acceptance criteria are met.\n\nBut the end-user lens, the critic whose only job is to think like a regular user, ran its cross-actor leak probe and asked what nobody had: *what is this new behavior actually bound to?* It was bound to a shared, low-level UI component, the kind of context-free primitive a checkbox or a toggle is, that gets reused across many screens, including ones a regular user sees. And that's the trap. You can secure a privileged surface by splitting it per role. You cannot secure a primitive: a checkbox is just inputs and outputs, it has no idea whether it's sitting on an admin console or an end-user settings page. The thing that's supposed to decide \"may *this* actor trigger *this* capability\" lives a tier above it, in the controller and service layer, not in the component. The admin behavior had been wired straight onto the shared primitive without that middle-tier gate guarding the specific binding, so the same primitive rendered on an end-user screen quietly inherited the wiring. A regular user, with nothing to do with this admin feature, would have been able to trip it.\n\nNo test for the admin feature would catch this, because the admin feature works perfectly. 
The bug only exists *for a different actor than the one you were building for*, and it got caught only because a critic's mandate was to ask \"what does this do to *my* user's world?\" The solo dev verifies what they built; the team verifies that *and* its blast radius across every other actor, right down to which tier is actually holding the authorization line.\n\nHere's the habit that took me longest to unlearn: thinking of work as *one task at a time*. The orchestrator doesn't. It doesn't pull one issue, finish it, and pull the next. It grabs a **cohort** of up to ten issues at once and drives them *all* toward merge-ready concurrently, each in its own worktree with its own subagents and its own pull request. Up to ten pull requests, open and moving at the same time.\n\nA code-review round trip takes minutes; the CI suite takes more. A solo developer blocks on every one of those windows. They push, then *wait*. The orchestrator refuses to. While one PR is out for review, it's prepping the next feature in a fresh worktree and checking on a third that just got findings back, sweeping every open PR on a cadence and acting on what it finds. Idle is, quite literally, forbidden: any \"I'm waiting for X\" with independent work available is a process failure.\n\nAnd the cohort **refills itself**. As one PR merges, a slot opens and the orchestrator pulls the next issue off the backlog by urgency. There's backpressure so it doesn't spiral, a hard ceiling on in-flight PRs, and a rule that when you hit it you drive a *merge* before starting anything new, but the default posture is motion: ten things in flight, continuously topped up, every one marching toward \"ready for you to merge.\"\n\nThe one thing that stops everything: a broken main branch. If the shared baseline is red, the entire cohort pauses and converges on fixing it. Every open PR inherits the breakage and every new branch spreads it. 
It's the only condition that legitimately makes the team drop the cohort for a single thing.\n\nThis is the step I won't let a PR skip, and it's the one that most earns the word *team*. No matter how good the build was, the architect's invariants, the adversarial persona lenses, the independent QA pass, a dedicated AI code reviewer that **did not build the thing** always catches something. Every single PR goes through it. That's the whole point of separating the people who write from the people who sign off: the builders are too close to their own work to see what they assumed; a fresh reviewer isn't. So the gate is non-negotiable. There is no \"this one's simple, skip review.\"\n\nThe loop: open the PR, let the automated code-review bots look at it (CodeRabbit is a good public example, use whatever your stack has), read **every** finding, fix the real ones, reply to and resolve the false ones, repeat until the PR is genuinely clean. Only *then* is it merge-ready. No silent ignores.\n\nThe discipline that makes this work, the thing I'd most want you to take away, is to **separate the reviewer's premise from its suggested remedy.** A good bot is usually *right that something is wrong* and frequently *wrong about how to fix it*, because it doesn't know your stack the way you do. Verify the *premise*; don't blindly apply the *remedy*. Half a reviewer's value is the question it raises, not the answer it proposes.\n\nBut the part that makes this a *team* and not a pile of agents is that the findings **don't dead-end at the reviewer.** The orchestrator takes every finding back to the team. When one is real, the orchestrator doesn't fix it itself. It routes the finding to a fresh subagent of the right specialty: a backend finding to the backend specialist, a UI finding to the frontend specialist, a missing test to QA. One finding, one focused fix, dispatched to whoever owns that layer. 
When a finding is a false positive, right premise, wrong remedy, or simply mistaken, the orchestrator replies with the reasoning and resolves the thread. Either way the loop *closes*: review, route, fix or rebut, resolve. And that closing is what separates a team from a stack of disconnected tools.\n\nHere's the realization that reframed the whole thing, and it landed in two steps.\n\nIn v1, I stopped writing the code and reading every diff line by line, the headline of the original post. But I was still doing everything *upstream* of the work: writing the deep technical prompts, hand-authoring the GitHub issues, deciding exactly how each thing should be built. I'd traded being a typist for being a very busy author of specs and tickets. In v2, that layer fell away too. The issue-maintainer turns my one-liners into structured issues, so I stopped hand-writing them. An independent reviewer reads every diff and the orchestrator routes what it finds back to the specialists, so the review I used to do by hand now happens without me. What's left isn't writing of any kind.\n\nWhat's left is *conducting*. A conductor doesn't play an instrument during the performance, and doesn't write each musician's part note by note. They set the tempo, cue the sections, decide how the piece should feel, and keep everyone playing together. That's exactly what's left for me:\n\nWhat I explicitly do *not* do anymore is hand out granular tasks (\"create this file, add this method, write this test\"). I keep ideas flowing through the pipeline and let the pipeline do the decomposition. The whole arc looks like this:\n\nidea → issue/epic (the issue-maintainer) → a cohort of up to ten (the orchestrator) → parallel build (architect, then builders, then critics) → independent AI review → findings routed back to the team → merge-ready → I merge → production.\n\nMy hands are on exactly two ends of that arc: the idea that starts it and the merge that finishes it. 
Everything between is the team, and conducting is keeping the flow moving without ever picking up an instrument myself.\n\nNone of this works without the v1 foundation. TDD is still RED before GREEN; that's the architect's failing-test spec. \"Attack your own code before you ship it\" is now *institutionalized* as the critic lenses and the independent QA pass. \"Read before you write\" is an explicit exploration phase before anyone touches a file. v2 didn't throw away the disciplined developer. It cloned them into a team, put an architect in charge, and left me to conduct.\n\nA few honest caveats, because the failure modes are real:\n\nThe full workflow, the issue-maintainer, the orchestrator skill, the architect, the builders, the persona lenses, the librarian, and the parallel-pipeline machinery, is open source and MIT-licensed at **github.com/vlad-ko/claude-wizard**. The repo ships an `agents/` roster you dispatch, a `reference/` set of deep-dives (the threading model, the parallel pipeline, the PR review cycle), and an `ARCHITECTURE.md` with the system diagrams. Fork it. Adapt the personas (`domain-user-lens` is a template you copy once per persona), swap in your test runner, your CI, and your code-review bot, and make it yours. The framework-specific details are stripped out; the methodology is language- and stack-agnostic.\n\nIf you want the single-skill, one-disciplined-developer version, the original `/wizard`, it's preserved at the **v1 git tag** and still works. Start there if a team feels like more than you need. A lot of people will never want more than the solo architect, and that's a completely reasonable place to live.\n\nAnd if you haven't read it, the original post that started all this, *\"I Made Claude Code Think Before It Codes. Here's the Prompt.\"*, is the foundation everything above is built on. 
v2 is just what happens when you take one disciplined developer and ask:\n\nNow go build something while you sleep.", "url": "https://wpnews.pro/news/i-made-claude-code-think-before-it-codes-then-i-gave-it-a-team", "canonical_source": "https://dev.to/_vjk/i-made-claude-code-think-before-it-codes-then-i-gave-it-a-team-2bl8", "published_at": "2026-06-20 01:07:41+00:00", "updated_at": "2026-06-20 01:36:49.380580+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "developer-tools", "ai-products"], "entities": ["Claude Code", "Anthropic", "Claude"], "alternates": {"html": "https://wpnews.pro/news/i-made-claude-code-think-before-it-codes-then-i-gave-it-a-team", "markdown": "https://wpnews.pro/news/i-made-claude-code-think-before-it-codes-then-i-gave-it-a-team.md", "text": "https://wpnews.pro/news/i-made-claude-code-think-before-it-codes-then-i-gave-it-a-team.txt", "jsonld": "https://wpnews.pro/news/i-made-claude-code-think-before-it-codes-then-i-gave-it-a-team.jsonld"}}