I built a 9-agent AI dev team in a Claude Code plugin — here's what happened

A developer built a nine-agent AI development team as a Claude Code plugin, orchestrating specialized agents to take a feature request from requirements analysis through production deployment. The plugin, called claude-dev-pipeline, assigns roles including product manager, architect, backend and frontend engineers, QA tester, code reviewer, and DevOps specialist, with human approval checkpoints at the critical gates. It grew out of a single-prompt attempt at an authentication feature for a task manager app, which produced code and basic tests in about 20 minutes but no coherent development workflow.

I was building a side project: a simple task manager app. I opened Claude Code, typed "Add user authentication with email and password login", and hit enter.

Twenty minutes later, I had code. A lot of code. Authentication logic, routes, middleware, even some basic tests.

But there was a problem. The frontend (me, on a different day) had assumed a different API shape. The tests only covered the happy path. There was no architecture decision to reference; I just picked JWT because it felt right. And the `docker-compose.yml`? It didn't exist yet.

I had AI-generated code, but no real software development workflow.

Good software isn't just code. A lot has to be settled before you write a single line, and normally a team handles all of it. A PM writes the spec. An architect proposes options. Engineers implement and review each other's work. A DevOps person sets up CI/CD.

What if AI could fill all those roles?

I built claude-dev-pipeline, a Claude Code plugin that orchestrates a team of specialized AI agents, each with a specific job, to take a feature request all the way from requirements analysis to production deployment.

The idea was simple: instead of one big AI doing everything, use multiple agents in sequence, each expert at one job, with you approving the output at every important gate.
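The sequential, gated flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the plugin's actual implementation: `Phase`, `run_pipeline`, `stub_agent`, and the `approve` callback are hypothetical names, and each agent is reduced to a function that maps context text to an artifact.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Phase:
    name: str                      # e.g. "PM" or "Architect"
    run: Callable[[str], str]      # hypothetical agent call: context in, artifact out
    needs_approval: bool = False   # pause for a human decision after this phase


def run_pipeline(
    requirement: str,
    phases: List[Phase],
    approve: Callable[[str, str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Run phases in order and stop at the first rejected gate.

    Returns the names of the completed phases and the name of the phase
    whose output was rejected (None if every gate passed).
    """
    context = requirement
    completed: List[str] = []
    for phase in phases:
        artifact = phase.run(context)
        if phase.needs_approval and not approve(phase.name, artifact):
            return completed, phase.name
        completed.append(phase.name)
        # Later phases see everything produced (and approved) so far.
        context = context + "\n" + artifact
    return completed, None


def stub_agent(label: str) -> Callable[[str], str]:
    """Stand-in for a real agent; it only returns a labelled placeholder."""
    return lambda context: f"[{label} output]"
```

In this sketch, `approve` is where a human would review the PRD or pick an architecture option. Returning `False` halts the run before any later phase sees the rejected artifact.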
Nine agents, nine phases:

| Phase | Agent | Role | Output |
|---|---|---|---|
| 0 | Discovery | Clarifies vague requirements via dialogue | Confirmed requirement |
| 1 | Exploration | Scans existing codebase in parallel | `.pipeline/exploration.md` |
| 2 | PM | Writes a structured PRD with user stories & acceptance criteria | `.pipeline/pm.md` |
| 3 | Architect | Proposes 2–3 architecture options with trade-offs | `.pipeline/architect.md` |
| 4a | Backend | Implements REST APIs, services, repositories | `src/backend/` |
| 4b | Frontend | Implements React UI, hooks, API client | `src/frontend/` |
| 5 | QA | Writes and runs unit, integration, and E2E tests | `tests/` |
| 6 | Reviewer | Audits code for security, bugs, and quality (confidence ≥ 80) | `.pipeline/review.md` |
| 7 | DevOps | Creates Dockerfile, docker-compose, GitHub Actions CI/CD | `deploy/` |

Phases 4a and 4b run in parallel. The full flow:

    User requirement
          │
      Discovery       ← asks clarifying questions if vague
          │ confirmed requirement
     Exploration      ← scans codebase in parallel (2 sub-agents)
          │ exploration.md
          PM          ← you review & approve PRD
          │ pm.md
      Architect       ← you choose 1 of 3 architecture options
          │ architect.md
     ┌────┴────┐
    Backend Frontend  ← run in parallel
     └────┬────┘
          QA
          │ tests green
      Reviewer        ← pipeline pauses if Critical issues > 3
          │ review.md clean
       DevOps
          │
       🎉 Done

Human-in-the-loop at every critical gate. You approve the PRD before architecture begins. You choose one of three architecture options before code is written. The Reviewer pauses everything if it finds more than 3 critical issues. AI does the work; you stay in control.

Backend and Frontend run in parallel. Because both agents work from the same architecture document, their outputs fit together. This shaves real time off the pipeline and heads off the classic "your API doesn't match what I expected" problem.

Every agent writes a structured artifact. The PM writes `.pipeline/pm.md`. The Architect writes `.pipeline/architect.md`.
These files become the living memory of your project: persistent knowledge that survives across pipeline runs and future features.

Git auto-commits after each approved phase. Every milestone is tracked:

- `pipeline: PM — add PRD for <feature>`
- `pipeline: Architect — add architecture for <feature>`
- `pipeline: Implement <feature> (backend + frontend)`

To install:

1. Clone the repo: `git clone https://github.com/airwaves778899/claude-dev-pipeline.git`
2. Register as a local marketplace:
   - Windows: `claude plugin marketplace add "C:\path\to\claude-dev-pipeline"`
   - macOS/Linux: `claude plugin marketplace add "/path/to/claude-dev-pipeline"`
3. Install: `claude plugin install claude-dev-pipeline`
4. Verify: `claude plugin list` should show `✓ claude-dev-pipeline enabled`

Then, inside any project with Claude Code:

- `/claude-dev-pipeline:dev-pipeline start "Add user authentication with email + password login"`

You can also target a single agent or resume from a specific phase:

- `/claude-dev-pipeline:dev-pipeline run --agent architect`
- `/claude-dev-pipeline:dev-pipeline run --from qa`
- `/claude-dev-pipeline:dev-pipeline status`

Stack profiles let you skip the tech-stack configuration prompt:

- `/dev-pipeline start "Add payment processing" --stack python`
- `/dev-pipeline start "Build mobile onboarding" --stack flutter`

Supported: `ts-node` (default), `ts-react`, `python`, `go`, `flutter`.

I thought the hardest part would be writing the agent prompts. It wasn't. The hardest part was handoffs.

Each agent needs to know exactly what the previous agent decided. The PM agent's output has to be structured in a way the Architect can actually parse. The Architect's decision has to be specific enough that Backend and Frontend can implement without contradicting each other.

I went through many iterations. The current solution: every agent reads the previous `.pipeline/*.md` files as context, and writes its own output in a documented schema. Structured handoffs, not vibes.

The second hard part was encoding my own opinions into prompts.
When I write code alone, I make dozens of micro-decisions automatically. Teaching an agent to make those same decisions consistently, and to explain its reasoning, took real effort. It's essentially writing a very opinionated style guide for each role.

Structure beats raw intelligence. A well-prompted agent that always produces a specific output format is more useful than a powerful model that does something different every time.

Approval gates are not friction; they're the whole point. The value of this pipeline isn't speed. It's that you understand every decision that was made. You approved the PRD. You chose the architecture. You reviewed the tests. When something breaks in production, you know why, because you were part of every decision.

AI agents need to read before they write. The Exploration agent was a late addition, but it turned out to be essential. Without it, the Backend agent would generate code with no awareness of existing patterns, naming conventions, or architecture choices already in the codebase. Reading first changed everything.

The project is open source (MIT) and actively evolving. Recent additions in v3.0.0 include: `/dev-pipeline fix "description of the bug"`

I'm planning to add support for multi-repo pipelines and a web-based progress dashboard.

⭐ If this resonates, give it a star: [github.com/airwaves778899/claude-dev-pipeline](https://github.com/airwaves778899/claude-dev-pipeline)

I'm curious: what's the most painful part of your own dev workflow that you wish an AI could handle? Drop it in the comments.

This plugin is built on Claude Code, Anthropic's CLI tool for agentic coding. The plugin system lets you define custom agents and slash commands that Claude Code can load and orchestrate.
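The handoff scheme described earlier, where each agent reads the preceding `.pipeline/*.md` files and then writes its own artifact, could look roughly like the sketch below. It is an illustration only: the file names come from the agent table, but `ARTIFACT_ORDER`, `load_handoff_context`, and `write_artifact` are hypothetical names, and the plugin's real schema and loading logic may differ.

```python
from pathlib import Path

PIPELINE_DIR = ".pipeline"

# Order in which artifacts are produced, taken from the agent table.
# Treating this list as the handoff order is an assumption of the sketch.
ARTIFACT_ORDER = ["exploration.md", "pm.md", "architect.md", "review.md"]


def load_handoff_context(project_root: Path, current: str) -> str:
    """Concatenate every artifact that precedes `current`.

    Missing files are skipped, so a partial run still yields usable context.
    If `current` is not in ARTIFACT_ORDER, all existing artifacts are returned.
    """
    sections = []
    for name in ARTIFACT_ORDER:
        if name == current:
            break
        path = project_root / PIPELINE_DIR / name
        if path.exists():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)


def write_artifact(project_root: Path, name: str, body: str) -> Path:
    """Write one agent's output under .pipeline/, creating the directory if needed."""
    out_dir = project_root / PIPELINE_DIR
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    path.write_text(body.rstrip() + "\n", encoding="utf-8")
    return path
```

Under these assumptions, an Architect step would call `load_handoff_context(root, "architect.md")` to get the exploration notes and the PRD as its context, then persist its decision with `write_artifact(root, "architect.md", ...)`. Keeping the order in one list makes it explicit which earlier decisions each agent is allowed to see.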