# Autonomous multi-agent workflow for OpenCode — plan, review, implement, PR from a Linear issue

> Source: <https://gist.github.com/ppries/f07fd6316bbd45807dd7a1896555b05b>
> Published: 2026-02-20 10:12:15+00:00

# Autonomous Multi-Agent Workflow for OpenCode

A fire-and-forget workflow that takes a Linear issue ID and autonomously plans, tests, implements, and opens a draft PR — with TDD baked in. You walk away; it notifies you when done.

Built for [OpenCode](https://opencode.ai) using custom agents and slash commands.

> **Important:** The `/workflow` command must run with `agent: build` (OpenCode's default agent with full tool access). The orchestrator needs unrestricted access to do git operations, dispatch subagents, and create PRs. If you're in a restricted mode, switch to build first.

**Why this exists:**
- **Fire and forget.** Kick off a task and walk away. You get notified when it's done or needs attention.
- **Fresh perspectives.** Each subagent starts with clean context — no accumulated assumptions from the main session. Reviewers see the work with genuinely fresh eyes, not colored by having watched it being built.
- **Context isolation.** The main agent's context window stays clean. Instead of one agent accumulating thousands of lines of implementation detail, each `@make` task runs in a fresh session with only the relevant code snippets. The orchestrator stays light.
- **Test-first by default.** `@test` writes failing tests before `@make` touches any production code. Specs get validated as executable assertions before a single line of implementation.

## How It Works

```
/workflow SUN-123
```

```mermaid
sequenceDiagram
    participant User
    participant Main as Main Agent
    participant PM as @pm (Linear)
    participant Check as @check (Reviewer)
    participant Simplify as @simplify (Reviewer)
    participant Test as @test (TDD)
    participant Make as @make (Implementor)

    User->>Main: /workflow SUN-123
    Note over User: User walks away

    Main->>Main: 1. Verify repo setup (bare clone, gh auth)
    Main->>PM: 2. Fetch issue context
    PM-->>Main: Title, description, acceptance criteria
    Main->>Main: 3. Create git worktree from master

    Main->>Main: 4. Create implementation plan (with Test Design)
    par 5. Review plan
        Main->>Check: Review for risks/gaps + testability
        Main->>Simplify: Review for overengineering
    end
    Note over Main: Max 3 review cycles with convergence detection

    Main->>Main: 6. Split plan into discrete tasks
    loop 7. For each task
        Main->>Test: Write failing tests (RED)
        Test-->>Main: Test files + failure classification
    end
    loop 8. For each task
        Main->>Make: TDD mode: verify RED → implement GREEN
        Make-->>Main: Implementation + RED→GREEN evidence
    end

    par 9. Final review
        Main->>Check: Review full implementation
        Main->>Simplify: Review full implementation
    end

    Main->>Main: 10. Commit (conventional), gh pr create --draft
    Main->>PM: Post PR link on Linear issue
    Note over User: Notification: workflow complete
```

**Ten phases, five agents, zero interaction required.**

## The Agents

Each agent has a single job and constrained tool access. See the raw files for the full definitions.

### `@check` — Design Reviewer ([check.md](#file-check-md))

Reviews plans and code for risks, gaps, and flaws using an 8-point framework (Assumptions, Failure Modes, Edge Cases, Compatibility, Security, Ops, Scale, Testability).

**Key design choices:**
- Read-only — no write, edit, or bash. It cannot modify what it reviews.
- Uses a different model (`gpt-5.3-codex`) than the main agent to get a genuinely different perspective.
- Severity is evidence-calibrated: BLOCK requires a concrete failure path, not speculation.
- Defers pure complexity concerns to `@simplify` — no overlap.
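- Reviews test code from `@test` when escalated, checking that the tests assert real behavior rather than just the existence of mocks.
- Signs off on NOT_TESTABLE verdicts after checking that the stated reason is an allowed one and that there is evidence `@test` actually attempted a test.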

**Annotated highlight — the severity calibration:**
```
| BLOCK  | Will cause outage/data loss/security breach | Concrete failure path |
| HIGH   | Likely significant problems                  | Clear mechanism       |
| MEDIUM | Could cause edge-case problems               | Plausible scenario    |
| LOW    | Code smell, style, minor                     | Observation only      |
```
Without evidence, findings are capped at MEDIUM. This prevents review theater where everything is "critical."
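
The evidence cap can be written down in a few lines. This is a minimal sketch, not the logic in `check.md` (which is prompt text, not code); the `Severity` enum and the `calibrate` function are hypothetical names introduced here for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table above, ordered from least to most severe."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    BLOCK = 4


def calibrate(claimed: Severity, has_evidence: bool) -> Severity:
    """Cap a finding at MEDIUM unless the reviewer supplied evidence."""
    if has_evidence:
        return claimed
    # No evidence: anything above MEDIUM is downgraded, lower levels are kept.
    return min(claimed, Severity.MEDIUM)
```

Under these assumptions, a BLOCK finding with no concrete failure path comes out as MEDIUM, while a LOW finding stays LOW.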

### `@simplify` — Complexity Reviewer ([simplify.md](#file-simplify-md))

Spots overengineering: YAGNI violations, indirection without payoff, accidental complexity, premature optimization.

**Key design choices:**
- Also read-only. Same trust boundary as `@check`.
- Explicit precedence rule: `@check` safety findings are hard constraints. If `@simplify` recommends removing something `@check` flags as needed, `@check` wins.
- Protected patterns (retries, circuit breakers, auth) are never flagged unless clearly unused.

**Annotated highlight — the core question:**
> For each component, ask: "What if we deleted this?" Justify its existence in one sentence. Can't? Flag it.

### `@test` — TDD Test Author ([test.md](#file-test-md))

Writes meaningful failing tests from task specs, verifies they fail for the right reason (RED), then hands off to `@make` for implementation (GREEN). This is the newest agent — it makes TDD the default workflow.

**Key design choices:**
- Writes test files only — **cannot modify production code under any circumstances**. This is enforced by file pattern matching and a post-step file gate in the orchestrator.
- Uses `claude-sonnet-4-6-1m` (1M context), the same model as `@make`, because it needs to understand the codebase deeply to write meaningful tests.
- Has bash access but sandboxed to test runners and read-only commands. Same deny list as `@make`.
- Classifies every failure with structured codes: `MISSING_BEHAVIOR`, `ASSERTION_MISMATCH`, `TEST_BROKEN`, `ENV_BROKEN`. Only the first two qualify as valid RED.
- Reports an escalation flag when tests need `@check` review (mixed failure codes, nondeterministic behavior, >2 mocks).
- Can return `NOT_TESTABLE` for config-only changes, pure wiring, etc. — but only with justification and `@check` sign-off.

**Annotated highlight — the failure classification:**
```
| MISSING_BEHAVIOR    | Function/class doesn't exist yet  | ImportError, AttributeError | Valid RED |
| ASSERTION_MISMATCH  | Code exists but wrong behavior    | AssertionError with diff    | Valid RED |
| TEST_BROKEN         | Test itself has errors             | Collection/fixture error    | Fix first |
| ENV_BROKEN          | Environment issue                 | Missing dependency          | BLOCKED   |
```
This classification prevents false RED — a test that fails because of a typo in the test file is not the same as a test that fails because the behavior doesn't exist yet.
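
To make the distinction concrete, here is a rough sketch of the classification as a pure function. In the real workflow `@test` is a model following prompt rules, so this is only an approximation. The `Failure` fields (`missing_dependency`, `test_code_at_fault`) and the function names are assumptions made up for this example, and treating unrecognized exceptions as `TEST_BROKEN` is a conservative choice of the sketch, not something the agent definition states.

```python
from dataclasses import dataclass

VALID_RED = {"MISSING_BEHAVIOR", "ASSERTION_MISMATCH"}


@dataclass(frozen=True)
class Failure:
    exc_type: str                      # e.g. "ImportError", "AssertionError"
    missing_dependency: bool = False   # a required package is not installed
    test_code_at_fault: bool = False   # the error originates in test/fixture code


def classify(failure: Failure) -> str:
    """Map one observed failure to a failure code (heuristic)."""
    if failure.missing_dependency:
        return "ENV_BROKEN"
    if failure.test_code_at_fault:
        return "TEST_BROKEN"
    if failure.exc_type in {"ImportError", "ModuleNotFoundError", "AttributeError"}:
        return "MISSING_BEHAVIOR"
    if failure.exc_type == "AssertionError":
        return "ASSERTION_MISMATCH"
    # Unrecognized failures are not counted as RED; they need a closer look.
    return "TEST_BROKEN"


def is_valid_red(codes: list[str]) -> bool:
    """RED is valid only if there is at least one failure and all are valid codes."""
    return bool(codes) and all(code in VALID_RED for code in codes)
```

A run that mixes `MISSING_BEHAVIOR` with `TEST_BROKEN` is therefore not a valid RED, which matches the rule that only the first two codes qualify.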

### `@make` — Task Implementor ([make.md](#file-make-md))

Receives a task spec with acceptance criteria and implements it. Each invocation gets fresh context — only the task spec and relevant code snippets.

**Key design choices:**
- Uses `claude-sonnet-4-6-1m` (1M context) — fast and cheap enough to run per-task, capable enough to implement well-scoped changes. The large context window accommodates full code context.
- Has write/edit/bash, but bash is heavily sandboxed:
  - Can run: `uv run pytest`, `uv run ruff`, `ls`, `rg`, `diff`
  - Cannot run: `git`, `pip`, `curl`, `wget`, `ssh`, `rm`, `mv`, `cp`
- Strict file constraint: can only touch files explicitly listed in the task spec.
- No new dependencies without explicit approval.
- Max 2-3 fix attempts before stopping — prevents infinite loops.
- **TDD mode:** When pre-written tests are provided by `@test`, validates RED first, implements GREEN, reports RED→GREEN evidence. If tests are questionable, escalates to the caller rather than editing test files.

**Annotated highlight — the bash sandbox:**
```yaml
permission:
  bash:
    "*": deny                    # Default deny everything
    "uv run *": allow            # Allow test runner
    "uv run bash*": deny         # ...but not shell escape
    "uv run curl*": deny         # ...or network access
    "uv run git*": deny          # ...or version control
    "ls *": allow                # Read-only inspection
    "rg *": allow                # Search
    "git *": deny                # Explicit top-level deny
```
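
To reason about which commands a rule set like this lets through, a small simulator helps. The sketch below is not OpenCode's matcher. It assumes that rules are checked in the order written, that the last matching pattern wins, and that `*` behaves like a shell glob; verify those assumptions against the OpenCode permission docs before relying on them. `RULES` and `decide` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Same rules as the YAML above, in the same order.
RULES = {
    "*": "deny",
    "uv run *": "allow",
    "uv run bash*": "deny",
    "uv run curl*": "deny",
    "uv run git*": "deny",
    "ls *": "allow",
    "rg *": "allow",
    "git *": "deny",
}


def decide(command: str, rules: dict[str, str] = RULES) -> str:
    """Return "allow" or "deny" for a command (assumption: last match wins)."""
    decision = "deny"  # fail closed if nothing matches
    for pattern, action in rules.items():
        if fnmatchcase(command, pattern):
            decision = action
    return decision
```

Under these assumptions `decide("uv run pytest -q")` is allowed, while `decide("uv run git status")` and `decide("git push")` are denied.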

### `@pm` — Project Management ([pm.md](#file-pm-md))

Fetches and updates Linear issues via the [Linear CLI](https://github.com/schpet/linear-cli). That's it.

**Key design choices:**
- Uses the cheapest model (`claude-haiku-4.5`) — it's just fetching/posting structured data. The CLI has `--json` output so structured parsing is straightforward.
- Has bash access, but sandboxed to `linear *` commands only. Everything else is denied. Issue deletion is also denied.
- The `linear` CLI is globally denied in bash permissions so only `@pm` can use it (the agent overrides with `"linear *": allow`).

## The Commands

### `/workflow` — Fire-and-Forget Orchestrator ([workflow.md](#file-workflow-md))

The main command. Takes a Linear issue ID, runs all ten phases autonomously. See the sequence diagram above and the raw file for the full phase definitions.

```
/workflow SUN-123
```

The workflow dispatches agents, enforces review loops with convergence detection, handles the TDD cycle, and creates the draft PR. It never waits for user input.
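
The review loop with convergence detection can be sketched as follows. This is an illustration under assumptions, not the contents of `workflow.md`: `review` and `revise` are hypothetical callables standing in for the reviewer dispatch and the plan/code revision, and skipping the revision after the final cycle is a choice made for this sketch.

```python
def run_review_loop(review, revise, artifact, max_cycles=3):
    """Review and revise until clean, converged, or out of cycles.

    review(artifact)           -> iterable of finding strings
    revise(artifact, findings) -> revised artifact
    Returns (artifact, status, cycles_used).
    """
    previous = None
    for cycle in range(1, max_cycles + 1):
        findings = frozenset(review(artifact))
        if not findings:
            return artifact, "clean", cycle
        if findings == previous:
            # Same findings twice in a row: another cycle would not help.
            return artifact, "converged", cycle
        previous = findings
        if cycle < max_cycles:
            artifact = revise(artifact, findings)
    return artifact, "max_cycles", max_cycles
```

If a revision does not change what the reviewers report, the loop stops at the second cycle instead of spending the third.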

### `/review` — Standalone Code & Plan Review ([review.md](#file-review-md))

An independent review orchestrator that dispatches `@check` and `@simplify` in parallel against any artifact. This is useful outside `/workflow` — for reviewing your own changes, a teammate's PR, or a plan before committing to implementation.

```
/review              # Review uncommitted changes
/review a1b2c3d      # Review a specific commit
/review feature-x    # Review a branch diff against HEAD
/review 42           # Review PR #42
/review @plan.md     # Review a plan/architecture doc
```

**Key design choices:**
- Auto-detects input type: uncommitted changes, commit hash, branch name, PR number/URL, or plan file.
- For code reviews: reads full file contents (not just diffs) so reviewers have complete context.
- For plan reviews: uses the explore agent to find related existing code, giving reviewers implementation context.
- Presents both reviewers' outputs in their native scales — `@check` uses risk severity (BLOCK/HIGH/MEDIUM/LOW), `@simplify` uses payoff/effort. No normalization across agents.
- The gate verdict (merge/no-merge decision) comes from `@check` only. Simplification recommendations are advisory.
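
The input auto-detection could look roughly like this. It is a sketch based only on the examples above; the real command is a prompt and may well resolve ambiguous arguments by asking git. `detect_input`, the `branches` parameter, and the order of the checks are assumptions.

```python
import re


def detect_input(arg: str, branches: tuple[str, ...] = ()) -> str:
    """Guess what kind of artifact a /review argument refers to."""
    arg = arg.strip()
    if not arg:
        return "uncommitted"
    if arg.startswith("@"):
        return "plan"                      # e.g. @plan.md
    if re.fullmatch(r"\d+", arg) or re.search(r"/pull/\d+", arg):
        return "pr"                        # PR number or PR URL
    if arg in branches:
        return "branch"                    # known branch wins over hash lookalikes
    if re.fullmatch(r"[0-9a-f]{7,40}", arg):
        return "commit"                    # abbreviated or full SHA-1
    # Anything else is treated as a branch name; a real implementation
    # would confirm this with git before diffing.
    return "branch"
```

Passing the list of local branches avoids misreading a branch that happens to look like a hash, such as `deadbeef`.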

## Trust Model

The workflow enforces separation of concerns through tool access:

| Agent | Can read code | Can write code | Can run commands | Can access external services |
|-------|:---:|:---:|:---:|:---:|
| `@check` | Yes | No | No | No |
| `@simplify` | Yes | No | No | No |
| `@test` | Yes | Test files only | Sandboxed | No |
| `@make` | Yes | Yes | Sandboxed | No |
| `@pm` | Yes | No | Sandboxed (`linear *` only) | Linear only |

**Why this matters:**
- Reviewers can't accidentally modify what they're reviewing
- The test author can't modify production code — enforced by file pattern matching and a post-step gate
- The implementor can't do git operations or install packages — the orchestrator handles that
- The PM agent can't touch code — it only manages issues
- `@test` and `@make` share the same bash sandbox: test runners and read-only inspection only

## The TDD Loop

The workflow uses test-driven development by default. Here's the flow:

```
Plan → @test writes failing tests → @make implements to green
         ↓                              ↓
    Failure classified:            Entry validation:
    MISSING_BEHAVIOR ✓             Verify RED matches handoff
    ASSERTION_MISMATCH ✓           If tests pass → STOP (anomaly)
    TEST_BROKEN → fix first        If wrong failure → escalate
    ENV_BROKEN → BLOCKED
```
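
The entry validation on the right-hand side of the diagram can be sketched as a comparison between what `@test` handed off and what a fresh run shows. The data shapes and the name `validate_entry` are assumptions for illustration; `make.md` describes this behavior in prose.

```python
def validate_entry(handoff: dict[str, str], observed: dict[str, str]) -> str:
    """Decide whether @make may start implementing.

    handoff:  test id -> failure code reported by @test
    observed: test id -> "PASS" or the failure code seen on a fresh run
    """
    if any(observed.get(test_id) == "PASS" for test_id in handoff):
        return "STOP"        # anomaly: a test that should be RED already passes
    if any(observed.get(test_id) != code for test_id, code in handoff.items()):
        return "ESCALATE"    # failing, but not for the reason @test reported
    return "PROCEED"         # RED matches the handoff
```

A test that is missing from the fresh run also counts as a mismatch here, so it escalates rather than proceeding silently.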

### Decision Table

| Condition | Action |
|-----------|--------|
| Task changes public API, fixes bug, adds business logic | `@test` writes tests first. `@make` runs in TDD mode. |
| Task is config-only, decorator swap, import reorg, docs | `@test` may return NOT_TESTABLE. `@make` runs standard mode. |
| `@test` returns TESTS_READY + no escalation | Proceed directly to `@make`. |
| `@test` returns TESTS_READY + escalation flag | Route tests to `@check` for light review first. |
| `@test` returns NOT_TESTABLE | Route to `@check` for sign-off, then `@make` standard mode. |
| `@test` returns BLOCKED | Investigate. Revise task spec or fix environment. |
| `@make` flags test quality concern | Caller → `@check` (diagnose) → `@test` (fix) → back to `@make`. |
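
The rows that depend on `@test`'s result amount to a small routing function. The sketch below encodes just those rows; the step labels and the name `route` are made up for this example and do not appear in the agent files.

```python
def route(verdict: str, escalation: bool = False) -> list[str]:
    """Return the ordered steps that follow an @test result."""
    if verdict == "TESTS_READY":
        steps = ["@make:tdd"]
        # An escalation flag inserts a light @check review before @make.
        return ["@check:light-review", *steps] if escalation else steps
    if verdict == "NOT_TESTABLE":
        return ["@check:sign-off", "@make:standard"]
    if verdict == "BLOCKED":
        return ["investigate"]
    raise ValueError(f"unknown @test verdict: {verdict!r}")
```

Raising on an unknown verdict keeps the orchestrator from silently skipping the review step if `@test` ever returns something unexpected.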

### Escalation Chain

When `@make` encounters a test problem during TDD:

1. `@make` diagnoses the issue and reports to the caller (orchestrator)
2. Caller routes to `@check` for independent diagnosis
3. `@check` reports findings (the test is wrong vs. the spec is wrong vs. the implementation approach needs rethinking)
4. Caller routes to `@test` for fixes
5. Fixed tests return to `@make`

This keeps each agent in its lane: `@make` never edits test files, `@test` never edits production code, and `@check` never edits anything.

### File Gate Enforcement

The orchestrator enforces a post-step file gate after `@test` runs. It snapshots changed files before and after, and validates that `@test` only created files matching test patterns (`**/test_*.py`, `**/*_test.py`, `**/conftest.py`). Any violation causes `@test`'s output to be discarded. This is defense-in-depth on top of the agent's own file constraint.
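
A minimal sketch of such a gate, assuming each snapshot is a mapping from file path to content hash (the orchestrator's actual snapshot format is not shown here). `is_test_file` and `file_gate` are hypothetical names. Because the `**/` prefix matches at any depth, the check only needs the file's basename.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Basename equivalents of **/test_*.py, **/*_test.py, **/conftest.py
TEST_PATTERNS = ("test_*.py", "*_test.py", "conftest.py")


def is_test_file(path: str) -> bool:
    """True if the path's basename matches one of the test patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in TEST_PATTERNS)


def file_gate(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return files created, modified, or deleted during the step that are not test files.

    An empty list means the gate passes.
    """
    touched = {
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    }
    return sorted(path for path in touched if not is_test_file(path))
```

If `file_gate` returns anything, the orchestrator would discard `@test`'s output, as described above.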

## Configuration

See [opencode-config.example.json](#file-opencode-config-example-json) for the relevant config. Key pieces:

**CLI access gating** — deny the `linear` CLI globally, allow it only in `@pm`'s agent sandbox:
```jsonc
// Global config: deny linear CLI for main agent
{
  "permission": {
    "bash": {
      "linear *": "deny"
    }
  }
}
```
```yaml
# pm.md frontmatter: allow linear CLI for @pm only
permission:
  bash:
    "*": deny
    "linear *": allow
    "linear issue delete*": deny
```

**Bash permissions** — global safety rails plus per-agent sandboxes. See the config example for the full setup.

## How to Adopt

1. **Install [OpenCode](https://opencode.ai)** if you haven't
2. **Copy agent files** to `~/.config/opencode/agents/`:
   - `check.md`, `simplify.md`, `test.md`, `make.md`, `pm.md`
3. **Copy commands:**
   - `/workflow` → your project's `.opencode/commands/workflow.md`
   - `/review` → `~/.config/opencode/commands/review.md` (global, works across projects)
4. **Install the Linear CLI** — `brew install schpet/tap/linear` and run `linear auth` ([github.com/schpet/linear-cli](https://github.com/schpet/linear-cli))
5. **Set up permissions** — copy the `permission` section from the config example (globally denies `linear *` so only `@pm` can use it)
6. **Add the system prompt sections** — see [agents-system-prompt.md](#file-agents-system-prompt-md) for the relevant `AGENTS.md` sections that give the main agent context about the workflow
7. **Customize paths** — update repo root, branch naming, and team references in `workflow.md`

## Customization Points

| What | Where | Notes |
|------|-------|-------|
| PM tool | `pm.md` + bash permissions | Swap Linear CLI for another CLI (e.g., `gh` for GitHub Issues, `jira-cli`) |
| Models | Agent frontmatter (`model:`) | Change per-agent models to what you have access to |
| Review cycles | `workflow.md` phases 5 & 9 | Default is max 3; reduce for speed, increase for rigor |
| Branch naming | `workflow.md` phase 2 | Currently `<user>/<issue-id>-<slug>` |
| Git strategy | `workflow.md` phase 3 | Uses bare clone + worktrees; adapt to your git workflow |
| Bash sandbox | `make.md` / `test.md` frontmatter | Adjust allowed commands for your toolchain (npm, cargo, etc.) |
| Test runner | `make.md` verification tiers, `test.md` | Currently `uv run pytest`; change to your test command |
| Test file patterns | `test.md` file constraint, `workflow.md` file gate | Currently `**/test_*.py`, `**/*_test.py`, and `**/conftest.py`; adjust for your naming convention |
| Review scope | `review.md` step 2 | Add project-specific convention files to check |

## Lessons Learned

**What works well:**
- **Read-only reviewers** prevent the "reviewer who also fixes things" antipattern. Forces clean separation.
- **Fresh context per task** for `@make` prevents context pollution between tasks. Each implementation starts clean.
- **Convergence detection** in review loops (same findings twice = stop early) prevents wasted cycles.
- **Fire-and-forget with notifications** is the right UX. The workflow is too long for synchronous watching.
- **Test-first catches spec ambiguity early.** When `@test` can't write a clear assertion, the acceptance criteria are vague. This surfaces before implementation starts, not after.
- **Structured failure classification** (MISSING_BEHAVIOR vs. TEST_BROKEN) prevents false RED. Without it, a typo in a test file looks the same as a genuinely missing function.
- **Standalone `/review`** sees heavy ad-hoc use. Most reviews don't need the full workflow — just `@check` + `@simplify` with fresh eyes on a diff.

**What we'd improve:**
- **Task dependencies** aren't formally modeled. If task 3 depends on task 2's output, the sequential execution handles it, but there's no explicit dependency graph.
- **Rollback on failure** is minimal — it commits WIP and creates a draft PR, but doesn't clean up the worktree.
- **Model diversity** for reviewers helps (different model = different blind spots), but makes the setup harder to share since not everyone has the same model access.
- **Test parallelism** is limited by conftest.py collision risk. `@test` is forbidden from modifying existing conftest files, but creating new ones in the same directory across parallel tasks could still conflict.

## Using Agents & Commands Standalone

The agents and commands are independently useful outside the `/workflow` command:

**Agents:**
- **`@check`** — Review any PR, architecture doc, or config change: `@check review this PR: <paste diff>`
- **`@simplify`** — Gut-check complexity on any code you're writing or reviewing
- **`@test`** — Write tests for a task spec before implementing it yourself: `@test <paste task with acceptance criteria>`
- **`@make`** — Hand off a well-defined task when you want implementation without losing your current context
- **`@pm`** — Query Linear without leaving your terminal: `@pm what are the open issues for the AI team?`

**Commands:**
- **`/review`** — Review uncommitted changes, a commit, a branch, a PR, or a plan doc. Dispatches both reviewers with one command.
- **`/workflow`** — Full autonomous pipeline from Linear issue to draft PR.

The workflow is just one way to compose them. The real value is having purpose-built agents with constrained tool access that you can invoke ad-hoc.

## File Index

| File | What it is |
|------|-----------|
| [workflow.md](#file-workflow-md) | `/workflow` slash command — the orchestrator |
| [review.md](#file-review-md) | `/review` slash command — standalone review orchestrator |
| [check.md](#file-check-md) | `@check` agent — design reviewer |
| [simplify.md](#file-simplify-md) | `@simplify` agent — complexity reviewer |
| [test.md](#file-test-md) | `@test` agent — TDD test author |
| [make.md](#file-make-md) | `@make` agent — task implementor |
| [pm.md](#file-pm-md) | `@pm` agent — Linear integration |
| [multi-agent-workflow.md](#file-multi-agent-workflow-md) | Task splitting spec, decision table, and integration contracts |
| [opencode-config.example.json](#file-opencode-config-example-json) | Sanitized config snippets |
| [agents-system-prompt.md](#file-agents-system-prompt-md) | Relevant AGENTS.md sections for main agent context |