Dynamic Workflows in Opus 4.8: Build a Self-Verifying PR Reviewer

wpnews.pro

Most people use Opus 4.8 the way they used every model before it: open a chat, type a request, watch the cursor, correct it, repeat. That's a conversation. A dynamic workflow is something else entirely.

The shift is this: you stop being the loop. Instead, an orchestrator — plain code you control — spawns subagents you design, fanning out work in parallel, running steps in sequence, judging and merging results, and reporting back when the whole thing is done. Opus 4.8 can drive hundreds of parallel subagents inside a single workflow, with effort control per node so cheap steps stay cheap and hard steps think harder.

In this tutorial you'll learn the core patterns by building one concrete thing: a pull-request reviewer that fans out across correctness, security, and performance, then adversarially verifies every finding before it reaches you.

// You design the shape. The orchestrator runs it.
const found    = await parallel(DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS })))
const deduped  = dedupeByFileLine(found.flatMap(r => r.findings))
const verified = await parallel(deduped.map(f => () => agent(refutePrompt(f), { schema: VERDICT })))
const real     = verified.filter(v => v.refuted === false)

By the end you'll know when to reach for parallel() versus pipeline(), how structured output schemas keep subagents composable, and where to set effort per node.

Stop thinking "I send a prompt, I get a completion." Start thinking: an orchestrator runs a workflow graph, and each node is an agent call. The orchestrator is plain code. It decides what runs, in what order, and what to do with each result. Subagents are the leaf workers — each gets a focused prompt, a structured-output schema, and its own effort setting. The unit of work is no longer the prompt; it's the graph.

Two primitives compose every graph, and the difference between them is entirely about barriers — when the orchestrator blocks and waits.

parallel() is a barrier

parallel() fans work out to many subagents at once and resolves only when all of them return. Nothing downstream runs until the slowest node finishes. Use it for independent work that must be fully collected before the next decision: one subagent per review dimension, N-way verification, hundreds of concurrent checks.

// FAN-OUT: dimensions are independent → run them together
const found = await parallel(
  DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS, effort: "medium" }))
)
// barrier: every dimension has returned before we continue
const deduped = dedupeByFileLine(found.flatMap(r => r.findings)) // plain code, no agent

Note the () => thunks. parallel() invokes them itself: it schedules the work rather than receiving already-started promises.
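To make the thunk requirement concrete, here is a minimal sketch of how a parallel() primitive could work. It is not the actual implementation, which the article doesn't show. It assumes parallel() takes an array of zero-argument functions and resolves with results in input order; fakeAgent and demo are hypothetical stand-ins for real agent() calls.

```javascript
// Hypothetical sketch: the real parallel() is not shown in the article.
// Assumption: it takes thunks (zero-argument functions returning promises),
// starts them all, and resolves with the results in input order.
async function parallel(thunks) {
  // Calling each thunk here is what actually starts the work.
  const started = thunks.map((thunk) => thunk());
  // Promise.all is the barrier: it resolves only once every promise has
  // resolved, and it keeps results in the order of the input array.
  return Promise.all(started);
}

// Hypothetical stand-in for agent(): resolves with its label after a delay.
const fakeAgent = (label, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(label), ms));

async function demo() {
  // "security" finishes first, but results still come back in input order.
  return parallel([
    () => fakeAgent("correctness", 30),
    () => fakeAgent("security", 10),
    () => fakeAgent("performance", 20),
  ]);
}
```

Because the thunks are only called inside parallel(), a fuller scheduler could cap concurrency or retry a failed node; an array of already-started promises would leave it nothing to schedule. Note that Promise.all rejects as soon as any one promise rejects, so Promise.allSettled is the alternative if partial results should survive a failed node.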

pipeline() enforces order

pipeline() chains stages where stage N+1 depends on stage N's output. Each stage blocks until its input exists, so the stages run strictly in sequence and the latencies add up. Reach for it when there's a true data dependency: you can't synthesize a review before findings exist, and you can't verify findings before they're deduplicated.

const review = await pipeline(
  () => parallel(DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS }))),
  (found)   => dedupeByFileLine(found.flatMap(r => r.findings)),
  (deduped) => parallel(deduped.map(f => () => agent(refutePrompt(f), { schema: VERDICT }))),
)

Notice dedupeByFileLine is not an agent: deterministic work stays in code. You only spend a subagent where judgment is required.
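The chaining behavior of pipeline() can be sketched in a few lines. This is a guess at the shape, not the real primitive: it assumes each stage is called with the previous stage's output first, followed by earlier outputs (most recent first), and that a stage may return either a plain value or a promise. The demo stages are hypothetical.

```javascript
// Hypothetical sketch of pipeline(); the article does not show its implementation.
// Assumption: each stage receives the previous output as its first argument,
// then earlier outputs (most recent first). Stages may be sync or async.
async function pipeline(...stages) {
  const history = []; // outputs so far, most recent first
  for (const stage of stages) {
    // Awaiting here is what forces stage N+1 to wait for stage N.
    const output = await stage(...history);
    history.unshift(output);
  }
  return history[0]; // the last stage's output
}

async function demo() {
  return pipeline(
    () => [3, 1, 2],                              // stage 1: produce data
    (found) => [...found].sort((a, b) => a - b),  // stage 2: plain code, no agent
    async (sorted) => sorted.map((n) => n * 10),  // stage 3: async work
    (scaled, sorted) => ({ scaled, sorted })      // stage 4: reads two earlier outputs
  );
}
```

A stage that only needs its direct input simply ignores the extra arguments, which keeps plain-code stages and agent stages interchangeable.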

The whole grammar: parallel for independence, pipeline for dependency. Real workflows alternate between the two, fanning out for breadth and chaining where order matters.

Every agent() call above passes a schema. The model returns data shaped to that contract (FINDINGS, VERDICT, REVIEW), so you index fields instead of regexing prose. This is what lets the dedup and filter steps be plain code rather than yet another LLM call:

const real = verified.filter(v => v.refuted === false)

Schemas are the seams that keep subagents composable. A node's output is machine-readable, so the next node — agent or code — consumes it without a parsing layer in between.
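The article names its schemas without defining them, so here is one way they might look. The sketch assumes JSON Schema-style objects, which may not be what agent() actually accepts; the field names come from the article, while FINDING and conforms are hypothetical helpers added for illustration.

```javascript
// Assumption: schemas are JSON Schema-style objects. The article names
// FINDINGS and VERDICT but never shows their definitions.
const FINDING = {
  type: "object",
  required: ["file", "line", "severity", "claim"],
  properties: {
    file: { type: "string" },
    line: { type: "integer" },
    severity: { type: "string" },
    claim: { type: "string" },
  },
};

const FINDINGS = {
  type: "object",
  required: ["findings"],
  properties: { findings: { type: "array", items: FINDING } },
};

const VERDICT = {
  type: "object",
  required: ["refuted", "reason"],
  properties: { refuted: { type: "boolean" }, reason: { type: "string" } },
};

// Hypothetical checker covering only the schema keywords used above.
function conforms(value, schema) {
  switch (schema.type) {
    case "string":
      return typeof value === "string";
    case "boolean":
      return typeof value === "boolean";
    case "integer":
      return Number.isInteger(value);
    case "array":
      return Array.isArray(value) && value.every((v) => conforms(v, schema.items));
    case "object":
      return (
        typeof value === "object" && value !== null && !Array.isArray(value) &&
        schema.required.every((key) => key in value) &&
        Object.entries(schema.properties).every(
          ([key, sub]) => !(key in value) || conforms(value[key], sub)
        )
      );
    default:
      return false;
  }
}
```

Checking a node's output with something like conforms() before handing it downstream turns a malformed response into an early, local failure instead of a confusing error two stages later.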

Most "AI code review" is one model, one prompt, one pass. It finds plausible bugs and reports them with equal confidence — including the ones that aren't real. Dynamic workflows let you do better: fan out across review dimensions in parallel, then make the model attack its own findings before reporting them. Here's the full pipeline.

Run one subagent per review dimension. They don't depend on each other, so they execute concurrently behind a barrier.

const DIMENSIONS = [
  { name: "correctness", prompt: correctnessPrompt(diff) },
  { name: "security",    prompt: securityPrompt(diff) },
  { name: "performance", prompt: perfPrompt(diff) },
];

const found = await parallel(
  DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS }))
);

Each agent() call is an isolated subagent with its own context window, so the security reviewer never sees the performance reviewer's noise. { schema: FINDINGS } forces structured output: an object whose findings array holds { file, line, severity, claim } entries, not prose you have to regex later.

Independent reviewers will often flag the same line. Merging is deterministic set logic, so don't spend a model on it.

const deduped = dedupeByFileLine(found.flatMap(r => r.findings));

flatMap flattens the per-dimension arrays into one list; dedupeByFileLine collapses entries sharing a (file, line) key. Use code wherever the answer is mechanical. Agents are for judgment, not joins.
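dedupeByFileLine is named but never defined in the article, so the following is one plausible implementation rather than the author's. It assumes each finding has file and line fields and that, when several findings share a key, keeping the first one is acceptable.

```javascript
// Hypothetical implementation; the article only names dedupeByFileLine.
// Assumption: when several findings share a (file, line) key, keep the first.
function dedupeByFileLine(findings) {
  const byKey = new Map();
  for (const finding of findings) {
    // JSON.stringify of the pair avoids collisions that a plain
    // "file:line" string could produce for file names containing ":".
    const key = JSON.stringify([finding.file, finding.line]);
    if (!byKey.has(key)) byKey.set(key, finding);
  }
  // A Map iterates in insertion order, so the original ordering is kept.
  return [...byKey.values()];
}

// Example input: two reviewers flag auth.js:42, one flags db.js:7.
const sample = [
  { file: "auth.js", line: 42, severity: "high", claim: "token not validated" },
  { file: "db.js", line: 7, severity: "low", claim: "query inside a loop" },
  { file: "auth.js", line: 42, severity: "medium", claim: "missing null check" },
];
const unique = dedupeByFileLine(sample); // two entries remain
```

Keeping only the first finding per line is a trade-off: two reviewers may describe different problems on the same line, and this version drops the second claim. Merging the claims into one entry is a reasonable variant if that matters for your reviews.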

This is the step that kills false positives. For each surviving finding, spawn a skeptic subagent whose only job is to refute it.

const verified = await parallel(
  deduped.map(f => () => agent(refutePrompt(f), { schema: VERDICT }))
);
const real = verified.filter(v => v.refuted === false);

refutePrompt(f) instructs the subagent: "Here is a claimed bug. Prove it's wrong: find the guard, the caller, the type that makes it safe." VERDICT is { refuted: boolean, reason: string }. A finding that survives a dedicated attacker is worth reporting; one that doesn't, isn't.
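Since VERDICT as described carries only refuted and reason, filtering the verdict list on its own yields verdicts without the claims they refer to. One way to keep both is to pair each verdict with its finding by position. This sketch assumes parallel() returns results in input order, which the article implies but does not state; keepUnrefuted is a hypothetical helper.

```javascript
// Hypothetical helper, not from the article.
// Assumption: verdicts[i] is the verdict for findings[i], i.e. parallel()
// preserves input order.
function keepUnrefuted(findings, verdicts) {
  if (findings.length !== verdicts.length) {
    throw new Error("expected exactly one verdict per finding");
  }
  return findings
    // Attach the verdict so the skeptic's reasoning travels with the claim.
    .map((finding, i) => ({ ...finding, verdict: verdicts[i] }))
    // Keep only the findings the skeptic could not refute.
    .filter((entry) => entry.verdict.refuted === false);
}

// Example: the first finding is refuted, the second survives.
const sampleFindings = [
  { file: "auth.js", line: 42, severity: "high", claim: "token not validated" },
  { file: "db.js", line: 7, severity: "low", claim: "query inside a loop" },
];
const sampleVerdicts = [
  { refuted: true, reason: "validated by middleware before this call" },
  { refuted: false, reason: "no batching or caching found on this path" },
];
const confirmed = keepUnrefuted(sampleFindings, sampleVerdicts);
```

Passing entries shaped like this to the synthesis step gives it both the original claim and the reason the skeptic failed to knock it down.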

For higher-stakes findings, fan out N skeptics per finding and keep only what a majority can't refute — verification scales independently of review:

async function survivesQuorum(f, n = 3) {
  const verdicts = await parallel(
    Array.from({ length: n }, () => () => agent(refutePrompt(f), { schema: VERDICT }))
  );
  const refutations = verdicts.filter(v => v.refuted).length;
  return refutations < n / 2; // a strict majority of skeptics could not refute it
}

This is a judge pattern: refutation is adjudication, kept separate from the generation in step 1. Asking a model to merely re-summarize its own findings launders the weak ones into the report. Refutation is a sharper filter than agreement.

One agent turns confirmed findings into the review a human reads.

const review = await agent(synthesisPrompt(real), { schema: REVIEW });

Composed end to end, the four stages form a single pipeline:

const review = await pipeline(
  ()        => parallel(DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS }))),
  (found)   => dedupeByFileLine(found.flatMap(r => r.findings)),
  (deduped) => parallel(deduped.map(f => () => agent(refutePrompt(f), { schema: VERDICT }))),
  (verified, deduped) => synthesize(deduped, verified), // keep only refuted === false, then write
);

pipeline() is sequential: each stage's output feeds the next. parallel() is the barrier inside stages 1 and 3.

Not every node deserves the same compute. Set effort per call: skeptics run cheap because refutation is a narrow question; synthesis runs at high effort because it's the artifact a human trusts.

agent(refutePrompt(f),       { schema: VERDICT, effort: "low"  });
agent(synthesisPrompt(real), { schema: REVIEW,  effort: "high" });

You spend reasoning where judgment is hard and conserve it where the work is mechanical — and a human still approves the final review before anything posts.

parallel() returns when the slowest node finishes; pipeline() runs stages in sequence and accumulates their latency. Mismatching them is the most common cost mistake. Your review dimensions are independent, so fan them out rather than chaining them.

// Good: 3 dimensions run concurrently, wall-time ≈ slowest dimension
const found = await parallel(DIMENSIONS.map(d => () => agent(d.prompt, { schema: FINDINGS })))

// Bad: same work, ~3x the latency for no reason
const found = await pipeline(
  () => agent(DIMENSIONS[0].prompt, { schema: FINDINGS }),
  () => agent(DIMENSIONS[1].prompt, { schema: FINDINGS }),
  () => agent(DIMENSIONS[2].prompt, { schema: FINDINGS }),
)

Reserve pipeline() for true data dependencies: verify needs dedup's output, so that edge stays sequential.
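The latency difference is simple arithmetic, worked through below. The per-dimension latencies are made-up illustrative numbers, not measurements, and the two helper functions are hypothetical.

```javascript
// Illustrative numbers only; these are not measured latencies.
// Fan-out behind a barrier: wall time is set by the slowest node.
const wallTimeParallel = (latenciesMs) => Math.max(0, ...latenciesMs);

// Sequential chain: wall time is the sum of every stage.
const wallTimePipeline = (latenciesMs) =>
  latenciesMs.reduce((sum, ms) => sum + ms, 0);

// Hypothetical latencies for correctness, security, and performance.
const dimensionLatenciesMs = [8000, 12000, 10000];

const fannedOut = wallTimeParallel(dimensionLatenciesMs); // 12000
const chained = wallTimePipeline(dimensionLatenciesMs);   // 30000
```

With these numbers, chaining costs 2.5 times the wall time of fanning out. The "~3x" figure above is the limiting case where all three dimensions take about equally long.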

Verification is the expensive phase: it can spawn N skeptics per finding. If correctness and security both flag auth.js:42, verifying twice burns budget for nothing. Collapse duplicates first with plain code, no agent required.

The synthesize step is your human-in-the-loop checkpoint. Confirmed findings are a recommendation, not an auto-commit — a person approves before anything lands.

Fan-out multiplies whatever your base node produces, so the base node's reliability matters. Anthropic reports Opus 4.8 makes roughly 4x fewer silent code bugs than its predecessor; the more trustworthy each leaf reviewer is, the safer it is to run many of them in parallel.

A single agent is the right default. Reach for a dynamic workflow only when the task has structure you can name: independent dimensions that fan out in parallel, a verification step that must be adversarial rather than self-graded, or a synthesis pass that depends on confirmed inputs.

The PR-review example earns its workflow because each stage has a different shape: fan out, collapse in code, fan out again to refute, then synthesize. parallel() is the barrier; pipeline() enforces order; schemas keep the seams machine-readable; effort goes high on synthesis and low on the mechanical passes.

Open question: which of your "trust me" agent steps is actually an unverified claim waiting for a skeptic?

Source: dev.to (original article)
