# I Ran Claude Code on Every New Claude Model. Here's What Actually Ships.

> Source: <https://dev.to/suraj_khaitan_f893c243958/i-ran-claude-code-on-every-new-claude-model-heres-what-actually-ships-1j6l>
> Published: 2026-06-20 06:04:59+00:00

*Fable, Mythos, Opus 4.8, Sonnet 4.6, Haiku — Anthropic's 2026 lineup is no longer "one model you talk to." It's a fleet you route between. I spent a month inside Claude Code orchestrating all of them across real codebases. Here's which model to reach for, when, and the routing playbook that quietly doubled my throughput.*

Last time I wrote about Claude **Skills** and called Claude Code the killer host for them. Since then, two things happened that changed how I work day to day.

First, the **models got genuinely strange-good**. In the span of a few months Anthropic shipped Sonnet 4.6, Opus 4.8, and then an entirely new *tier* above Opus — the Mythos class — released to the public as **Claude Fable 5**. We went from "the AI suggested a decent diff" to Stripe reporting that Fable 5 ran a codebase-wide migration on a **50-million-line Ruby codebase in a single day** — work that would've taken a team over two months by hand.

Second, Claude Code stopped being a single-model tool. With a fleet of models at different price/speed/intelligence points, the highest-leverage skill in 2026 isn't prompting — it's **routing**. Knowing which model to put on which task is the difference between burning $200 of tokens on a typo fix and one-shotting a multi-service refactor.

So I did the obvious thing: I wired all of them into Claude Code and ran them against real work for a month — bug fixes, migrations, greenfield features, test suites, the boring stuff and the scary stuff. This is what I learned.

Forget "Claude" as one thing. In 2026 it's a graded ladder, and each rung exists for a reason.

| Model | Class | Sweet spot | Price (in / out per M tokens) |
|---|---|---|---|
| Haiku | Fast tier | High-volume, latency-sensitive, cheap glue work | Lowest |
| Sonnet 4.6 | Workhorse | Everyday coding, agents, 1M context | $3 / $15 |
| Opus 4.8 | Heavy lifter | Architecture, refactors, judgment-heavy work | $5 / $25 ($10 / $50 fast mode) |
| Fable 5 | Mythos-class (safe) | Long-horizon, frontier coding, vision, research | $10 / $50 |
| Mythos 5 | Mythos-class (restricted) | Cyber defense, life sciences; vetted access only | $10 / $50 |
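To make the price column concrete, here is a minimal sketch that turns the per-million-token prices from the table into a per-request estimate. The dictionary keys are labels chosen for this example rather than official model identifiers, and the 200k-in / 20k-out workload is a made-up illustration.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
# Haiku is omitted because the table gives no dollar figure for it.
# The keys are illustrative labels, not official model IDs.
PRICES_PER_MTOK = {
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.8": (5.00, 25.00),
    "opus-4.8-fast": (10.00, 50.00),
    "fable-5": (10.00, 50.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    price_in, price_out = PRICES_PER_MTOK[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200k tokens read, 20k tokens written.
    for name in PRICES_PER_MTOK:
        print(f"{name}: ${estimate_cost(name, 200_000, 20_000):.2f}")
```

On that hypothetical workload the same request costs about $0.90 on Sonnet 4.6, $1.50 on Opus 4.8, and $3.00 on Fable 5, before accounting for any difference in how many tokens each model actually uses.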

A few things worth knowing about how these actually relate:

Here's the mental model I settled on after a month. Think of it as a triage flow:

```mermaid
flowchart TD
    A[New task] --> B{How long-horizon<br/>and how risky?}
    B -->|Quick edit, glue,<br/>bulk text| H[Haiku]
    B -->|Everyday coding,<br/>most PRs| S[Sonnet 4.6]
    B -->|Architecture, refactor,<br/>needs judgment| O[Opus 4.8]
    B -->|Multi-hour migration,<br/>frontier reasoning| F[Fable 5]
    O -->|Scale it out| D[Dynamic workflows:<br/>100s of subagents]
```

**1. Start at Sonnet 4.6. Always.**

This is the single most important habit. Sonnet 4.6 now benchmarks near Opus-level on the coding tasks most teams actually care about, with a 1M-token context window and a price point that makes running multiple instances in parallel economically trivial. Several teams I trust have publicly moved the *majority* of their traffic here. Start here, and only climb the ladder when Sonnet visibly struggles.

**2. Climb to Opus 4.8 when judgment matters.**

The moment a task needs *taste* — a cross-service refactor, an API redesign, "should we even do it this way?" — Opus 4.8 earns its premium. The standout improvement isn't raw smarts, it's **honesty**: Opus 4.8 is roughly **four times less likely than its predecessor to let a flaw in its own code pass unremarked**. It flags uncertainty instead of confidently shipping a landmine. For unattended, long-running work, that's worth more than a benchmark point.

**3. Reach for Fable 5 on the long-horizon stuff.**

When the task is genuinely big — a migration across hundreds of thousands of lines, rebuilding an app's source from screenshots, reasoning that spans millions of tokens — Fable 5 is the one I reach for to get past a wall. It stays focused across enormous contexts and improves its own outputs using file-based memory. It's also more **token-efficient** than past models, which softens the higher per-token price.

**4. Drop to Haiku for the boring glue.**

Bulk renames, log parsing, commit-message generation, simple codegen. Don't pay Opus prices to reformat JSON.
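The four rules above can be sketched as a small routing function. This is only an illustration of the triage order: the `Task` fields and `pick_model` are hypothetical names invented here, not part of Claude Code, and the precedence (long-horizon first, then judgment, then glue) is one reasonable reading of the playbook.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task descriptor; the fields are illustrative, not a Claude Code API."""
    bulk_or_glue: bool = False      # renames, log parsing, commit messages
    needs_judgment: bool = False    # architecture, API redesign, cross-service refactor
    long_horizon: bool = False      # multi-hour migration, very large context


def pick_model(task: Task, fable_available: bool = True) -> str:
    """Map a task to a tier following the four rules above."""
    if task.long_horizon:
        # Rule 3, with Opus as the stand-in when Fable access is unavailable.
        return "Fable 5" if fable_available else "Opus 4.8"
    if task.needs_judgment:
        return "Opus 4.8"       # Rule 2
    if task.bulk_or_glue:
        return "Haiku"          # Rule 4
    return "Sonnet 4.6"         # Rule 1: the default


if __name__ == "__main__":
    print(pick_model(Task()))                      # Sonnet 4.6
    print(pick_model(Task(needs_judgment=True)))   # Opus 4.8
```

Keeping the default at the bottom of the function mirrors the habit itself: everything lands on Sonnet 4.6 unless a specific signal pushes it elsewhere.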

A model fleet only pays off if the host lets you orchestrate it. Four features did the heavy lifting for me:

Launched alongside Opus 4.8, **dynamic workflows** let Claude plan a task and then fan out across **tens to hundreds of parallel subagents** in a single session — *then verify its own outputs before reporting back*. This is what turns "codebase-scale migration" from a slide into a Tuesday. Claude Code with Opus 4.8 can now take a six-figure-line migration from kickoff to merge, using your existing test suite as the bar. Available on Enterprise, Team, and Max plans.

**Routines** (shipped April 2026) let you configure a Claude Code workflow once and trigger it on a **schedule, via API, or in response to an event**. Nightly dependency upgrades, auto-triage of new GitHub issues, on-merge changelog generation. Pair a routine with the right model — Sonnet for triage, Opus for the actual fix — and you've replaced a pile of brittle CI scripts with one agent that improves over time.

When you're keeping "as many instances of Claude Code busy as possible" (Notion's co-founder isn't joking — that's literally the workflow now), you need a cockpit. **Agent View** gives you one place to manage every running session across surfaces. It's the unglamorous feature that makes parallel agent work *sane*.

Claude Code now **opens your apps, drives your browser, and runs your dev tools** to complete tasks end-to-end. Combined with Fable 5's state-of-the-art vision (it beat Pokémon FireRed from raw screenshots alone, no harness), the "AI that can actually operate your machine" future is quietly here.

And it meets you everywhere: **terminal, VS Code / Cursor / JetBrains extensions, desktop app, web, mobile, and Slack** — same agent, same context, same models, wherever you happen to be working.

The newer models expose an **effort control**, and it's the cheapest performance lever you have. Opus 4.8 defaults to *high*, but you can push it to *extra* (`xhigh` in Claude Code) or *max* for hard problems and long async runs. On lower effort it answers faster and sips your rate limits; on higher effort it thinks more and self-validates.

My rule: **low/standard effort for interactive back-and-forth, high/extra for anything you're going to walk away from.** The extra thinking pays for itself precisely when you're not watching.
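As a rough sketch, that rule fits in a few lines. The function name and the returned labels are illustrative; only `xhigh` is named in this article as a Claude Code setting, and splitting each pair on a hypothetical `hard_problem` flag is an assumption made for the example.

```python
def pick_effort(interactive: bool, hard_problem: bool = False) -> str:
    """Pick an effort label: lighter when you're watching, heavier when you walk away."""
    if interactive:
        # Back-and-forth work: favor speed and rate-limit headroom.
        return "standard" if hard_problem else "low"
    # Unattended work: pay for extra thinking and self-validation.
    return "xhigh" if hard_problem else "high"
```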

There's also **fast mode** for Opus 4.8 — 2.5× the speed at a higher per-token cost. Great for tight interactive loops where you're paying in wall-clock attention, not just dollars.

Routing doesn't have to stop at Claude's borders. A few honest observations from running mixed fleets:

The takeaway isn't "Claude beats everyone." It's that **multi-model routing is now a first-class engineering decision**, and Claude Code is the most mature place to actually do it.

Benchmarks are fine. But what convinced me — and what I think convinces most engineers — is watching the thing land a PR you'd have spent a day on. Here are the use cases I ran (and the public results that back them up), organized by the kind of work you actually do.

**The task:** Migrate a large service off a deprecated framework — the kind of ticket that sits in the backlog for two quarters because nobody has a free week.

**The setup:** Opus 4.8 (or Fable 5 where available) + **dynamic workflows**, with the existing test suite as the pass/fail bar. Claude plans the migration, fans out across hundreds of parallel subagents, each handling a slice, then verifies against the tests before reporting back.

**The result:** Stripe reported Fable 5 performing a **codebase-wide migration on a 50-million-line Ruby codebase in a single day** — work estimated at **two-plus months** for a team by hand. In my own (far smaller) runs, a multi-thousand-file framework bump that I'd scoped at three days came back green in an afternoon, with a clean diff and a summary of every non-trivial decision.

**Takeaway:** Long-horizon migrations are the single highest-ROI use case for the frontier tier. The longer and more mechanical the migration, the more absurd the time savings.
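The plan, fan out, verify shape of that setup is easy to picture in ordinary code. The sketch below is not how dynamic workflows are implemented; it is a generic illustration using Python's standard thread pool, where `migrate` and `verify` are caller-supplied stand-ins for "a subagent handles one slice" and "the test suite passes for that slice".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fan_out_and_verify(
    slices: Iterable[str],
    migrate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_workers: int = 8,
) -> dict[str, list[str]]:
    """Run `migrate` on every slice in parallel, then check each result with `verify`."""
    slices = list(slices)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(migrate, slices))  # order matches `slices`

    report: dict[str, list[str]] = {"passed": [], "failed": []}
    for name, result in zip(slices, results):
        report["passed" if verify(name, result) else "failed"].append(name)
    return report


if __name__ == "__main__":
    # Stand-ins: "migrating" upper-cases the name; "tests" reject one slice.
    demo = fan_out_and_verify(
        ["billing", "auth", "search"],
        migrate=str.upper,
        verify=lambda name, result: name != "auth",
    )
    print(demo)
```

The important design point is that verification is a separate step with its own pass/fail bar, so a slice that "finished" is never reported as done until the tests agree.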

**The task:** Turn an exploratory notebook (pull data, train a model, eval with basic metrics) into a real, scheduled production pipeline.

**The setup:** Sonnet 4.6 as the driver; this is bread-and-butter work that doesn't need Opus. Point it at the notebook and your pipeline framework's conventions in `CLAUDE.md`.

**The result:** Ramp's staff engineer reported this exact workflow — notebook to Metaflow pipeline — **saving 1–2 days of routine work per model.** That's not a demo; that's a recurring tax on every ML engineer's week, quietly removed.

**Takeaway:** The boring-but-skilled translation work (notebook→pipeline, script→service, prototype→prod) is where Sonnet 4.6 pays for itself daily.

**The task:** A GitHub issue comes in. Read it, reproduce, write the fix, add a test, open the PR.

**The setup:** Claude Code's GitHub/GitLab integration. Sonnet 4.6 for triage and the common case; escalate to Opus 4.8 when the bug touches architecture or the root cause is non-obvious.

**The result:** This is the loop teams at GitHub, Cognition, and Code Rabbit have publicly leaned into — Sonnet 4.6 "punches way above its weight class for the vast majority of real-world PRs," with double-digit-point gains on the *hardest* bug-finding problems over Sonnet 4.5. In practice: most issues never reach me as anything but a PR to review.

**Takeaway:** Wire the cheap model to the front door, reserve the expensive model for the hard 10%. Don't pay Opus to fix a null check.
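One way to express "cheap model at the front door, expensive model for the hard cases" is an escalation loop that climbs the ladder only when the cheaper tier fails. This is a sketch under assumptions: `attempt_fix` is a hypothetical hook you would supply (for example, run the agent and return a patch only if the tests pass), not a Claude Code function.

```python
from typing import Callable, Optional, Sequence

LADDER = ("Sonnet 4.6", "Opus 4.8")  # cheap front door first, expensive tier last


def fix_with_escalation(
    issue: str,
    attempt_fix: Callable[[str, str], Optional[str]],
    ladder: Sequence[str] = LADDER,
) -> tuple[Optional[str], Optional[str]]:
    """Try each model in order; return (model, patch) for the first that succeeds.

    `attempt_fix(model, issue)` is a caller-supplied hook that returns a patch
    whose tests pass, or None if that model could not produce one.
    """
    for model in ladder:
        patch = attempt_fix(model, issue)
        if patch is not None:
            return model, patch
    return None, None  # nothing worked: hand the issue to a human
```

Because the loop stops at the first success, the expensive tier is only billed for the issues the cheap tier could not close.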

**The task:** "Here's a screenshot of the dashboard. Rebuild it." No source, no spec — just pixels.

**The setup:** Fable 5, the current state-of-the-art vision model. It can extract precise numbers from scientific figures and **reconstruct a web app's source code from screenshots alone**.

**The result:** Anthropic's own demo had Fable 5 beating Pokémon FireRed from raw game screenshots with a *vision-only* harness — something earlier Claude models couldn't do even *with* navigation aids. Translated to dev work: design-to-code from a Figma export or a competitor's UI screenshot, with far less hand-holding than anything before it.

**Takeaway:** Vision is no longer a party trick. "Rebuild this from a picture" is a real, reliable workflow now.

**The task:** Dependency upgrades, flaky-test triage, changelog generation — the chores that rot a codebase when ignored.

**The setup:** **Routines.** Configure once, trigger on a schedule. Sonnet 4.6 does the nightly sweep; anything genuinely broken gets escalated to an Opus 4.8 fix with a draft PR waiting in the morning.

**The result:** Replaced a folder of brittle cron + bash scripts with a single agent that *understands* why a test failed instead of just reporting that it did. The win isn't speed — it's that the maintenance actually happens now, every night, without a human remembering to do it.

**Takeaway:** Skills + Routines + model routing is the combo that turns "we should automate that" into "it ran at 2am."

**The task:** Catch the confidently-wrong bug before it ships.

**The setup:** Primary model writes the diff; a *different* model (via MCP — could be another Claude tier, GPT-5.5, or Gemini 3.5) reviews it adversarially. Opus 4.8's honesty gains help here too: it's ~**4× less likely than its predecessor to let a flaw in its own code pass unremarked.**

**The result:** Cognition reported Sonnet 4.6 "meaningfully closed the gap with Opus on bug detection," letting them run **more reviewers in parallel** and catch a wider variety of bugs *without increasing cost*. A second, independent model catches the class of mistakes self-review structurally can't.

**Takeaway:** Two cheap reviewers beat one expensive author. Parallel, multi-model review is now economically obvious.
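The review pattern reduces to "send the same diff to several independent reviewers at once, then merge their findings". The sketch below assumes each reviewer is a plain callable that takes a diff and returns a list of findings; how a reviewer reaches a model (MCP or otherwise) is deliberately left out, and all names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Reviewer = Callable[[str], list[str]]  # takes a diff, returns a list of findings


def adversarial_review(diff: str, reviewers: dict[str, Reviewer]) -> dict[str, list[str]]:
    """Run every reviewer on the same diff in parallel and group findings by reviewer."""
    if not reviewers:
        return {}
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(review, diff) for name, review in reviewers.items()}
        return {name: future.result() for name, future in futures.items()}


def should_block_merge(findings: dict[str, list[str]]) -> bool:
    """Block the merge if any reviewer reported at least one finding."""
    return any(findings.values())
```

Blocking on any single finding is the strictest possible policy; a real setup might instead require agreement between reviewers or route findings to a human.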

| Use case | Model(s) | Reported / observed result |
|---|---|---|
| 50M-line framework migration | Fable 5 + dynamic workflows | ~2 months → 1 day (Stripe) |
| Notebook → prod pipeline | Sonnet 4.6 | 1–2 days saved per model (Ramp) |
| Issue → PR | Sonnet 4.6 → Opus 4.8 | Most issues arrive as review-ready PRs |
| Screenshot → app | Fable 5 (vision) | Source rebuilt from pixels alone |
| Nightly maintenance | Sonnet 4.6 + Routines | Chores that actually happen, unattended |
| Adversarial review | Multi-model via MCP | More bugs caught, parallel, no cost increase |

The pattern across all six: **match the model to the shape of the task, let Claude Code orchestrate, and verify with tests or a second model.** That's the whole game.

A few hard-won habits that separated my good weeks from my great ones:

**Write your conventions into `CLAUDE.md`, once**, including rules for directories such as `legacy/`. Every model in the fleet inherits it. This single file is the highest-leverage 20 minutes you'll spend.

The meta-lesson: **agentic coding rewards engineers who think like tech leads.** You decide *what* and *why*; the fleet handles *how*. The bottleneck moved from typing speed to judgment, which is exactly where you want it.

The Mythos class crossed a capability threshold that made Anthropic genuinely nervous — and they were right to be. These models excel at discovering and exploiting software vulnerabilities and at agentic hacking (recon, lateral movement, the works). That's exactly why:

For your own work, the same discipline as ever applies: **sandbox agent execution, restrict file-system and network egress, review diffs before they merge, and never let an autonomous agent push to anything you can't roll back.** A more capable model raises the stakes of a bad instruction, not just a good one.
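Two of those guardrails are simple enough to sketch as pre-flight checks an agent wrapper could run before acting. Everything here is illustrative: the function names and the protected-branch list are assumptions for the example, and real sandboxing should be enforced at the OS or container level rather than in application code alone.

```python
from pathlib import Path

PROTECTED_BRANCHES = {"main", "master", "production"}  # illustrative names


def is_inside_workspace(target: str, workspace: str) -> bool:
    """True if `target` resolves to a path inside `workspace` (blocks `..` escapes)."""
    root = Path(workspace).resolve()
    resolved = (root / target).resolve()  # an absolute `target` replaces `root` here
    return resolved == root or root in resolved.parents


def may_push(branch: str) -> bool:
    """Allow autonomous pushes only to branches that are safe to discard."""
    return branch not in PROTECTED_BRANCHES
```

Resolving the path before comparing is the important step: a naive string-prefix check would let `../` sequences slip outside the workspace.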

**Install Claude Code** (one-liner):

```
irm https://claude.ai/install.ps1 | iex              # Windows (PowerShell)
curl -fsSL https://claude.ai/install.sh | bash       # macOS / Linux
```

**Pick your plan.** Claude Code is bundled into Pro ($17–$20/mo), Max 5x ($100/mo), and Max 20x ($200/mo). For "keep three branches alive while I review the fourth," Max is the honest entry point.

**Switch models per task.** Inside a session, select the model that matches the job: Sonnet for the PR, Opus for the architecture call, Fable for the migration (where available). Use a `CLAUDE.md` file to encode your project's conventions once so every model inherits them.

**Promote winners to Routines.** Once a model-plus-workflow combo proves itself, schedule it. Nightly Sonnet-powered issue triage that escalates real bugs to an Opus fix is the kind of thing that runs while you sleep.

**Wire in a second opinion via MCP.** Let a different model adversarially review high-stakes diffs. Cheap insurance against confident-but-wrong.

A year ago the question was "is the AI good enough to write this code?" In 2026 the answer is *yes* — across an entire ladder of models, each tuned for a different shape of problem. The new skill, the one that separates a 1.2× productivity bump from a 3× one, is **knowing which model to put on which task** and letting Claude Code orchestrate the fleet.

Start at Sonnet 4.6. Climb to Opus 4.8 when judgment matters. Reach for Fable 5 on the long-horizon work — when you can get it. Wire in a second model for adversarial review. Promote your wins to Routines. And keep a fallback path for the frontier models, because as June 2026 reminded everyone, the most capable model is also the one most likely to get pulled out from under you for a week.

Tools give agents capability. Skills give them competence. Models give them *intelligence at the right price* — and Claude Code, in 2026, is where you conduct the whole orchestra.

**Suraj Khaitan** — Gen AI Architect | Building scalable platforms and secure cloud-native systems

Connect on [LinkedIn](https://www.linkedin.com/in/suraj-khaitan-501736a2/) | Follow for more engineering and architecture write-ups

*Which Claude model has become your default — and what finally made you climb the ladder? Drop it in the comments. I'm always refining the routing playbook.*

**Sources & further reading:** Anthropic's announcements for Claude Fable 5 & Mythos 5, Claude Opus 4.8, and Claude Sonnet 4.6, the Claude Code product page, and the Fable/Mythos access statement. Benchmarks and pricing reflect Anthropic's published figures as of June 2026 and are subject to change.