Claude Opus 4.8: "a modest but tangible improvement"

wpnews.pro

cd /news/artificial-intelligence/claude-opus-4-8-a-modest-but-tangibl… · home › topics › artificial-intelligence › article

[ARTICLE · art-17009] src=simonwillison.net ↗ pub=2026-05-28T23:59Z topic=artificial-intelligence verified=true sentiment=↑ positive

Claude Opus 4.8: "a modest but tangible improvement"

Anthropic released Claude Opus 4.8, describing it as a "modest but tangible improvement" over its predecessor with a focus on increased honesty and reduced factual hallucinations. The model is four times less likely to allow flaws in code to pass unremarked and achieved the lowest incorrect rate on benchmarks by abstaining from uncertain questions. The release also introduces mid-conversation system messages and a lower prompt cache minimum of 1,024 tokens, while maintaining the same pricing and context window as previous versions.

read3 min views12 publishedMay 28, 2026

Anthropic shipped Claude Opus 4.8 today. My favourite thing about it is this note in the release announcement: Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost.

It's so refreshing to see an AI lab honestly describe a release as a minor incremental improvement over the previous model!

Honesty seems to be a theme. Here's my other favorite note from that announcement:

One of the most prominent improvements in Opus 4.8 is its

honesty. We train all our models to be honest---for instance, to avoid making claims that they can't support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. This is borne out in[our evaluations], which show that Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.

That linked system card includes the following:

Claude Opus 4.8 had the lowest incorrect-rate of the six models on every benchmark—the most direct measure of factual hallucination. It achieved this mainly by abstaining on questions about which it was uncertain rather than by answering more questions correctly.

Not much has changed since 4.7.

It's priced the same as Opus 4.5/4.6/4.7 - $5/million input and $25 per million output. "Fast mode" is twice that price, which is a significant reduction from their previous models - fast mode on 4.6/4.7 remains at $30/$150. Note that fast mode is only available to organizations that are part of the research preview, "Contact your account manager to request access".

Both the reliable knowledge cutoff and the training data cutoff are January 2026, the same as for 4.7.

The context window is still 1,000,000 tokens, and the max output is 128,000 tokens.

The What's new in Claude Opus 4.8 document has some of the more interesting details. These caught my eye: Mid-conversation system messages. Claude Opus 4.8 acceptsrole: "system"

messages immediately after a user turn in themessages

array (subject to[placement rules]). This lets you append updated instructions later in a long-running conversation without restating the full system prompt, which preserves[prompt cache]hits on the earlier turns and reduces input cost on agentic loops.

See also this update to the Anthropic Python SDK. Being able to steer the system prompt mid-conversation sounds really powerful. I was worried this would be incompatible with the abstraction provided by my own LLM library, which expects a single system prompt per conversation... but it turns out my recent redesign should handle that just fine.

Lower prompt cache minimum. The minimum cacheable prompt length on Claude Opus 4.8 is 1,024 tokens, lower than on Claude Opus 4.7.

I checked and 4.7's minimum was 4,096. Here are pelicans riding bicycles for all five thinking levels, low

, medium

, high

, xhigh

, and max

This time I ran them using the LLM CLI, exported the logs to Markdown and then had Claude Opus 4.8 build me an HTML tool that could render that Markdown with the svg

fenced code blocks displayed as SVGs on the page.

This is the max one - it's clearly the best, but it did take 25 input, 17,167 output tokens for a total cost of [43 cents](https://www.llm-prices.com/#it=25&ot=17167&ic=5&oc=25&sel=claude-opus-4-5)!

Tags: [ai](https://simonwillison.net/tags/ai), [generative-ai](https://simonwillison.net/tags/generative-ai), [llms](https://simonwillison.net/tags/llms), [anthropic](https://simonwillison.net/tags/anthropic), [claude](https://simonwillison.net/tags/claude), [pelican-riding-a-bicycle](https://simonwillison.net/tags/pelican-riding-a-bicycle), [llm-release](https://simonwillison.net/tags/llm-release)

source & further reading

simonwillison.net — original article datasette code-frequency chart on GitHub Directly Responsible Individuals (DRI) Fable gets another bump

~/api · this article 200

$curl api.wpnews.pro/v1/news/claude-opus-4-8-a-modest…

Read original on simonwillison.net → simonwillison.net/2026/May/28/claude-opus-4-8/#a…

mentioned entities

Anthropic

Claude Opus 4.8

metadata

slugclaude-opus-4-8-a-modest-but-tangible-improvement

topic#artificial-intelligence

secondary4 topics

sentimentpositive

canonicalsimonwillison.net

navigation

← prevArcis – open-source security mid…

next →News Summary for May 29, 2026

── more in #artificial-intelligence 4 stories · sorted by recency

sourcefeed.dev · 14 Jul · #artificial-intelligence

Microsoft's CLI Agents: Social Spread, Real Lift, Real Cost

machinebrief.com · 14 Jul · #artificial-intelligence

CRiT-QA Exposes the Flaws in Multi-Hop Reasoning Models

machinebrief.com · 14 Jul · #artificial-intelligence

Texas Hold'em Reveals Risks in AI's Decision-Making Style

dev.to · 14 Jul · #artificial-intelligence

The MCP Confused Deputy: Provenance Gaps, Instruction Injection, and DNS Rebinding in the Model Context Protocol

── more on @anthropic 3 stories trending now

wpnews · 27 May · #artificial-intelligence

How I Run Two Claude Accounts as One

wpnews · 8 Jul · #artificial-intelligence

SpaceXAI unveils Grok 4.5 AI model ahead of July 2026 public release

wpnews · 21 May · #developer-tools

Antigravity CLI: A Hands-On Guide to Google's Terminal Coding Agent

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required