Anthropic’s Claude Opus 4.8 is its most honest AI model yet, and Mythos is coming in weeks

wpnews.pro

TL;DR

Anthropic has released Claude Opus 4.8, an upgrade to its flagship AI model that is four times less likely to let code flaws pass unremarked. The company also teased Mythos-class models, which have already found more than 10,000 critical software vulnerabilities through Project Glasswing, and announced a $65 billion Series H round at a $965 billion post-money valuation.

Anthropic has released Claude Opus 4.8, an upgrade to its flagship AI model that the company says is more honest, more reliable in agentic tasks, and better at catching its own mistakes. The model is available immediately at the same price as its predecessor, $5 per million input tokens and $25 per million output tokens, and is rolling out across all Anthropic products including claude.ai, Claude Code, and the API.

The headline improvement is honesty. Anthropic says Opus 4.8 is around four times less likely than Opus 4.7 to let flaws in code it has written pass unremarked. Early testers report the model is more willing to flag uncertainties about its work and less likely to make unsupported claims, a persistent problem across AI models that tend to project confidence regardless of whether it is warranted.

Benchmark gains across the board #

Opus 4.8 improves on its predecessor across Anthropic’s published benchmarks. On agentic coding (Terminal-Bench 2.1), the score rises from 64.3% to 69.2%. Multidisciplinary reasoning with tools improves from 54.7% to 57.9%. Agentic computer use moves from 82.8% to 83.4%, and knowledge work scores rise from 1,753 to 1,890.

Anthropic’s alignment assessment found that Opus 4.8 reaches new highs on measures of prosocial traits, including supporting user autonomy and acting in the user’s best interest. Rates of misaligned behaviour such as deception or cooperation with misuse are substantially lower than in Opus 4.7, and comparable to Claude Mythos Preview, Anthropic’s best-aligned model.

Early testers see practical gains #

The release is accompanied by endorsements from companies already using the model. Cognition, the company behind the AI coding agent Devin, said Opus 4.8 uses tools cleanly and fixes comment-verbosity and tool-calling issues that appeared in Opus 4.7. Cursor, the AI-powered code editor, reported improvements across every effort level on its CursorBench evaluation.

Harvey, which builds AI for legal work, said Opus 4.8 delivers the highest score recorded on its Legal Agent Benchmark and is the first model to break 10% overall on the all-pass standard. Databricks reported that Opus 4.8 handles deeper multistep questions faster in its Genie AI agent, at 61% cheaper token cost than Opus 4.7.

Thomson Reuters said CoCounsel Legal saw meaningful improvements in consistency and reasoning quality. Hebbia, which builds AI for financial document analysis, noted better citation precision and more token efficiency on retrieval tasks.

New features alongside the model #

Anthropic is launching several features alongside Opus 4.8. A new effort control in claude.ai and Cowork lets users choose how much computation Claude applies to a response, trading speed against quality. Claude Code gains a dynamic workflows feature that allows it to plan work and run hundreds of parallel subagents in a single session, enabling codebase-scale migrations across hundreds of thousands of lines of code.

For developers, the Messages API now accepts system entries inside the messages array, allowing instructions to be updated mid-task without breaking the prompt cache. Fast mode for Opus 4.8, which runs at 2.5 times the speed, is now three times cheaper than it was for previous models.

Mythos is the bigger story #

The more significant announcement may be what comes next. Anthropic said it plans to release a new class of model with higher intelligence than Opus, based on the Claude Mythos architecture. A small number of organisations are already using Claude Mythos Preview through Project Glasswing, an initiative focused on using the model for cybersecurity work. Anthropic and roughly 50 partners, including Apple, Google, Microsoft, and Amazon Web Services, have used Mythos Preview to find more than 10,000 high- or critical-severity vulnerabilities across critical software infrastructure.

Mythos-class models require stronger cyber safeguards before general release, Anthropic said, but the company expects to bring them to all customers in the coming weeks. The model sits a full capability tier above Opus 4.7 and can autonomously find zero-day vulnerabilities and create exploits for them, which explains both the excitement and the caution around its deployment.

A company approaching $1 trillion #

The Opus 4.8 launch arrives as Anthropic’s valuation continues to climb. The company announced a $65 billion Series H round at a $965 billion post-money valuation on the same day, up from the $380 billion valuation at which it closed its $30 billion Series G in February. Revenue has grown from roughly $1 billion at the end of 2024 to an estimated $30 billion annualised run rate in 2026, driven by enterprise adoption of Claude.

Anthropic also opened a new office in Milan on 28 May, its sixth in Europe, and appointed KiYoung Choi as Representative Director of Korea ahead of a Seoul office opening. The expansion reflects growing demand for Claude in enterprise markets outside the United States.

The competitive context #

Opus 4.8 enters a market where the pace of model releases has accelerated sharply. OpenAI launched GPT-5.5 as its first fully retrained base model since GPT-4.5, and GPT-5.4 set new records on professional benchmarks earlier this year. Google has invested up to $40 billion in Anthropic but continues to develop its own Gemini models. The frontier AI market has consolidated into a three-way race between Anthropic, OpenAI, and Google, with each company releasing incremental model upgrades at an increasing pace.

For Anthropic, the distinction it is trying to draw with Opus 4.8 is not raw capability but reliability. A model that catches its own mistakes, flags its uncertainties, and follows instructions consistently is more useful in agentic workflows where AI systems operate with limited human oversight. Whether that positioning holds as Mythos-class models arrive, promising higher intelligence with new safety constraints, will determine whether Anthropic can maintain its lead in the enterprise market it has worked to dominate.

source & further reading

thenextweb.com — original article AI has triggered the biggest gas-plant building boom in history, and a quiet fight to stop it Zhipu’s founder says frontier AI should stay open to everyone. His own government may disagree. Meta spent a year being punished for its AI spending. Then it told investors how it would get the money back.