Source: cryptobriefing.com

Claude Fable 5 costs $6K just to benchmark, highlighting the soaring price of frontier AI

Anthropic's Claude Fable 5 model cost $6,227.74 to benchmark on the Artificial Analysis Intelligence Index, consuming 87 million output tokens at $50 per million. The model scored 64.9, topping the index and surpassing OpenAI's GPT-5.5, highlighting the soaring costs of frontier AI evaluation.

Published Jun 17, 2026 · 2 min read

Anthropic's new flagship reasoning model tops the Artificial Analysis Intelligence Index but burns through 87 million output tokens in the process

Running a benchmark suite on Anthropic’s newest AI model now costs roughly what a used Honda Civic does. Claude Fable 5, the company’s latest flagship reasoning model, racked up a bill of $6,227.74 just to complete the Artificial Analysis Intelligence Index evaluations.

The model launched on June 9, 2026, and immediately claimed the top spot on the Intelligence Index with a score of 64.9. That dethroned its predecessor, Claude Opus 4.8, and left OpenAI’s GPT-5.5 in the rearview at 58.6.

The price of being the best

Claude Fable 5 is priced at $10 per million input tokens and $50 per million output tokens. The benchmark evaluation consumed 87 million output tokens, which at that rate comes to $4,350, or roughly 70% of the bill. That output volume is the main reason the final tab landed north of $6K.
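For readers who want to check the arithmetic, here is a minimal sketch in Python that splits the reported bill into output cost and everything else. The prices, token count, and total are the figures quoted above. The last field rests on an assumption the article does not confirm: that the entire remainder is uncached input billed at the list rate.

```python
from decimal import Decimal

# Figures quoted in the article.
INPUT_PRICE_PER_M = Decimal("10")    # USD per million input tokens
OUTPUT_PRICE_PER_M = Decimal("50")   # USD per million output tokens
OUTPUT_TOKENS_M = Decimal("87")      # millions of output tokens consumed
TOTAL_BILL = Decimal("6227.74")      # USD, full benchmark run


def cost_breakdown(total, output_tokens_m, output_price, input_price):
    """Split a bill into output cost and the remainder."""
    output_cost = output_tokens_m * output_price
    remainder = total - output_cost
    return {
        "output_cost": output_cost,
        "remainder": remainder,
        "output_share": output_cost / total,
        # ASSUMPTION (not from the article): the whole remainder is
        # uncached input billed at the list input price.
        "implied_input_tokens_m": remainder / input_price,
    }


breakdown = cost_breakdown(
    TOTAL_BILL, OUTPUT_TOKENS_M, OUTPUT_PRICE_PER_M, INPUT_PRICE_PER_M
)
print(f"Output cost:      ${breakdown['output_cost']:,.2f}")
print(f"Remainder:        ${breakdown['remainder']:,.2f}")
print(f"Output share:     {breakdown['output_share']:.1%}")
print(f"Implied input:    {breakdown['implied_input_tokens_m']}M tokens (assumed)")
```

The implied input volume is only an upper-bound style estimate: any cached input or other line items in the real bill would change it.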

Reasoning models produce far more output tokens than standard chat models because they work through problems step by step. The 87-million-token figure is a direct consequence of asking a reasoning model to grind through coding challenges, logic puzzles, and complex multi-step evaluations.

Anthropic does offer a 90% discount on repeated input that hits the prompt cache, but that doesn’t help much when your primary cost driver is output generation.
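A rough sketch shows why the caching discount barely moves the total. The prices, the 90% discount, and the 87 million output tokens come from the article. The input volume of 100 million tokens is purely hypothetical, chosen for illustration, and the model ignores any charge for writing to the cache.

```python
from decimal import Decimal

# Figures quoted in the article.
INPUT_PRICE_PER_M = Decimal("10")     # USD per million input tokens
OUTPUT_PRICE_PER_M = Decimal("50")    # USD per million output tokens
CACHE_HIT_DISCOUNT = Decimal("0.90")  # 90% off cached input
OUTPUT_TOKENS_M = Decimal("87")       # millions of output tokens

# HYPOTHETICAL input volume, for illustration only (not from the article).
INPUT_TOKENS_M = Decimal("100")


def run_cost(input_tokens_m, output_tokens_m, cache_hit_rate):
    """Total USD cost for a run, given the fraction of input served from cache.

    Simplification: cache writes are treated as free.
    """
    cached_price = INPUT_PRICE_PER_M * (1 - CACHE_HIT_DISCOUNT)
    cached = input_tokens_m * cache_hit_rate
    uncached = input_tokens_m - cached
    input_cost = cached * cached_price + uncached * INPUT_PRICE_PER_M
    output_cost = output_tokens_m * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


for rate in (Decimal("0"), Decimal("0.5"), Decimal("1")):
    total = run_cost(INPUT_TOKENS_M, OUTPUT_TOKENS_M, rate)
    print(f"cache hit rate {rate:.0%}: ${total:,.2f}")
```

Whatever input volume is plugged in, the total can never drop below the $4,350 spent on output, since the discount applies to input only.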

What you get for the money

On SWE-Bench Pro, a benchmark that tests a model’s ability to solve real-world software engineering problems, Fable 5 scored 80.3% accuracy. That’s a substantial improvement over Claude Opus 4.8’s 69.2% and ahead of GPT-5.5’s 58.6%.

The model also ships with a 1 million token context window and handles both image and text inputs.

Disclosure: This article was edited by the Editorial Team. For more information on how we create and review content, see our Editorial Policy.
