Two frontier-class models just launched weeks apart: Anthropic's Claude Sonnet 5
(closed, $2/$10 per 1M at launch pricing) and Z.AI's GLM-5.2 (open-weight, MIT, ~$1.40/$4.40 across hosts). The first question everyone asks is "which is cheaper?"
The honest answer: it depends on your token mix, your tier, and whether cached
input matters. Here's a repeatable way to answer it for your case, using live,
verified pricing.

Providers quote prices in incompatible units (per-1K, per-1M, sometimes per-image
or per-character) and split input, output, and cached-input. Before you can
compare anything, convert all of it to dollars per 1 million input tokens and
per 1 million output tokens. (This is the single biggest source of "wait, that's
cheaper than I thought" errors.)
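
As a minimal sketch of that normalization step, the helper below converts a quoted token price into dollars per 1M tokens. The unit labels (`per_1k`, `per_1m`) and the function name are illustrative assumptions, not any provider's schema. Per-image and per-character prices are rejected rather than guessed at, because converting them needs a tokens-per-image or tokens-per-character figure you would have to measure yourself.

```python
# Tokens covered by each quoted unit. The labels are hypothetical, chosen for this sketch.
TOKENS_PER_UNIT = {"per_1k": 1_000, "per_1m": 1_000_000}


def to_usd_per_million(price_usd: float, unit: str) -> float:
    """Convert a price quoted per `unit` of tokens into USD per 1M tokens."""
    if unit not in TOKENS_PER_UNIT:
        # Per-image or per-character quotes cannot be converted without extra data.
        raise ValueError(f"cannot convert unit {unit!r} to a per-token price")
    return price_usd * 1_000_000 / TOKENS_PER_UNIT[unit]


# A $0.002 per-1K quote and a $2.00 per-1M quote are the same price.
print(f"{to_usd_per_million(0.002, 'per_1k'):.2f}")  # 2.00
print(f"{to_usd_per_million(2.00, 'per_1m'):.2f}")   # 2.00
```

Run this once for each of input, output, and cached-input so every model ends up with the same three comparable numbers.
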
Comparing a frontier flagship to a budget model on price alone is meaningless.
Bucket first, then compare within a bucket:
A summarizer is input-heavy; a code generator is output-heavy. Output usually
costs 3-5x input, so a model that looks cheap on input can lose on a
generation-heavy workload. Multiply each rate by your real volume; don't eyeball
the sticker price.
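
To make the token-mix point concrete, here is a small sketch that multiplies each rate by volume for two workloads. The two models and their rates (`model_a`, `model_b`) are made-up numbers chosen to show a ranking flip, not real prices.

```python
def daily_cost(input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> float:
    """Daily cost in USD, with rates expressed in USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical rates (USD per 1M tokens), for illustration only.
RATES = {
    "model_a": {"input": 1.00, "output": 8.00},  # cheap input, pricey output
    "model_b": {"input": 2.00, "output": 5.00},  # pricier input, cheaper output
}

# Hypothetical daily volumes as (input_tokens, output_tokens).
WORKLOADS = {
    "summarizer": (10_000_000, 500_000),       # input-heavy
    "code_generator": (1_000_000, 5_000_000),  # output-heavy
}


def cheapest(workload: str) -> str:
    """Return the name of the cheapest model for the given workload."""
    tokens_in, tokens_out = WORKLOADS[workload]
    return min(
        RATES,
        key=lambda m: daily_cost(tokens_in, tokens_out,
                                 RATES[m]["input"], RATES[m]["output"]),
    )


for name in WORKLOADS:
    print(name, "->", cheapest(name))
```

With these made-up rates, `model_a` wins the summarizer ($14.00 vs $22.50 a day) and loses the code generator ($41.00 vs $27.00), even though its input price is half of `model_b`'s.
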
For RAG and agent loops you re-send the same context constantly. Cached-input pricing is often a huge discount (Sonnet 5's cache hits are 90% cheaper than
fresh input: $0.20 vs $2.00 per 1M) and it can flip the ranking entirely. If your
workload is cache-heavy, rank by cached-input price, not raw input. (There's a
[live ranking of caching-capable APIs](https://modelpricewatch.com/best-for/prompt-caching)
if you want the current order.)
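
One way to see how caching can flip a ranking is to blend the fresh and cached rates by your cache-hit fraction. The sketch below uses the Sonnet 5 figures quoted above ($2.00 fresh, $0.20 cached) and compares them with a rival input rate of $1.40 that is assumed, purely for illustration, to have no cache discount. It also ignores any cache-write surcharge a provider may add, so treat the result as a first approximation.

```python
def effective_input_rate(fresh_rate: float, cached_rate: float,
                         cache_hit_fraction: float) -> float:
    """Blended input price (USD per 1M tokens) for a given share of cache hits."""
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    return cache_hit_fraction * cached_rate + (1.0 - cache_hit_fraction) * fresh_rate


def break_even_hit_fraction(fresh_rate: float, cached_rate: float,
                            rival_rate: float) -> float:
    """Cache-hit share above which the caching model's blended rate beats rival_rate."""
    return (fresh_rate - rival_rate) / (fresh_rate - cached_rate)


# Sonnet 5 rates from the text; the uncached $1.40 rival is an assumption.
print(f"{effective_input_rate(2.00, 0.20, 0.80):.2f}")      # 0.56
print(f"{break_even_hit_fraction(2.00, 0.20, 1.40):.2f}")   # 0.33
```

Under those assumptions, once roughly a third of your input tokens are cache hits, the model with the higher sticker price has the lower effective input cost.
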
Prices move: Sonnet 5's own launch pricing reverts from $2/$10 to $3/$15 on Sep 1,
so re-check current rates before you commit, for example against a current source such as
https://modelpricewatch.com/api/v1/models.json.

Worked example: for a chat product doing ~2M input / 0.5M output tokens a day, run
those numbers through a cost calculator across your shortlist, and if you re-send
a big system prompt each call, add the cached-input rate. The difference between
Sonnet 5 with caching and a naive flagship default can be the majority of your bill.
Disclosure: I build and maintain Model Price Watch. The method above works with any pricing source; I just happen to keep one current.