Sonnet 5 vs GLM-5.2 vs everyone: how to pick the cheapest LLM API in 2026

A developer compared the pricing of Anthropic's Claude Sonnet 5 and Z.AI's GLM-5.2, finding that the cheapest LLM API depends on token mix, tier, and caching. The developer recommends converting all pricing to dollars per 1 million tokens, bucketing models by capability, and factoring in cached-input costs, which can be 90% cheaper for Sonnet 5. A worked example shows that caching can flip the cost ranking for chat products with repeated context.

Two frontier-class models launched weeks apart: Anthropic's Claude Sonnet 5 (closed, $2/$10 per 1M input/output tokens at launch pricing) and Z.AI's GLM-5.2 (open-weight, MIT-licensed, roughly $1.40/$4.40 per 1M across hosts). The first question everyone asks is "which is cheaper?" The honest answer: it depends on your token mix, your tier, and whether cached input matters. Here's a repeatable way to answer it for your case, using live, verified pricing.

Providers quote prices in incompatible units (per 1K tokens, per 1M tokens, sometimes per image or per character) and split them into input, output, and cached-input rates. Before you compare anything, convert all of it to dollars per 1 million input tokens and dollars per 1 million output tokens. Mismatched units are the single biggest source of "wait, that's cheaper than I thought" errors.

Comparing a frontier flagship to a budget model on price alone is meaningless. Bucket models by capability first, then compare within a bucket.

A summarizer is input-heavy; a code generator is output-heavy. Output usually costs 3-5x input, so a model that looks cheap on input can lose on a generation-heavy workload. Multiply each rate by your real volume instead of eyeballing the sticker price.

For RAG and agent loops you re-send the same context constantly. Cached-input pricing is often a huge discount (Sonnet 5's cache hits are 90% cheaper than fresh input, $0.20 vs $2.00 per 1M), and it can flip the ranking entirely. If your workload is cache-heavy, rank by cached-input price, not raw input. There's a live ranking of caching-capable APIs at https://modelpricewatch.com/best-for/prompt-caching if you want the current order.

Prices move: Sonnet 5's own launch pricing reverts from $2/$10 to $3/$15 on Sep 1 (see https://modelpricewatch.com/api/v1/models.json).

Worked example: for a chat product doing ~2M input / 0.5M output tokens a day, run those numbers through a cost calculator across your shortlist, and if you re-send a big system prompt on each call, add the cached-input rate.
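
To make the worked example concrete, here is a minimal Python sketch of that calculation. It is not Model Price Watch's calculator: `Pricing`, `per_million`, and `daily_cost` are hypothetical names, the 80% cache-hit ratio is an assumed figure for illustration, and cache-write surcharges are ignored. GLM-5.2 and the post-Sep-1 Sonnet 5 entry have no cached rate here because none is quoted above, so their cached tokens are billed at the full input rate.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_MILLION = 1_000_000


def per_million(price: float, per_tokens: int) -> float:
    """Convert a price quoted per `per_tokens` tokens into dollars per 1M tokens."""
    return price * TOKENS_PER_MILLION / per_tokens


@dataclass(frozen=True)
class Pricing:
    """Rates in dollars per 1M tokens (hypothetical container for this sketch)."""
    name: str
    input_per_m: float
    output_per_m: float
    # None means "no cached rate known": cached tokens bill at the input rate.
    cached_input_per_m: Optional[float] = None


def daily_cost(p: Pricing, input_tokens: float, output_tokens: float,
               cache_hit_ratio: float = 0.0) -> float:
    """Daily bill in dollars; `cache_hit_ratio` is the share of input served from cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached_rate = p.input_per_m if p.cached_input_per_m is None else p.cached_input_per_m
    cached_tokens = input_tokens * cache_hit_ratio
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * p.input_per_m
             + cached_tokens * cached_rate
             + output_tokens * p.output_per_m)
    return total / TOKENS_PER_MILLION


# Rates taken from the figures quoted in this post.
SHORTLIST = [
    Pricing("Claude Sonnet 5 (launch)", 2.00, 10.00, cached_input_per_m=0.20),
    Pricing("Claude Sonnet 5 (from Sep 1)", 3.00, 15.00),
    Pricing("GLM-5.2 (approx., across hosts)", 1.40, 4.40),
]

if __name__ == "__main__":
    daily_input, daily_output = 2_000_000, 500_000  # the chat product above
    assumed_hit_ratio = 0.8  # assumption, not a measured value
    for p in SHORTLIST:
        plain = daily_cost(p, daily_input, daily_output)
        cached = daily_cost(p, daily_input, daily_output, assumed_hit_ratio)
        print(f"{p.name}: ${plain:.2f}/day uncached, ${cached:.2f}/day at 80% cache hits")
```

On these inputs, Sonnet 5 at launch pricing comes to $9.00 a day uncached and $6.12 at the assumed 80% hit ratio, against $5.00 for GLM-5.2 and $13.50 for Sonnet 5 after Sep 1. Output accounts for $5.00 of Sonnet 5's cached bill, which is the token-mix point in practice: caching only discounts the input side. `per_million` handles the unit step, so a price quoted as $0.002 per 1K tokens becomes `per_million(0.002, 1_000)`, or $2 per 1M.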
The difference between Sonnet 5 with caching and a naive flagship default can be the majority of your bill.

Disclosure: I build and maintain Model Price Watch. The method above works with any pricing source; I just happen to keep one current.