Inference cache hit rates are more meaningful than the headline costs


TL;DR: Agents push the full conversation history into context every turn, so over a large number of turns they are extremely read-heavy, which is why cache hit rates are such an important cost factor. This post analyzes 60+ providers and their cache hit rates using 398 data points, all sourced from openrouter.ai model pages. It assumes the reader is familiar with prefix caching; every mention of caching in this post refers to prefix caching.

Agentic workflows differ from most human-LLM conversations in one key characteristic: the average number of turns is far higher.

Context processing over a multi-turn conversation grows quadratically. Every turn passes the full conversation up to that point into context along with its own input, e.g. turn 100 pushes everything up to turn 99 into the context window again. The serving side, on its end, tries to match the longest prefix it can against the caches it has available and processes the rest of the conversation as new input tokens, typically at 10x the cached price. So on a long 200k-token conversation (which, by the way, is a bad idea capability-wise even if you don't care about costs), if a model that costs $5 per million input tokens fails to hit any cache, you'll be charged $1 for input processing on that turn alone. Two things determine this:

1. Cached input pricing: the headline metric everyone looks at.
2. Cache hit rate: the hidden variable that nobody talks about.
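The quadratic growth and the $1 figure above can be checked with a few lines of arithmetic. The sketch below is a simplified model, not any provider's actual billing logic. It assumes a fixed number of new tokens per turn (`tokens_per_turn`, a hypothetical value), a cached price of one tenth of the uncached price (taken from the "typically 10x" remark), and a single aggregate hit rate. It ignores cache-write surcharges and output tokens.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens processed when every turn resends the full history."""
    # Turn k sends k * tokens_per_turn tokens, so the total is an arithmetic
    # series: tokens_per_turn * (1 + 2 + ... + turns), which grows as turns^2.
    return tokens_per_turn * turns * (turns + 1) // 2


def effective_input_price(uncached: float, cached: float, hit_rate: float) -> float:
    """Blended price per million input tokens for a given cache hit rate (0..1)."""
    return (1.0 - hit_rate) * uncached + hit_rate * cached


def conversation_input_cost(turns: int, tokens_per_turn: int,
                            uncached: float, cached: float,
                            hit_rate: float) -> float:
    """Input-side cost in dollars for a whole conversation."""
    total_tokens = cumulative_input_tokens(turns, tokens_per_turn)
    price = effective_input_price(uncached, cached, hit_rate)
    return total_tokens / 1_000_000 * price


if __name__ == "__main__":
    # A single 200k-token turn at $5 per million tokens with no cache hit.
    print(200_000 / 1_000_000 * effective_input_price(5.0, 0.5, 0.0))

    # Hypothetical agent run: 100 turns, 2,000 new tokens per turn.
    print(cumulative_input_tokens(100, 2_000))  # 10100000 tokens in total
    for hit_rate in (0.0, 0.5, 0.9):
        cost = conversation_input_cost(100, 2_000, 5.0, 0.5, hit_rate)
        print(f"hit rate {hit_rate:.0%}: ${cost:.2f}")
```

Under these assumptions, the same conversation costs several times more on the input side at a 0% hit rate than at 90%, which is why the second item in the list matters as much as the first.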

I recently spent many hours convinced there was a bug in Dirac that was breaking caches, only to find in the end that it was entirely down to Gemini 3 Flash's cache hit rate.

While looking for data on this, I found that OpenRouter fortunately publishes it (go to a model's page and look for the 'Effective Pricing' section). Since the data is hourly, we have to assume that it doesn't change much from hour to hour.

Provider Cache-hit Tier list

Providers with multiple endpoints (e.g. Amazon Bedrock US, Bedrock Global, Bedrock (1)) are listed separately — each entry reflects the hit rate of that specific endpoint as observed.

DeepSeek remains the gold standard of caching, which probably doesn't surprise anyone who has used their official API. In fact, all S-tier entries (hitting roughly 75%+ cache hit rates) are Chinese labs: DeepSeek (87%), StepFun (86.1%), Moonshot AI (84.8%), MiniMax (75.4%), and Xiaomi (74.7%).

The mainstream US labs place somewhere in the middle, but as we will see in the next section, the variance is huge and rather interesting.

On the flip side, we have the "F-tier": providers like io.net, AkashML, SambaNova, and Nebius clock in at exactly 0.0% cache hit rates across their models.

US closed-source big 3

The most interesting thing to me in the chart above is that older models from the same provider tend to get lower cache hit rates. If I had to explain it non-cynically, I would guess that, from a systems engineering POV, it comes down to the cache-pool sizes allocated to each model.

Google does worse than the other two providers across the board, which is surprising considering that they own the full stack on TPUs. This gets full-on clowny when you look at the Vertex AI numbers (see the full table below): Opus 4.7 on Vertex AI has a 65.30% cache hit rate while Google's own Google_Gemini_3.1_Pro_Preview has 37.30% (and this trend applies to all Claude vs Gemini models hosted on Vertex)! How do you manage to get a lower cache hit rate on your own hardware, with your own model trained on that hardware, than a competitor's model does? If I were to speculate, I'd guess that the whole 'thought signature' architecture is just not working out.

OSS Models Cost Comparison

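For OSS models, the effective input price varies hugely depending on which provider you use, mostly due to cache hit rates. For each model, the table lists the lowest and highest effective input price with the provider in parentheses, followed by the difference.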

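| Model | Lowest Eff. Input Price (Provider) | Highest Eff. Input Price (Provider) | Difference (% of lowest) |
|---|---|---|---|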
| Kimi K2.6 | 0.2760 (StreamLake) | 1.0900 (Phala) | 0.8140 (294.93%) |
| MiMo-V2.5-Pro | 0.3720 (Xiaomi) | 0.9060 (DeepInfra) | 0.5340 (143.55%) |
| DeepSeek V4 Pro (Max) | 0.0560 (DeepSeek) | 1.7220 (Parasail) | 1.6660 (2975.00%) |
| GLM-5.1 | 0.3230 (StreamLake) | 1.7470 (Venice) | 1.4240 (440.87%) |
| MiniMax-M2.7 | 0.1430 (MiniMax) | 0.6000 (SambaNova) | 0.4570 (319.58%) |
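The difference column can be recomputed from per-provider rows of the full table. The sketch below uses a handful of those rows; the function name `price_spread` is illustrative, and treating a $0.0000 price as missing data is an assumption (it happens to match the GLM-5.1 row, where the $0.0000 GMICloud entry is not counted as the lowest price).

```python
# (model, provider, effective input price) rows copied from the full table.
rows = [
    ("Kimi K2.6", "StreamLake", 0.2760),
    ("Kimi K2.6", "Moonshot AI", 0.3430),
    ("Kimi K2.6", "Phala", 1.0900),
    ("GLM-5.1", "StreamLake", 0.3230),
    ("GLM-5.1", "Venice", 1.7470),
    ("GLM-5.1", "GMICloud", 0.0000),  # assumed to mean "no data", so skipped
    ("MiniMax-M2.7", "MiniMax", 0.1430),
    ("MiniMax-M2.7", "Together", 0.2010),
    ("MiniMax-M2.7", "SambaNova", 0.6000),
]


def price_spread(rows):
    """Return {model: (cheapest provider, priciest provider, diff, diff % of lowest)}."""
    by_model = {}
    for model, provider, price in rows:
        if price > 0:  # assumption: a zero price is a placeholder, not a real offer
            by_model.setdefault(model, []).append((price, provider))

    result = {}
    for model, offers in by_model.items():
        low_price, low_provider = min(offers)
        high_price, high_provider = max(offers)
        diff = high_price - low_price
        result[model] = (
            low_provider,
            high_provider,
            round(diff, 4),
            round(diff / low_price * 100, 2),
        )
    return result


if __name__ == "__main__":
    for model, spread in price_spread(rows).items():
        print(model, spread)
```

The percentage is taken relative to the lowest price, which is why a provider with a very low effective price, such as DeepSeek on its own model, produces such a large figure.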

Small Model Grift

Now onto smaller models that instinctively seem cheaper. Below is the average effective pricing for 4 of the most popular local models (hi r/LocalLLaMA).

Model Name	Total Providers	Avg Eff. Input	Avg Eff. Output	Avg Cache Hit
Google_Gemma_4_26B_A4B	10	$0.1156	$0.4150	21.57%
Google_Gemma_4_31B	10	$0.1729	$0.5049	7.44%
Qwen_Qwen36_35B_A3B	7	$0.1643	$1.1450	11.54%
Qwen_Qwen36_27B	8	$0.4096	$2.9433	7.66%

Compare that to DeepSeek's official API:

Model Name (Official API)	Eff. Input Price	Eff. Output Price	Cache Hit Rate
DeepSeek_DeepSeek_V4_Pro	$0.0560	$0.8690	87.90%
DeepSeek_DeepSeek_V4_Flash	$0.0220	$0.2800	86.10%

Yup, you can use DeepSeek V4 Pro, a 1.6-trillion-parameter model whose 49B active parameters alone exceed the total parameter count of any of these small models, for less than either of the Qwen3.6 models. That is thanks to providers like io.net and DeepInfra offering $0.32/$3.20 input/output pricing with zero caching.
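The per-model averages in the small-model table are plain unweighted means over providers, which can be verified against the full table. A minimal sketch using the seven Qwen_Qwen36_35B_A3B rows; the helper name `summarize` is illustrative, and equal weighting of providers is an assumption (it is not weighted by traffic).

```python
from statistics import mean

# (provider, eff. input price, eff. output price, cache hit rate %)
# Rows for Qwen_Qwen36_35B_A3B copied from the full table.
qwen36_35b_a3b = [
    ("Parasail", 0.1020, 1.0000, 47.60),
    ("Ambient", 0.1170, 1.0000, 33.20),
    ("io.net", 0.1500, 1.0000, 0.00),
    ("AkashML", 0.1700, 1.2000, 0.00),
    ("AtlasCloud", 0.1610, 0.9650, 0.00),
    ("Weights & Biases", 0.2500, 1.2500, 0.00),
    ("SiliconFlow", 0.2000, 1.6000, 0.00),
]


def summarize(rows):
    """Unweighted averages across providers for one model."""
    return {
        "providers": len(rows),
        "avg_input": round(mean(r[1] for r in rows), 4),
        "avg_output": round(mean(r[2] for r in rows), 4),
        "avg_cache_hit": round(mean(r[3] for r in rows), 2),
    }


if __name__ == "__main__":
    print(summarize(qwen36_35b_a3b))
```

Five of the seven providers report a 0.00% hit rate, so the average is pulled far below what the two caching providers achieve on their own.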

The full table

Model Name	Provider	Eff. Input Price	Eff. Output Price	Cache Hit Rate
Zai_GLM_5	SiliconFlow	$0.3100	$2.5490	85.30%
Zai_GLM_5	Baidu Qianfan	$0.3930	$2.2390	54.70%
Zai_GLM_5	GMICloud	$0.4200	$1.9200	37.50%
Zai_GLM_5	DeepInfra	$0.3410	$2.0790	54.00%
Zai_GLM_5	Z.ai	$0.4710	$3.1990	66.10%
Zai_GLM_5	Amazon Bedrock	$1.0000	$3.1990	0.10%
Zai_GLM_5	Friendli	$0.8700	$3.2000	26.00%
Zai_GLM_5	StreamLake	$0.4650	$2.0790	35.60%
Zai_GLM_5	NovitaAI	$0.7340	$3.2000	33.20%
Zai_GLM_5	AtlasCloud	$0.7830	$3.1500	22.00%
Zai_GLM_5	Parasail	$0.7720	$3.2000	28.60%
Zai_GLM_5	Together	$1.0000	$3.2000	0.80%
Zai_GLM_5	Chutes	$0.9410	$2.5500	1.80%
Zai_GLM_5	Phala	$1.2000	$3.5000	0.10%
Qwen_Qwen3_VL_32B_Instruct	Alibaba Cloud Int.	$0.1040	$0.4140	0.00%
Qwen_Qwen36_35B_A3B	Parasail	$0.1020	$1.0000	47.60%
Qwen_Qwen36_35B_A3B	Ambient	$0.1170	$1.0000	33.20%
Qwen_Qwen36_35B_A3B	io.net	$0.1500	$1.0000	0.00%
Qwen_Qwen36_35B_A3B	AkashML	$0.1700	$1.2000	0.00%
Qwen_Qwen36_35B_A3B	AtlasCloud	$0.1610	$0.9650	0.00%
Qwen_Qwen36_35B_A3B	Weights & Biases	$0.2500	$1.2500	0.00%
Qwen_Qwen36_35B_A3B	SiliconFlow	$0.2000	$1.6000	0.00%
OpenAI_GPT-41_Nano	OpenAI	$0.0920	$0.3990	11.00%
OpenAI_GPT-41_Nano	Azure (1)	$0.0710	$0.3990	41.30%
OpenAI_GPT-41_Nano	Azure (2)	$0.1000	$0.3980	0.00%
xAI_Grok_43	xAI	$0.7490	$2.5000	47.80%
Anthropic_Claude_Sonnet_46	Claude Platform on AWS	$0.9370	$15.0000	79.30%
Anthropic_Claude_Sonnet_46	Anthropic	$0.6070	$15.0000	89.90%
Anthropic_Claude_Sonnet_46	Google Vertex (US East)	$2.6200	$15.0000	21.80%
Anthropic_Claude_Sonnet_46	Amazon Bedrock (Global)	$1.3370	$15.0000	64.00%
Anthropic_Claude_Sonnet_46	Amazon Bedrock	$0.9400	$15.0000	78.40%
Anthropic_Claude_Sonnet_46	Google Vertex (Global)	$1.8020	$15.0000	52.70%
Anthropic_Claude_Sonnet_46	Google Vertex (Europe)	$2.4390	$15.0000	28.70%
Xiaomi_MiMo-V25-Pro	Xiaomi	$0.3720	$3.1670	94.80%
Xiaomi_MiMo-V25-Pro	DeepInfra	$0.9060	$3.0000	11.70%
Qwen_Qwen3_Coder_Next	Ionstream	$0.0860	$0.7990	61.10%
Qwen_Qwen3_Coder_Next	Parasail	$0.0860	$0.7990	68.40%
Qwen_Qwen3_Coder_Next	AtlasCloud	$0.1800	$1.3490	0.00%
Qwen_Qwen3_Coder_Next	NovitaAI	$0.2000	$1.4990	0.00%
Anthropic_Claude_Opus_46	Claude Platform on AWS	$2.3320	$25.0000	63.50%
Anthropic_Claude_Opus_46	Amazon Bedrock	$1.4750	$25.0000	81.40%
Anthropic_Claude_Opus_46	Anthropic	$1.6520	$25.0000	79.00%
Anthropic_Claude_Opus_46	Google Vertex	$1.9690	$25.0000	71.10%
Anthropic_Claude_Opus_46	Google Vertex (Europe)	$3.0870	$25.0000	44.30%
Anthropic_Claude_Opus_46	Azure	$6.2500	$25.0000	0.00%
Anthropic_Claude_Opus_47	Claude Platform on AWS	$1.8440	$25.0000	72.40%
Anthropic_Claude_Opus_47	Google Vertex	$2.4580	$25.0000	65.30%
Anthropic_Claude_Opus_47	Amazon Bedrock (US)	$3.9520	$25.0000	23.70%
Anthropic_Claude_Opus_47	Amazon Bedrock	$4.9530	$25.0000	1.20%
Anthropic_Claude_Opus_47	Google Vertex (Europe)	$2.3000	$25.0000	68.20%
Anthropic_Claude_Opus_47	Anthropic	$1.5920	$25.0000	79.10%
MiniMax_MiniMax_M27	MiniMax	$0.1430	$1.2000	65.60%
MiniMax_MiniMax_M27	Together	$0.2010	$1.1990	41.20%
MiniMax_MiniMax_M27	Morph	$0.2790	$1.1990	73.90%
MiniMax_MiniMax_M27	Fireworks	$0.2050	$1.1990	39.20%
MiniMax_MiniMax_M27	MiniMax Highspeed	$0.2510	$2.3990	64.70%
MiniMax_MiniMax_M27	SambaNova	$0.6000	$2.3990	0.00%
Qwen_Qwen36_27B	DeepInfra	$0.3200	$3.1990	0.00%
Qwen_Qwen36_27B	Alibaba Cloud Int.	$0.4500	$2.6990	0.00%
Qwen_Qwen36_27B	Ambient	$0.2670	$3.2000	32.80%
Qwen_Qwen36_27B	Weights & Biases	$0.6000	$3.6000	0.00%
Qwen_Qwen36_27B	io.net	$0.3170	$3.1990	0.00%
Qwen_Qwen36_27B	Morph	$0.4980	$2.3990	28.50%
Qwen_Qwen36_27B	Chutes	$0.5000	$2.0000	0.00%
Qwen_Qwen36_27B	Venice	$0.3250	$3.2500	0.00%
OpenAI_gpt-oss-120b	Google Vertex	$0.0900	$0.3590	4.10%
OpenAI_gpt-oss-120b	DeepInfra	$0.0390	$0.1890	0.00%
OpenAI_gpt-oss-120b	Groq	$0.1200	$0.5990	40.00%
OpenAI_gpt-oss-120b	Cerebras	$0.3500	$0.7490	48.60%
OpenAI_gpt-oss-120b	DekaLLM	$0.0390	$0.1770	0.00%
OpenAI_gpt-oss-120b	Baseten	$0.1000	$0.4990	52.10%
OpenAI_gpt-oss-120b	NovitaAI	$0.0500	$0.2490	2.50%
OpenAI_gpt-oss-120b	Ambient	$0.1070	$0.6000	57.00%
OpenAI_gpt-oss-120b	DeepInfra (Turbo)	$0.1500	$0.5990	0.00%
OpenAI_gpt-oss-120b	Parasail	$0.0890	$0.7490	23.60%
OpenAI_gpt-oss-120b	SiliconFlow	$0.0500	$0.4490	0.00%
OpenAI_gpt-oss-120b	Amazon Bedrock (1)	$0.1500	$0.5990	0.00%
OpenAI_gpt-oss-120b	Nebius Token Factory	$0.1500	$0.5990	0.00%
OpenAI_gpt-oss-120b	SambaNova Dedicated	$0.1200	$0.8990	0.00%
OpenAI_gpt-oss-120b	SambaNova	$0.1400	$0.9490	0.00%
OpenAI_gpt-oss-120b	Together	$0.1500	$0.6000	0.00%
OpenAI_gpt-oss-120b	Phala	$0.1000	$0.4890	14.00%
OpenAI_gpt-oss-120b	MARA	$0.1500	$0.7490	0.00%
OpenAI_gpt-oss-120b	Weights & Biases	$0.1500	$0.5990	12.70%
OpenAI_gpt-oss-120b	Amazon Bedrock (2)	$0.1500	$0.5990	0.00%
Zai_GLM_47_Flash	DeepInfra	$0.0230	$0.3990	73.40%
Zai_GLM_47_Flash	NovitaAI	$0.0610	$0.3990	14.30%
Zai_GLM_47_Flash	Cloudflare	$0.0600	$0.3970	0.00%
Zai_GLM_47_Flash	Phala	$0.1000	$0.4290	0.00%
Zai_GLM_47_Flash	Z.ai	$0.0430	$0.3990	44.20%
Zai_GLM_47_Flash	Venice	$0.1250	$0.4960	14.40%
OpenAI_GPT-51	OpenAI	$0.8220	$10.0000	38.10%
OpenAI_GPT-51	Azure (1)	$0.9550	$10.0000	26.30%
OpenAI_GPT-54_Mini	OpenAI	$0.3370	$4.5110	61.50%
OpenAI_GPT-54_Mini	Azure	$0.6840	$4.4990	9.80%
Meta_Llama_31_8B_Instruct	Groq	$0.0330	$0.0750	67.80%
Meta_Llama_31_8B_Instruct	DeepInfra	$0.0200	$0.0480	0.00%
Meta_Llama_31_8B_Instruct	NovitaAI	$0.0200	$0.0480	0.00%
Meta_Llama_31_8B_Instruct	Cerebras	$0.0990	$0.0950	87.50%
Meta_Llama_31_8B_Instruct	Cloudflare	$0.1520	$0.2850	0.00%
Qwen_Qwen35-9B	DeepInfra	$0.0400	$0.1480	0.00%
Qwen_Qwen35-9B	Together	$0.1000	$0.1490	0.00%
Qwen_Qwen35-9B	SiliconFlow	$0.1000	$0.1490	0.00%
Qwen_Qwen35-9B	Venice	$0.1000	$0.1490	35.50%
MoonshotAI_Kimi_K25	DeepInfra	$0.1660	$2.2490	74.80%
MoonshotAI_Kimi_K25	NovitaAI	$0.2320	$2.8490	71.10%
MoonshotAI_Kimi_K25	ModelRun	$0.1610	$1.9000	77.10%
MoonshotAI_Kimi_K25	Moonshot AI	$0.1360	$3.0000	92.80%
MoonshotAI_Kimi_K25	Fireworks	$0.1960	$3.0000	80.70%
MoonshotAI_Kimi_K25	Chutes	$0.3790	$2.0000	27.70%
MoonshotAI_Kimi_K25	AtlasCloud	$0.3790	$2.4990	38.10%
MoonshotAI_Kimi_K25	SiliconFlow	$0.2590	$2.2500	50.10%
MoonshotAI_Kimi_K25	Cloudflare	$0.3390	$3.0000	52.20%
MoonshotAI_Kimi_K25	Parasail	$0.3810	$2.7990	54.80%
MoonshotAI_Kimi_K25	Phala	$0.6000	$3.0000	2.20%
MoonshotAI_Kimi_K25	Venice	$0.5300	$3.5000	8.90%
Tencent_Hy3_preview	SiliconFlow	$0.0350	$0.2590	84.30%
OpenAI_GPT-53-Codex	OpenAI	$0.2820	$14.0000	93.20%
OpenAI_GPT-53-Codex	Azure	$0.4470	$14.0000	82.70%
StepFun_Step_35_Flash	StepFun	$0.0310	$0.2990	86.10%
StepFun_Step_35_Flash	DeepInfra	$0.0900	$0.2990	0.00%
StepFun_Step_35_Flash	SiliconFlow	$0.1000	$0.3000	0.00%
OpenAI_GPT-54_Nano	OpenAI	$0.0870	$1.2490	62.80%
OpenAI_GPT-54_Nano	Azure	$0.1530	$1.2490	26.10%
DeepSeek_DeepSeek_V3_0324	NovitaAI	$0.1800	$1.1180	66.30%
DeepSeek_DeepSeek_V3_0324	DeepInfra	$0.1600	$0.7670	61.90%
DeepSeek_DeepSeek_V3_0324	ModelRun	$0.1850	$0.7980	50.40%
DeepSeek_DeepSeek_V3_0324	SiliconFlow	$0.2500	$1.0000	52.60%
DeepSeek_DeepSeek_V3_0324	AtlasCloud	$0.2140	$0.8770	4.00%
DeepSeek_DeepSeek_V3_0324	GMICloud	$0.2890	$1.1350	0.60%
Qwen_Qwen35_397B_A17B	Morph	$0.4220	$3.5000	64.70%
Qwen_Qwen35_397B_A17B	Alibaba Cloud Int.	$0.3900	$2.3400	0.00%
Qwen_Qwen35_397B_A17B	Chutes	$0.2710	$3.0000	79.60%
Qwen_Qwen35_397B_A17B	DeepInfra	$0.4900	$3.5990	0.00%
Qwen_Qwen35_397B_A17B	Together	$0.6000	$3.6000	48.40%
Qwen_Qwen35_397B_A17B	NovitaAI	$0.6000	$3.6000	5.10%
Qwen_Qwen35_397B_A17B	Nebius Token Factory	$0.6000	$3.5980	0.00%
Qwen_Qwen35_397B_A17B	AtlasCloud	$0.5500	$3.5000	0.00%
Qwen_Qwen35_397B_A17B	Phala	$0.5500	$3.5000	35.50%
Qwen_Qwen35_397B_A17B	Parasail	$0.4090	$3.6000	45.30%
Qwen_Qwen35_397B_A17B	GMICloud	$0.6000	$3.6000	0.00%
Qwen_Qwen35_397B_A17B	Venice	$0.7500	$4.5000	19.30%
Mistral_Mistral_Small_32_24B	Mistral	$0.0860	$0.2990	15.90%
Mistral_Mistral_Small_32_24B	DeepInfra	$0.0750	$0.1980	0.00%
Mistral_Mistral_Small_32_24B	Parasail	$0.0730	$0.5980	41.90%
Mistral_Mistral_Small_32_24B	Venice	$0.0940	$0.2490	0.00%
Meta_Llama_4_Maverick	Parasail	$0.3140	$1.0000	19.30%
Meta_Llama_4_Maverick	DeepInfra	$0.1500	$0.5970	0.00%
Meta_Llama_4_Maverick	NovitaAI	$0.2700	$0.8460	0.00%
Meta_Llama_4_Maverick	SambaNova	$0.6300	$1.7970	0.00%
Mistral_Mistral_Medium_35	Mistral	$1.5000	$7.4990	21.70%
Qwen_Qwen36_Flash	Alibaba Cloud Int.	$0.1920	$1.1400	0.30%
OpenAI_GPT-41	OpenAI	$1.1000	$8.0000	60.00%
OpenAI_GPT-41	Azure (1)	$1.0780	$8.0000	61.50%
Mistral_Mistral_Nemo	DeepInfra	$0.0200	$0.0370	0.00%
Mistral_Mistral_Nemo	DekaLLM	$0.0200	$0.0250	0.00%
Mistral_Mistral_Nemo	Mistral	$0.0900	$0.1430	44.40%
Mistral_Mistral_Nemo	NovitaAI	$0.0390	$0.1640	0.00%
Google_Gemma_4_31B	DeepInfra Turbo	$0.1200	$0.3690	0.00%
Google_Gemma_4_31B	DeepInfra	$0.1300	$0.3790	0.00%
Google_Gemma_4_31B	NovitaAI	$0.1400	$0.3990	5.80%
Google_Gemma_4_31B	Chutes	$0.1090	$0.3780	31.80%
Google_Gemma_4_31B	Together (2)	$0.3900	$0.9690	0.00%
Google_Gemma_4_31B	SiliconFlow	$0.1300	$0.3990	0.00%
Google_Gemma_4_31B	Together (1)	$0.2800	$0.8580	0.00%
Google_Gemma_4_31B	Ambient	$0.1180	$0.3990	18.40%
Google_Gemma_4_31B	Venice	$0.1750	$0.5000	12.00%
Google_Gemma_4_31B	Parasail	$0.1370	$0.3990	6.40%
Google_Gemini_25_Flash_Lite	Google Vertex (EU)	$0.0800	$0.3990	22.70%
Google_Gemini_25_Flash_Lite	Google Vertex	$0.0940	$0.3990	7.40%
Google_Gemini_25_Flash_Lite	Google AI Studio	$0.0910	$0.3990	12.00%
Google_Gemini_35_Flash	Google Vertex	$0.5520	$9.0080	70.30%
Google_Gemini_35_Flash	Google AI Studio	$0.6340	$7.9040	63.50%
DeepSeek_DeepSeek_V31	Weights & Biases	$0.5500	$1.6370	31.80%
DeepSeek_DeepSeek_V31	NovitaAI	$0.2620	$1.0000	5.70%
DeepSeek_DeepSeek_V31	DeepInfra	$0.1750	$0.7880	43.20%
DeepSeek_DeepSeek_V31	SiliconFlow	$0.2700	$1.0000	36.80%
DeepSeek_DeepSeek_V31	AtlasCloud	$0.2870	$0.9470	7.30%
DeepSeek_DeepSeek_V31	Google Vertex	$0.6000	$1.7000	19.30%
DeepSeek_DeepSeek_V31	SambaNova	$0.6500	$1.4990	0.00%
Google_Gemma_4_26B_A4B	Google Vertex	$0.1500	$0.5970	19.10%
Google_Gemma_4_26B_A4B	NovitaAI	$0.1300	$0.4000	16.80%
Google_Gemma_4_26B_A4B	NextBit	$0.1180	$0.3980	63.40%
Google_Gemma_4_26B_A4B	DeepInfra	$0.0700	$0.3370	0.00%
Google_Gemma_4_26B_A4B	Cloudflare	$0.1000	$0.2970	0.00%
Google_Gemma_4_26B_A4B	Parasail	$0.0960	$0.3980	41.80%
Google_Gemma_4_26B_A4B	DekaLLM	$0.0600	$0.3270	0.00%
Google_Gemma_4_26B_A4B	SiliconFlow	$0.1200	$0.3970	0.00%
Google_Gemma_4_26B_A4B	Venice	$0.1620	$0.4990	74.60%
Google_Gemma_4_26B_A4B	io.net	$0.1500	$0.5000	0.00%
OpenAI_gpt-oss-20b	Weights & Biases	$0.0500	$0.2000	69.00%
OpenAI_gpt-oss-20b	DeepInfra	$0.0300	$0.1400	0.00%
OpenAI_gpt-oss-20b	Amazon Bedrock (2)	$0.0700	$0.1500	0.00%
OpenAI_gpt-oss-20b	Amazon Bedrock (1)	$0.0700	$0.1500	0.00%
OpenAI_gpt-oss-20b	NovitaAI	$0.0400	$0.1500	0.00%
OpenAI_gpt-oss-20b	Google Vertex	$0.0690	$0.2460	63.20%
OpenAI_gpt-oss-20b	Groq	$0.0660	$0.2990	22.40%
OpenAI_gpt-oss-20b	Parasail	$0.0250	$0.1990	71.40%
OpenAI_gpt-oss-20b	SiliconFlow	$0.0400	$0.1790	0.00%
OpenAI_gpt-oss-20b	Fireworks	$0.0580	$0.2990	33.70%
OpenAI_gpt-oss-20b	NextBit	$0.1000	$0.4490	0.00%
OpenAI_gpt-oss-20b	Together	$0.0490	$0.1990	0.00%
OpenAI_GPT-41_Mini	OpenAI	$0.2800	$1.5940	40.00%
OpenAI_GPT-41_Mini	Azure (2)	$0.2520	$1.5980	49.30%
OpenAI_GPT-41_Mini	Azure (1)	$0.2020	$1.5930	66.00%
DeepSeek_DeepSeek_V32	NovitaAI	$0.2490	$0.3990	15.20%
DeepSeek_DeepSeek_V32	Baidu Qianfan	$0.1050	$0.3760	64.90%
DeepSeek_DeepSeek_V32	SiliconFlow	$0.1720	$0.4190	69.80%
DeepSeek_DeepSeek_V32	DeepInfra	$0.1950	$0.3790	49.60%
DeepSeek_DeepSeek_V32	AtlasCloud	$0.2410	$0.3790	14.40%
DeepSeek_DeepSeek_V32	Friendli	$0.3990	$1.5000	40.50%
DeepSeek_DeepSeek_V32	Alibaba Cloud Int.	$0.2730	$1.1110	32.50%
DeepSeek_DeepSeek_V32	Parasail	$0.2540	$0.4470	17.20%
DeepSeek_DeepSeek_V32	Google Vertex	$0.5600	$1.6790	8.00%
Google_Gemini_20_Flash_Lite	Google Vertex	$0.0750	$0.2980	0.00%
Google_Gemini_20_Flash_Lite	Google AI Studio	$0.0750	$0.2950	0.00%
Anthropic_Claude_Sonnet_4	Amazon Bedrock (1)	$1.6300	$15.0000	53.70%
Anthropic_Claude_Sonnet_4	Anthropic	$1.9410	$15.0000	41.60%
Anthropic_Claude_Sonnet_4	Amazon Bedrock (2)	$2.1510	$15.0000	35.10%
Anthropic_Claude_Sonnet_4	Google Vertex (Global)	$2.3880	$15.0000	29.50%
Anthropic_Claude_Sonnet_4	Google Vertex (Europe)	$1.7360	$15.0000	58.40%
Anthropic_Claude_Haiku_45	Amazon Bedrock (Global)	$0.3670	$5.0000	72.40%
Anthropic_Claude_Haiku_45	Anthropic	$0.5420	$5.0000	54.80%
Anthropic_Claude_Haiku_45	Google Vertex (Europe)	$0.9760	$5.0000	2.80%
Anthropic_Claude_Haiku_45	Amazon Bedrock	$0.5780	$5.0000	48.70%
Anthropic_Claude_Haiku_45	Google Vertex	$0.4890	$5.0000	58.20%
OpenAI_GPT-54	Azure	$1.8150	$15.0000	30.40%
OpenAI_GPT-54	OpenAI	$0.8980	$15.1190	74.90%
OpenAI_GPT-5	OpenAI	$0.4550	$10.0000	70.60%
OpenAI_GPT-5	Azure (1)	$1.2500	$10.0000	0.00%
Xiaomi_MiMo-V2-Flash	Xiaomi	$0.0500	$0.2980	55.00%
Xiaomi_MiMo-V2-Flash	NovitaAI	$0.0530	$0.2990	58.70%
Qwen_Qwen35-35B-A3B	Parasail	$0.0910	$1.0000	59.00%
Qwen_Qwen35-35B-A3B	Alibaba Cloud Int.	$0.1620	$1.2990	0.00%
Qwen_Qwen35-35B-A3B	AkashML	$0.1600	$1.2000	0.00%
Qwen_Qwen35-35B-A3B	Ambient	$0.1020	$1.0000	41.60%
Qwen_Qwen35-35B-A3B	Venice	$0.1970	$1.2500	73.90%
Qwen_Qwen35-35B-A3B	DeepInfra	$0.1400	$1.0000	0.00%
Qwen_Qwen35-35B-A3B	DekaLLM	$0.1390	$1.0000	0.00%
Qwen_Qwen35-35B-A3B	AtlasCloud	$0.2250	$1.8000	0.00%
Qwen_Qwen35-35B-A3B	NextBit	$0.3000	$1.8000	0.00%
Qwen_Qwen35-35B-A3B	SiliconFlow	$0.2400	$1.8000	0.00%
Google_Gemini_25_Pro	Google Vertex (Global)	$1.0410	$10.0650	30.10%
Google_Gemini_25_Pro	Google Vertex (EU)	$0.9680	$10.0000	26.40%
Google_Gemini_25_Pro	Google AI Studio	$0.9680	$10.0000	35.40%
Google_Gemini_25_Pro	Google Vertex (US)	$1.2200	$10.0000	2.60%
Zai_GLM_51	StreamLake	$0.3230	$3.9590	91.30%
Zai_GLM_51	Friendli	$0.6440	$4.4000	66.30%
Zai_GLM_51	Z.ai	$0.5660	$4.3990	73.10%
Zai_GLM_51	Chutes	$0.9960	$4.0000	34.00%
Zai_GLM_51	AtlasCloud	$0.5020	$4.4000	78.50%
Zai_GLM_51	DeepInfra	$0.4450	$3.5000	71.60%
Zai_GLM_51	SiliconFlow	$0.5070	$4.3990	78.30%
Zai_GLM_51	NovitaAI	$0.5120	$4.4000	77.50%
Zai_GLM_51	Baidu Qianfan	$0.5880	$3.0790	49.10%
Zai_GLM_51	Baseten	$1.3000	$4.3000	45.60%
Zai_GLM_51	Inceptron	$1.0810	$4.4000	28.00%
Zai_GLM_51	Together	$1.4000	$4.4000	6.50%
Zai_GLM_51	Parasail	$1.1110	$4.4000	25.30%
Zai_GLM_51	Ambient	$1.4000	$4.4000	14.00%
Zai_GLM_51	Phala	$1.2100	$4.2000	0.70%
Zai_GLM_51	io.net	$1.2900	$4.4800	0.00%
Zai_GLM_51	Fireworks	$1.2730	$4.3990	11.10%
Zai_GLM_51	Venice	$1.7470	$5.5000	0.20%
Zai_GLM_51	GMICloud	$0.0000	$0.0000	0.00%
Zai_GLM_47	Z.ai	$0.1170	$2.1990	98.60%
Zai_GLM_47	Google Vertex	$0.6000	$2.2000	36.50%
Zai_GLM_47	SiliconFlow	$0.2040	$2.2000	72.30%
Zai_GLM_47	DeepInfra	$0.2180	$1.7500	56.90%
Zai_GLM_47	Cerebras	$2.2500	$2.7500	45.60%
Zai_GLM_47	AtlasCloud	$0.3220	$1.8500	49.60%
Zai_GLM_47	Phala	$0.8500	$3.2990	3.50%
Zai_GLM_47	NovitaAI	$0.3700	$2.0050	40.80%
Zai_GLM_47	Parasail	$0.4270	$2.1000	6.70%
Zai_GLM_47	Venice	$0.5490	$2.6490	0.30%
OpenAI_GPT-55	OpenAI	$1.1170	$30.6400	92.70%
OpenAI_GPT-55	Azure	$2.3150	$30.0820	62.00%
Google_Gemini_31_Flash_Lite	Google Vertex	$0.2100	$1.4980	19.20%
Google_Gemini_31_Flash_Lite	Google AI Studio	$0.1860	$1.4680	27.40%
Qwen_Qwen3_Coder_480B_A35B	DeepInfra (Turbo)	$0.1210	$1.0000	89.30%
Qwen_Qwen3_Coder_480B_A35B	Together	$2.0000	$2.0000	0.00%
Qwen_Qwen3_Coder_480B_A35B	Google Vertex	$0.2200	$1.7990	49.20%
Qwen_Qwen3_Coder_480B_A35B	AtlasCloud	$0.7800	$3.7980	0.00%
Qwen_Qwen3_Coder_480B_A35B	Weights & Biases	$1.0000	$1.4980	70.50%
Qwen_Qwen3_Coder_480B_A35B	Alibaba OpenSource	$1.3730	$6.8700	0.00%
Qwen_Qwen3_Coder_480B_A35B	NovitaAI	$0.3800	$1.5490	0.00%
Qwen_Qwen3_Coder_480B_A35B	Venice	$0.3500	$1.5000	0.00%
Google_Gemini_3_Flash_Preview	Google Vertex	$0.3600	$2.9970	32.00%
Google_Gemini_3_Flash_Preview	Google AI Studio	$0.3480	$2.9980	34.50%
Qwen_Qwen36_Plus	Alibaba Cloud Int.	$0.6690	$1.9820	6.00%
Mistral_Mistral_Small_4	Mistral	$0.1280	$0.5990	16.10%
Mistral_Mistral_Small_4	Venice	$0.1870	$0.7490	11.20%
MoonshotAI_Kimi_K26	SiliconFlow	$0.2920	$4.0000	83.90%
MoonshotAI_Kimi_K26	Moonshot AI	$0.3430	$4.0000	76.80%
MoonshotAI_Kimi_K26	Cloudflare	$0.6870	$3.5000	13.60%
MoonshotAI_Kimi_K26	Inceptron	$0.3360	$3.5000	76.60%
MoonshotAI_Kimi_K26	Weights & Biases	$0.5520	$4.0000	50.30%
MoonshotAI_Kimi_K26	NovitaAI	$0.2850	$3.4000	80.50%
MoonshotAI_Kimi_K26	Chutes	$0.4410	$3.5000	80.80%
MoonshotAI_Kimi_K26	Fireworks	$0.4160	$4.0000	67.60%
MoonshotAI_Kimi_K26	Together	$0.4250	$4.5000	77.40%
MoonshotAI_Kimi_K26	Parasail	$0.3120	$3.5000	74.20%
MoonshotAI_Kimi_K26	io.net	$0.7300	$3.4900	0.00%
MoonshotAI_Kimi_K26	AkashML	$0.9500	$4.0000	0.00%
MoonshotAI_Kimi_K26	DeepInfra	$0.3220	$3.5000	71.30%
MoonshotAI_Kimi_K26	AtlasCloud	$0.6370	$4.0000	39.70%
MoonshotAI_Kimi_K26	Nebius Token Factory	$0.9500	$4.0000	0.00%
MoonshotAI_Kimi_K26	StreamLake	$0.2760	$3.8000	83.50%
MoonshotAI_Kimi_K26	Phala	$1.0900	$4.6000	6.10%
MoonshotAI_Kimi_K26	Venice	$0.7270	$4.6550	19.50%
Qwen_Qwen35-Flash	Alibaba Cloud Int.	$0.0650	$0.2590	0.00%
OpenAI_GPT-5_Nano	OpenAI	$0.0380	$0.3960	25.10%
OpenAI_GPT-5_Nano	Azure (1)	$0.0350	$0.3990	36.50%
DeepSeek_DeepSeek_V4_Flash	DeepSeek	$0.0220	$0.2800	86.10%
DeepSeek_DeepSeek_V4_Flash	SiliconFlow	$0.0890	$0.2790	45.60%
DeepSeek_DeepSeek_V4_Flash	Alibaba Cloud Int.	$0.0720	$0.2790	61.00%
DeepSeek_DeepSeek_V4_Flash	NovitaAI	$0.0720	$0.2790	60.60%
DeepSeek_DeepSeek_V4_Flash	Parasail	$0.1300	$0.2790	14.40%
DeepSeek_DeepSeek_V4_Flash	AtlasCloud	$0.0870	$0.2790	47.40%
DeepSeek_DeepSeek_V4_Flash	DeepInfra	$0.0790	$0.1990	25.70%
DeepSeek_DeepSeek_V4_Flash	GMICloud	$0.0690	$0.2230	48.20%
DeepSeek_DeepSeek_V4_Flash	Baidu Qianfan	$0.0810	$0.2510	44.40%
DeepSeek_DeepSeek_V4_Flash	AkashML	$0.1400	$0.2790	0.00%
DeepSeek_DeepSeek_V4_Flash	Venice	$0.1510	$0.3490	13.60%
OpenAI_GPT-52	OpenAI	$0.9490	$14.0000	50.90%
OpenAI_GPT-52	Azure	$1.0480	$14.0000	44.60%
Google_Gemini_25_Flash	Google Vertex (EU)	$0.2350	$2.4990	25.30%
Google_Gemini_25_Flash	Google Vertex (Global)	$0.2330	$2.4990	27.50%
Google_Gemini_25_Flash	Google AI Studio	$0.1770	$2.4990	47.80%
Google_Gemini_25_Flash	Google Vertex	$0.1780	$2.5000	45.10%
Google_Gemini_31_Pro_Preview	Google Vertex	$1.4470	$12.0080	37.30%
Google_Gemini_31_Pro_Preview	Google AI Studio	$1.5670	$11.9550	24.20%
OpenAI_GPT-5_Mini	OpenAI	$0.1330	$1.9960	50.70%
OpenAI_GPT-5_Mini	Azure (1)	$0.0900	$2.0000	72.80%
Xiaomi_MiMo-V25	Xiaomi	$0.1830	$2.0570	74.20%
Owl_Alpha	Stealth	$0.0000	$0.0000	62.60%
DeepSeek_DeepSeek_V4_Pro	SiliconFlow	$0.7450	$3.4790	59.30%
DeepSeek_DeepSeek_V4_Pro	DeepSeek	$0.0560	$0.8690	87.90%
DeepSeek_DeepSeek_V4_Pro	NovitaAI	$0.4060	$3.3790	82.00%
DeepSeek_DeepSeek_V4_Pro	Alibaba Cloud Int.	$0.9230	$3.3600	49.10%
DeepSeek_DeepSeek_V4_Pro	GMICloud	$0.2460	$2.7830	89.80%
DeepSeek_DeepSeek_V4_Pro	DeepInfra	$0.7350	$2.5990	47.10%
DeepSeek_DeepSeek_V4_Pro	Baidu Qianfan	$0.8920	$3.0410	45.10%
DeepSeek_DeepSeek_V4_Pro	AtlasCloud	$1.1450	$3.3800	34.50%
DeepSeek_DeepSeek_V4_Pro	Parasail	$1.7220	$3.4780	2.10%
DeepSeek_DeepSeek_V4_Pro	Fireworks	$1.3190	$3.4800	26.40%
DeepSeek_DeepSeek_V4_Pro	Together	$1.4570	$4.4000	33.80%
DeepSeek_DeepSeek_V4_Pro	Venice	$0.7990	$3.7950	66.50%
OpenAI_GPT-4o-mini	OpenAI	$0.1440	$0.5970	8.00%
OpenAI_GPT-4o-mini	Azure (1)	$0.1260	$0.5980	32.00%
OpenAI_GPT-4o-mini	Azure (2)	$0.1250	$0.5980	33.50%
Google_Gemini_31_Flash_Lite_Preview	Google AI Studio	$0.1640	$1.4990	38.30%
Google_Gemini_31_Flash_Lite_Preview	Google Vertex	$0.2300	$1.4990	10.40%
Zai_GLM_45_Air	Z.ai	$0.0750	$1.0960	73.50%
Zai_GLM_45_Air	NovitaAI	$0.0540	$0.8460	72.30%
Zai_GLM_45_Air	SiliconFlow	$0.1400	$0.8580	42.40%
Anthropic_Claude_Sonnet_45	Amazon Bedrock (1)	$2.2850	$15.0010	30.40%
Anthropic_Claude_Sonnet_45	Google Vertex (Global)	$1.6100	$15.0000	55.10%
Anthropic_Claude_Sonnet_45	Claude Platform on AWS	$0.9770	$15.0000	77.80%
Anthropic_Claude_Sonnet_45	Anthropic	$3.0210	$15.0000	5.40%
Anthropic_Claude_Sonnet_45	Google Vertex	$0.9840	$15.0000	78.40%
Anthropic_Claude_Sonnet_45	Amazon Bedrock (2)	$1.7740	$15.0000	48.70%
MiniMax_MiniMax_M25	DeepInfra	$0.0750	$1.1480	62.50%
MiniMax_MiniMax_M25	MiniMax Highspeed	$0.1800	$2.3990	77.80%
MiniMax_MiniMax_M25	Inceptron	$0.0740	$0.8980	78.80%
MiniMax_MiniMax_M25	MiniMax	$0.0700	$1.1990	85.30%
MiniMax_MiniMax_M25	AtlasCloud	$0.1110	$1.1980	78.20%
MiniMax_MiniMax_M25	NovitaAI	$0.0710	$1.1970	84.90%
MiniMax_MiniMax_M25	Chutes	$0.1080	$1.1990	56.30%
MiniMax_MiniMax_M25	Baidu Qianfan	$0.1170	$1.0790	62.90%
MiniMax_MiniMax_M25	Friendli	$0.1090	$1.1990	79.50%
MiniMax_MiniMax_M25	AkashML	$0.1500	$1.1490	0.00%
MiniMax_MiniMax_M25	Parasail	$0.1310	$1.1990	62.60%
MiniMax_MiniMax_M25	MARA	$0.3000	$1.2000	0.00%
MiniMax_MiniMax_M25	SiliconFlow	$0.1580	$1.1990	52.60%
MiniMax_MiniMax_M25	Phala	$0.2000	$1.3790	32.90%
MiniMax_MiniMax_M25	Weights & Biases	$0.2990	$1.1960	24.20%
MiniMax_MiniMax_M25	StreamLake	$0.3050	$1.1990	6.80%
MiniMax_MiniMax_M25	Venice	$0.1480	$1.1880	63.80%
Anthropic_Claude_Opus_45	Google Vertex	$4.0740	$25.0000	36.70%
Anthropic_Claude_Opus_45	Amazon Bedrock (1)	$0.9350	$25.0000	91.90%
Anthropic_Claude_Opus_45	Amazon Bedrock (2)	$3.3230	$25.0000	41.30%
Anthropic_Claude_Opus_45	Anthropic	$4.4120	$25.0000	23.00%
Anthropic_Claude_Opus_45	Claude Platform on AWS	$1.7950	$25.0000	74.00%
Qwen_Qwen3_235B_A22B_Instruct_2507	Weights & Biases	$0.1000	$0.0950	32.30%
Qwen_Qwen3_235B_A22B_Instruct_2507	NovitaAI	$0.0900	$0.5770	0.00%
Qwen_Qwen3_235B_A22B_Instruct_2507	DeepInfra	$0.0710	$0.0970	0.00%
Qwen_Qwen3_235B_A22B_Instruct_2507	Parasail	$0.0780	$0.5990	43.10%
Qwen_Qwen3_235B_A22B_Instruct_2507	Alibaba Cloud Int.	$0.1490	$0.5630	0.00%
Qwen_Qwen3_235B_A22B_Instruct_2507	Google Vertex (2)	$0.2200	$0.8790	28.40%
Qwen_Qwen3_235B_A22B_Instruct_2507	Cerebras	$0.6000	$1.1950	69.90%
Qwen_Qwen3_235B_A22B_Instruct_2507	Google Vertex (1)	$0.2500	$1.0000	0.00%
Qwen_Qwen3_235B_A22B_Instruct_2507	Friendli	$0.2000	$0.7980	0.00%
Qwen_Qwen3_235B_A22B_Instruct_2507	Together	$0.2000	$0.5980	0.00%
Qwen_Qwen3_235B_A22B_Instruct_2507	AtlasCloud	$0.2000	$0.8760	0.00%
OpenAI_GPT-51_Chat	OpenAI	$0.6280	$10.0000	55.30%
Google_Gemini_20_Flash	Google Vertex	$0.0980	$0.3990	5.40%
Google_Gemini_20_Flash	Google AI Studio	$0.1010	$0.3990	0.60%

Opinion

The "cheap" providers aren't always cheap. Besides that, one thing we did not discuss in this essay is the constant price increases from almost every provider; the numbers look even worse once you consider that inference itself has been getting more expensive. For coding agents, it increasingly makes sense to move first to a hybrid setup (with a smaller local model) and eventually to a fully local one.

source & further reading

dirac.run — original article