[ARTICLE · art-19030] src=dirac.run pub= topic=large-language-models verified=true sentiment=· neutral

Cache hit rates of Inference are more meaningful than the headline costs

An analysis of 60+ AI inference providers using 398 data points reveals that cache hit rates, not headline input costs, are the primary driver of real-world pricing for agentic workflows. Chinese labs including DeepSeek (87%), StepFun (86.1%), and Moonshot AI (84.8%) achieve the highest cache hit rates, while providers like io.net, AkashML, SambaNova, and Nebius record 0.0% cache hit rates, making them significantly more expensive for multi-turn conversations. The findings show that Google's own Gemini models on Vertex AI have lower cache hit rates (37.30%) than competitor Claude models hosted on the same platform (65.30%), suggesting architectural inefficiencies in Google's approach.

28 min read · published May 31, 2026

TL;DR: Agents push the full conversation history into context every turn, so over a large number of turns they are extremely read-heavy, which is why cache hit rates matter so much. This post analyzes 60+ providers and their cache hit rates using 398 data points, all sourced from openrouter.ai model pages. It assumes the reader is familiar with prefix caching; all mentions of caching in this post refer to prefix caching.

Agentic workflows differ from most human-LLM conversations in one key characteristic: the average number of turns is far higher.

Context processing over a multi-turn conversation grows quadratically. Every turn passes the full conversation up to that point into context along with its own input, e.g. turn 100 pushes everything through turn 99 into the context window again. The LLM provider, on its end, tries to match the longest prefix it can against the caches it has available and processes the rest of the conversation as new input tokens, typically at 10x the cached price. So on a long 200k-context conversation (which, by the way, is a bad idea capability-wise even if you don't care about costs), if a model that costs $5 per million input tokens fails to hit any cache, you'll be charged $1 for the input processing of that single turn. Two things determine this:

1. Cached input pricing: the headline metric everyone looks at.
2. Cache hit rate: the hidden variable that nobody talks about.
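To make the interaction between these two variables concrete, here is a minimal sketch of the arithmetic. It assumes a simple blended-price model (hit rate times the cached price plus miss rate times the full price) and ignores cache-write surcharges that some providers add, so it is an illustration and not how any provider actually bills. The function names and the $0.50 cached price (one tenth of the $5 example price) are assumptions made for the example.

```python
def effective_input_price(base_price, cached_price, hit_rate):
    """Blended $/1M input tokens for a cache hit rate in [0, 1].

    Simplifying assumption: every input token is billed either at the
    cached price (hit) or at the full price (miss).
    """
    return hit_rate * cached_price + (1.0 - hit_rate) * base_price


def session_input_cost(turns, tokens_per_turn, base_price, cached_price, hit_rate):
    """Total input cost in dollars when every turn resends the full history."""
    price = effective_input_price(base_price, cached_price, hit_rate)
    # Turn k sends roughly k * tokens_per_turn tokens, so the total
    # token count is tokens_per_turn * (1 + 2 + ... + turns): quadratic in turns.
    total_tokens = tokens_per_turn * turns * (turns + 1) // 2
    return total_tokens / 1_000_000 * price


if __name__ == "__main__":
    # The example from the text: one 200k-token turn, $5/M, no cache hits -> $1.00.
    print(session_input_cost(1, 200_000, 5.0, 0.5, 0.0))

    # A hypothetical 100-turn session adding 2,000 tokens per turn,
    # compared at a 0% and an 87% hit rate.
    for hit_rate in (0.0, 0.87):
        cost = session_input_cost(100, 2_000, 5.0, 0.5, hit_rate)
        print(f"hit rate {hit_rate:.0%}: ${cost:.2f}")
```

Under these assumptions the same session costs several times more at a 0% hit rate than at a high one, even though the headline price is identical.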

I recently spent many hours thinking there was a bug in Dirac that caused caches to break, only to find in the end that it was entirely due to Gemini 3 Flash's cache hit rate.

While trying to look up data on this, I found that OpenRouter fortunately publishes it (go to a model's page and look for the 'Effective Pricing' section). Since the data is hourly, we have to assume that it doesn't change too much from hour to hour.

Provider Cache-hit Tier list

Providers with multiple endpoints (e.g. Amazon Bedrock US, Bedrock Global, Bedrock (1)) are listed separately — each entry reflects the hit rate of that specific endpoint as observed.

DeepSeek remains the gold standard of caching, which probably doesn't surprise anyone who has used their official API. In fact, all S-tier entries (cache hit rates of roughly 75% or higher) are Chinese labs: DeepSeek (87%), StepFun (86.1%), Moonshot AI (84.8%), MiniMax (75.4%), and Xiaomi (74.7%).

The mainstream US labs place somewhere in the middle but as we will see in the next section, the variance is huge and rather interesting.

On the flip side, we have the "F-Tier". Providers like io.net, AkashML, SambaNova, and Nebius are clocking in at exactly 0.0% cache hit rates across the models.
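The per-provider figures quoted above appear to be consistent with a simple unweighted mean of each provider's cache hit rates across its models in the full table. That is an inference from the numbers, not something the data source states. A minimal sketch, using a handful of rows copied from the full table (the `ROWS` list and `provider_means` helper are names made up for this example):

```python
from collections import defaultdict

# (model, provider, cache hit rate in %) copied from the full table in this post.
ROWS = [
    ("DeepSeek_DeepSeek_V4_Pro", "DeepSeek", 87.9),
    ("DeepSeek_DeepSeek_V4_Flash", "DeepSeek", 86.1),
    ("MoonshotAI_Kimi_K25", "Moonshot AI", 92.8),
    ("MoonshotAI_Kimi_K26", "Moonshot AI", 76.8),
    ("Xiaomi_MiMo-V25-Pro", "Xiaomi", 94.8),
    ("Xiaomi_MiMo-V2-Flash", "Xiaomi", 55.0),
    ("Xiaomi_MiMo-V25", "Xiaomi", 74.2),
    ("Qwen_Qwen36_35B_A3B", "io.net", 0.0),
    ("Qwen_Qwen36_27B", "io.net", 0.0),
]


def provider_means(rows):
    """Unweighted mean cache hit rate per provider, highest first."""
    by_provider = defaultdict(list)
    for _model, provider, hit_rate in rows:
        by_provider[provider].append(hit_rate)
    means = {p: round(sum(v) / len(v), 1) for p, v in by_provider.items()}
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for provider, mean in provider_means(ROWS).items():
        print(f"{provider:12s} {mean:5.1f}%")
```

Note that an unweighted mean treats a rarely used endpoint the same as a heavily used one, so a traffic-weighted ranking could look different.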

US closed-source big 3

The most interesting thing to me from the chart above is that older models from the same provider tend to get lower cache hit rates. If I had to explain it non-cynically, I would guess that, from a systems-engineering point of view, it comes down to the cache-pool size allocated to each model.

Google does worse than the other two providers across the board, which is surprising given that they own the full stack on TPUs. This gets full-on clowny when you look at the Vertex AI numbers (see table below): Opus 4.7 on Vertex AI has a 65.30% cache hit rate while Google's own Google_Gemini_3.1_Pro_Preview has 37.30% (and a similar gap shows up for most Claude vs Gemini pairs hosted on Vertex)! How do you manage to get a lower cache hit rate on your own hardware, with your own model trained on that hardware, than a competitor's model? If I were to speculate, I'd guess that the whole 'thought signature' architecture is just not working out.

OSS Models Cost Comparison

For OSS models, the provider you choose makes a huge difference in cost, mostly due to cache-hit rates.

| Model | Cheapest effectiveInputPrice (Provider) | Most Expensive effectiveInputPrice (Provider) | Difference (Percentage) |
|---|---|---|---|
| Kimi K2.6 | 0.2760 (StreamLake) | 1.0900 (Phala) | 0.8140 (294.93%) |
| MiMo-V2.5-Pro | 0.3720 (Xiaomi) | 0.9060 (DeepInfra) | 0.5340 (143.55%) |
| DeepSeek V4 Pro (Max) | 0.0560 (DeepSeek) | 1.7220 (Parasail) | 1.6660 (2975.00%) |
| GLM-5.1 | 0.3230 (StreamLake) | 1.7470 (Venice) | 1.4240 (440.87%) |
| MiniMax-M2.7 | 0.1430 (MiniMax) | 0.6000 (SambaNova) | 0.4570 (319.58%) |
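The "Difference (Percentage)" column can be reproduced as the absolute gap between the most expensive and cheapest effective input price, with the percentage taken relative to the cheapest price. A small sketch (the `price_spread` helper is a name invented for this example):

```python
def price_spread(cheapest, most_expensive):
    """Return (absolute difference, difference as % of the cheapest price)."""
    diff = most_expensive - cheapest
    return diff, diff / cheapest * 100.0


if __name__ == "__main__":
    # Cheapest and most expensive effective input prices from the table above.
    rows = {
        "Kimi K2.6": (0.2760, 1.0900),
        "DeepSeek V4 Pro (Max)": (0.0560, 1.7220),
    }
    for model, (low, high) in rows.items():
        diff, pct = price_spread(low, high)
        print(f"{model}: {diff:.4f} ({pct:.2f}%)")
```

Because the percentage is relative to the cheapest price, a very cheap first-party endpoint (as with DeepSeek) makes the spread look dramatic even when the absolute gap is under two dollars.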

Small Model Grift

Now onto smaller models that instinctively seem cheaper. Below is the average effective pricing for 4 of the most popular local models (hi r/LocalLLaMA):

| Model Name | Total Providers | Avg Eff. Input | Avg Eff. Output | Avg Cache Hit |
|---|---|---|---|---|
| Google_Gemma_4_26B_A4B | 10 | $0.1156 | $0.4150 | 21.57% |
| Google_Gemma_4_31B | 10 | $0.1729 | $0.5049 | 7.44% |
| Qwen_Qwen36_35B_A3B | 7 | $0.1643 | $1.1450 | 11.54% |
| Qwen_Qwen36_27B | 8 | $0.4096 | $2.9433 | 7.66% |

Compare that to

| Model Name (Official API) | Eff. Input Price | Eff. Output Price | Cache Hit Rate |
|---|---|---|---|
| DeepSeek_DeepSeek_V4_Pro | $0.0560 | $0.8690 | 87.90% |
| DeepSeek_DeepSeek_V4_Flash | $0.0220 | $0.2800 | 86.10% |

Yup, you can use DeepSeek V4 Pro, a 1.6-trillion-parameter model whose 49B active parameters exceed the total parameter count of any of these small models, for less than either of the Qwen3.6 models. That is thanks to providers like io.net and DeepInfra offering $0.32/$3.20 input/output pricing with zero caching.
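As a rough illustration of that comparison, the sketch below prices one hypothetical read-heavy agent session (10M input tokens, 200k output tokens) using the effective prices from the two tables above. It assumes those prices are dollars per million tokens, the unit used elsewhere in this post. The workload size and the `workload_cost` helper are made up for the example.

```python
# (effective input price, effective output price), assumed to be $ per 1M tokens.
PRICES = {
    "DeepSeek_DeepSeek_V4_Pro (official API)": (0.0560, 0.8690),
    "Qwen_Qwen36_35B_A3B (avg of 7 providers)": (0.1643, 1.1450),
    "Qwen_Qwen36_27B (avg of 8 providers)": (0.4096, 2.9433),
}


def workload_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of a workload given $/1M-token input and output prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent session: heavily read-dominated, as described earlier.
    input_tokens, output_tokens = 10_000_000, 200_000
    for name, (in_price, out_price) in PRICES.items():
        cost = workload_cost(input_tokens, output_tokens, in_price, out_price)
        print(f"{name}: ${cost:.2f}")
```

Since the effective input price already folds in the observed cache hit rate, no separate caching term is needed here.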

The full table

| Model Name | Provider | Eff. Input Price | Eff. Output Price | Cache Hit Rate |
|---|---|---|---|---|
| Zai_GLM_5 | SiliconFlow | $0.3100 | $2.5490 | 85.30% |
| Zai_GLM_5 | Baidu Qianfan | $0.3930 | $2.2390 | 54.70% |
| Zai_GLM_5 | GMICloud | $0.4200 | $1.9200 | 37.50% |
| Zai_GLM_5 | DeepInfra | $0.3410 | $2.0790 | 54.00% |
| Zai_GLM_5 | Z.ai | $0.4710 | $3.1990 | 66.10% |
| Zai_GLM_5 | Amazon Bedrock | $1.0000 | $3.1990 | 0.10% |
| Zai_GLM_5 | Friendli | $0.8700 | $3.2000 | 26.00% |
| Zai_GLM_5 | StreamLake | $0.4650 | $2.0790 | 35.60% |
| Zai_GLM_5 | NovitaAI | $0.7340 | $3.2000 | 33.20% |
| Zai_GLM_5 | AtlasCloud | $0.7830 | $3.1500 | 22.00% |
| Zai_GLM_5 | Parasail | $0.7720 | $3.2000 | 28.60% |
| Zai_GLM_5 | Together | $1.0000 | $3.2000 | 0.80% |
| Zai_GLM_5 | Chutes | $0.9410 | $2.5500 | 1.80% |
| Zai_GLM_5 | Phala | $1.2000 | $3.5000 | 0.10% |
| Qwen_Qwen3_VL_32B_Instruct | Alibaba Cloud Int. | $0.1040 | $0.4140 | 0.00% |
| Qwen_Qwen36_35B_A3B | Parasail | $0.1020 | $1.0000 | 47.60% |
| Qwen_Qwen36_35B_A3B | Ambient | $0.1170 | $1.0000 | 33.20% |
| Qwen_Qwen36_35B_A3B | io.net | $0.1500 | $1.0000 | 0.00% |
| Qwen_Qwen36_35B_A3B | AkashML | $0.1700 | $1.2000 | 0.00% |
| Qwen_Qwen36_35B_A3B | AtlasCloud | $0.1610 | $0.9650 | 0.00% |
| Qwen_Qwen36_35B_A3B | Weights & Biases | $0.2500 | $1.2500 | 0.00% |
| Qwen_Qwen36_35B_A3B | SiliconFlow | $0.2000 | $1.6000 | 0.00% |
| OpenAI_GPT-41_Nano | OpenAI | $0.0920 | $0.3990 | 11.00% |
| OpenAI_GPT-41_Nano | Azure (1) | $0.0710 | $0.3990 | 41.30% |
| OpenAI_GPT-41_Nano | Azure (2) | $0.1000 | $0.3980 | 0.00% |
| xAI_Grok_43 | xAI | $0.7490 | $2.5000 | 47.80% |
| Anthropic_Claude_Sonnet_46 | Claude Platform on AWS | $0.9370 | $15.0000 | 79.30% |
| Anthropic_Claude_Sonnet_46 | Anthropic | $0.6070 | $15.0000 | 89.90% |
| Anthropic_Claude_Sonnet_46 | Google Vertex (US East) | $2.6200 | $15.0000 | 21.80% |
| Anthropic_Claude_Sonnet_46 | Amazon Bedrock (Global) | $1.3370 | $15.0000 | 64.00% |
| Anthropic_Claude_Sonnet_46 | Amazon Bedrock | $0.9400 | $15.0000 | 78.40% |
| Anthropic_Claude_Sonnet_46 | Google Vertex (Global) | $1.8020 | $15.0000 | 52.70% |
| Anthropic_Claude_Sonnet_46 | Google Vertex (Europe) | $2.4390 | $15.0000 | 28.70% |
| Xiaomi_MiMo-V25-Pro | Xiaomi | $0.3720 | $3.1670 | 94.80% |
| Xiaomi_MiMo-V25-Pro | DeepInfra | $0.9060 | $3.0000 | 11.70% |
| Qwen_Qwen3_Coder_Next | Ionstream | $0.0860 | $0.7990 | 61.10% |
| Qwen_Qwen3_Coder_Next | Parasail | $0.0860 | $0.7990 | 68.40% |
| Qwen_Qwen3_Coder_Next | AtlasCloud | $0.1800 | $1.3490 | 0.00% |
| Qwen_Qwen3_Coder_Next | NovitaAI | $0.2000 | $1.4990 | 0.00% |
| Anthropic_Claude_Opus_46 | Claude Platform on AWS | $2.3320 | $25.0000 | 63.50% |
| Anthropic_Claude_Opus_46 | Amazon Bedrock | $1.4750 | $25.0000 | 81.40% |
| Anthropic_Claude_Opus_46 | Anthropic | $1.6520 | $25.0000 | 79.00% |
| Anthropic_Claude_Opus_46 | Google Vertex | $1.9690 | $25.0000 | 71.10% |
| Anthropic_Claude_Opus_46 | Google Vertex (Europe) | $3.0870 | $25.0000 | 44.30% |
| Anthropic_Claude_Opus_46 | Azure | $6.2500 | $25.0000 | 0.00% |
| Anthropic_Claude_Opus_47 | Claude Platform on AWS | $1.8440 | $25.0000 | 72.40% |
| Anthropic_Claude_Opus_47 | Google Vertex | $2.4580 | $25.0000 | 65.30% |
| Anthropic_Claude_Opus_47 | Amazon Bedrock (US) | $3.9520 | $25.0000 | 23.70% |
| Anthropic_Claude_Opus_47 | Amazon Bedrock | $4.9530 | $25.0000 | 1.20% |
| Anthropic_Claude_Opus_47 | Google Vertex (Europe) | $2.3000 | $25.0000 | 68.20% |
| Anthropic_Claude_Opus_47 | Anthropic | $1.5920 | $25.0000 | 79.10% |
| MiniMax_MiniMax_M27 | MiniMax | $0.1430 | $1.2000 | 65.60% |
| MiniMax_MiniMax_M27 | Together | $0.2010 | $1.1990 | 41.20% |
| MiniMax_MiniMax_M27 | Morph | $0.2790 | $1.1990 | 73.90% |
| MiniMax_MiniMax_M27 | Fireworks | $0.2050 | $1.1990 | 39.20% |
| MiniMax_MiniMax_M27 | MiniMax Highspeed | $0.2510 | $2.3990 | 64.70% |
| MiniMax_MiniMax_M27 | SambaNova | $0.6000 | $2.3990 | 0.00% |
| Qwen_Qwen36_27B | DeepInfra | $0.3200 | $3.1990 | 0.00% |
| Qwen_Qwen36_27B | Alibaba Cloud Int. | $0.4500 | $2.6990 | 0.00% |
| Qwen_Qwen36_27B | Ambient | $0.2670 | $3.2000 | 32.80% |
| Qwen_Qwen36_27B | Weights & Biases | $0.6000 | $3.6000 | 0.00% |
| Qwen_Qwen36_27B | io.net | $0.3170 | $3.1990 | 0.00% |
| Qwen_Qwen36_27B | Morph | $0.4980 | $2.3990 | 28.50% |
| Qwen_Qwen36_27B | Chutes | $0.5000 | $2.0000 | 0.00% |
| Qwen_Qwen36_27B | Venice | $0.3250 | $3.2500 | 0.00% |
| OpenAI_gpt-oss-120b | Google Vertex | $0.0900 | $0.3590 | 4.10% |
| OpenAI_gpt-oss-120b | DeepInfra | $0.0390 | $0.1890 | 0.00% |
| OpenAI_gpt-oss-120b | Groq | $0.1200 | $0.5990 | 40.00% |
| OpenAI_gpt-oss-120b | Cerebras | $0.3500 | $0.7490 | 48.60% |
| OpenAI_gpt-oss-120b | DekaLLM | $0.0390 | $0.1770 | 0.00% |
| OpenAI_gpt-oss-120b | Baseten | $0.1000 | $0.4990 | 52.10% |
| OpenAI_gpt-oss-120b | NovitaAI | $0.0500 | $0.2490 | 2.50% |
| OpenAI_gpt-oss-120b | Ambient | $0.1070 | $0.6000 | 57.00% |
| OpenAI_gpt-oss-120b | DeepInfra (Turbo) | $0.1500 | $0.5990 | 0.00% |
| OpenAI_gpt-oss-120b | Parasail | $0.0890 | $0.7490 | 23.60% |
| OpenAI_gpt-oss-120b | SiliconFlow | $0.0500 | $0.4490 | 0.00% |
| OpenAI_gpt-oss-120b | Amazon Bedrock (1) | $0.1500 | $0.5990 | 0.00% |
| OpenAI_gpt-oss-120b | Nebius Token Factory | $0.1500 | $0.5990 | 0.00% |
| OpenAI_gpt-oss-120b | SambaNova Dedicated | $0.1200 | $0.8990 | 0.00% |
| OpenAI_gpt-oss-120b | SambaNova | $0.1400 | $0.9490 | 0.00% |
| OpenAI_gpt-oss-120b | Together | $0.1500 | $0.6000 | 0.00% |
| OpenAI_gpt-oss-120b | Phala | $0.1000 | $0.4890 | 14.00% |
| OpenAI_gpt-oss-120b | MARA | $0.1500 | $0.7490 | 0.00% |
| OpenAI_gpt-oss-120b | Weights & Biases | $0.1500 | $0.5990 | 12.70% |
| OpenAI_gpt-oss-120b | Amazon Bedrock (2) | $0.1500 | $0.5990 | 0.00% |
| Zai_GLM_47_Flash | DeepInfra | $0.0230 | $0.3990 | 73.40% |
| Zai_GLM_47_Flash | NovitaAI | $0.0610 | $0.3990 | 14.30% |
| Zai_GLM_47_Flash | Cloudflare | $0.0600 | $0.3970 | 0.00% |
| Zai_GLM_47_Flash | Phala | $0.1000 | $0.4290 | 0.00% |
| Zai_GLM_47_Flash | Z.ai | $0.0430 | $0.3990 | 44.20% |
| Zai_GLM_47_Flash | Venice | $0.1250 | $0.4960 | 14.40% |
| OpenAI_GPT-51 | OpenAI | $0.8220 | $10.0000 | 38.10% |
| OpenAI_GPT-51 | Azure (1) | $0.9550 | $10.0000 | 26.30% |
| OpenAI_GPT-54_Mini | OpenAI | $0.3370 | $4.5110 | 61.50% |
| OpenAI_GPT-54_Mini | Azure | $0.6840 | $4.4990 | 9.80% |
| Meta_Llama_31_8B_Instruct | Groq | $0.0330 | $0.0750 | 67.80% |
| Meta_Llama_31_8B_Instruct | DeepInfra | $0.0200 | $0.0480 | 0.00% |
| Meta_Llama_31_8B_Instruct | NovitaAI | $0.0200 | $0.0480 | 0.00% |
| Meta_Llama_31_8B_Instruct | Cerebras | $0.0990 | $0.0950 | 87.50% |
| Meta_Llama_31_8B_Instruct | Cloudflare | $0.1520 | $0.2850 | 0.00% |
| Qwen_Qwen35-9B | DeepInfra | $0.0400 | $0.1480 | 0.00% |
| Qwen_Qwen35-9B | Together | $0.1000 | $0.1490 | 0.00% |
| Qwen_Qwen35-9B | SiliconFlow | $0.1000 | $0.1490 | 0.00% |
| Qwen_Qwen35-9B | Venice | $0.1000 | $0.1490 | 35.50% |
| MoonshotAI_Kimi_K25 | DeepInfra | $0.1660 | $2.2490 | 74.80% |
| MoonshotAI_Kimi_K25 | NovitaAI | $0.2320 | $2.8490 | 71.10% |
| MoonshotAI_Kimi_K25 | ModelRun | $0.1610 | $1.9000 | 77.10% |
| MoonshotAI_Kimi_K25 | Moonshot AI | $0.1360 | $3.0000 | 92.80% |
| MoonshotAI_Kimi_K25 | Fireworks | $0.1960 | $3.0000 | 80.70% |
| MoonshotAI_Kimi_K25 | Chutes | $0.3790 | $2.0000 | 27.70% |
| MoonshotAI_Kimi_K25 | AtlasCloud | $0.3790 | $2.4990 | 38.10% |
| MoonshotAI_Kimi_K25 | SiliconFlow | $0.2590 | $2.2500 | 50.10% |
| MoonshotAI_Kimi_K25 | Cloudflare | $0.3390 | $3.0000 | 52.20% |
| MoonshotAI_Kimi_K25 | Parasail | $0.3810 | $2.7990 | 54.80% |
| MoonshotAI_Kimi_K25 | Phala | $0.6000 | $3.0000 | 2.20% |
| MoonshotAI_Kimi_K25 | Venice | $0.5300 | $3.5000 | 8.90% |
| Tencent_Hy3_preview | SiliconFlow | $0.0350 | $0.2590 | 84.30% |
| OpenAI_GPT-53-Codex | OpenAI | $0.2820 | $14.0000 | 93.20% |
| OpenAI_GPT-53-Codex | Azure | $0.4470 | $14.0000 | 82.70% |
| StepFun_Step_35_Flash | StepFun | $0.0310 | $0.2990 | 86.10% |
| StepFun_Step_35_Flash | DeepInfra | $0.0900 | $0.2990 | 0.00% |
| StepFun_Step_35_Flash | SiliconFlow | $0.1000 | $0.3000 | 0.00% |
| OpenAI_GPT-54_Nano | OpenAI | $0.0870 | $1.2490 | 62.80% |
| OpenAI_GPT-54_Nano | Azure | $0.1530 | $1.2490 | 26.10% |
| DeepSeek_DeepSeek_V3_0324 | NovitaAI | $0.1800 | $1.1180 | 66.30% |
| DeepSeek_DeepSeek_V3_0324 | DeepInfra | $0.1600 | $0.7670 | 61.90% |
| DeepSeek_DeepSeek_V3_0324 | ModelRun | $0.1850 | $0.7980 | 50.40% |
| DeepSeek_DeepSeek_V3_0324 | SiliconFlow | $0.2500 | $1.0000 | 52.60% |
| DeepSeek_DeepSeek_V3_0324 | AtlasCloud | $0.2140 | $0.8770 | 4.00% |
| DeepSeek_DeepSeek_V3_0324 | GMICloud | $0.2890 | $1.1350 | 0.60% |
| Qwen_Qwen35_397B_A17B | Morph | $0.4220 | $3.5000 | 64.70% |
| Qwen_Qwen35_397B_A17B | Alibaba Cloud Int. | $0.3900 | $2.3400 | 0.00% |
| Qwen_Qwen35_397B_A17B | Chutes | $0.2710 | $3.0000 | 79.60% |
| Qwen_Qwen35_397B_A17B | DeepInfra | $0.4900 | $3.5990 | 0.00% |
| Qwen_Qwen35_397B_A17B | Together | $0.6000 | $3.6000 | 48.40% |
| Qwen_Qwen35_397B_A17B | NovitaAI | $0.6000 | $3.6000 | 5.10% |
| Qwen_Qwen35_397B_A17B | Nebius Token Factory | $0.6000 | $3.5980 | 0.00% |
| Qwen_Qwen35_397B_A17B | AtlasCloud | $0.5500 | $3.5000 | 0.00% |
| Qwen_Qwen35_397B_A17B | Phala | $0.5500 | $3.5000 | 35.50% |
| Qwen_Qwen35_397B_A17B | Parasail | $0.4090 | $3.6000 | 45.30% |
| Qwen_Qwen35_397B_A17B | GMICloud | $0.6000 | $3.6000 | 0.00% |
| Qwen_Qwen35_397B_A17B | Venice | $0.7500 | $4.5000 | 19.30% |
| Mistral_Mistral_Small_32_24B | Mistral | $0.0860 | $0.2990 | 15.90% |
| Mistral_Mistral_Small_32_24B | DeepInfra | $0.0750 | $0.1980 | 0.00% |
| Mistral_Mistral_Small_32_24B | Parasail | $0.0730 | $0.5980 | 41.90% |
| Mistral_Mistral_Small_32_24B | Venice | $0.0940 | $0.2490 | 0.00% |
| Meta_Llama_4_Maverick | Parasail | $0.3140 | $1.0000 | 19.30% |
| Meta_Llama_4_Maverick | DeepInfra | $0.1500 | $0.5970 | 0.00% |
| Meta_Llama_4_Maverick | NovitaAI | $0.2700 | $0.8460 | 0.00% |
| Meta_Llama_4_Maverick | SambaNova | $0.6300 | $1.7970 | 0.00% |
| Mistral_Mistral_Medium_35 | Mistral | $1.5000 | $7.4990 | 21.70% |
| Qwen_Qwen36_Flash | Alibaba Cloud Int. | $0.1920 | $1.1400 | 0.30% |
| OpenAI_GPT-41 | OpenAI | $1.1000 | $8.0000 | 60.00% |
| OpenAI_GPT-41 | Azure (1) | $1.0780 | $8.0000 | 61.50% |
| Mistral_Mistral_Nemo | DeepInfra | $0.0200 | $0.0370 | 0.00% |
| Mistral_Mistral_Nemo | DekaLLM | $0.0200 | $0.0250 | 0.00% |
| Mistral_Mistral_Nemo | Mistral | $0.0900 | $0.1430 | 44.40% |
| Mistral_Mistral_Nemo | NovitaAI | $0.0390 | $0.1640 | 0.00% |
| Google_Gemma_4_31B | DeepInfra Turbo | $0.1200 | $0.3690 | 0.00% |
| Google_Gemma_4_31B | DeepInfra | $0.1300 | $0.3790 | 0.00% |
| Google_Gemma_4_31B | NovitaAI | $0.1400 | $0.3990 | 5.80% |
| Google_Gemma_4_31B | Chutes | $0.1090 | $0.3780 | 31.80% |
| Google_Gemma_4_31B | Together (2) | $0.3900 | $0.9690 | 0.00% |
| Google_Gemma_4_31B | SiliconFlow | $0.1300 | $0.3990 | 0.00% |
| Google_Gemma_4_31B | Together (1) | $0.2800 | $0.8580 | 0.00% |
| Google_Gemma_4_31B | Ambient | $0.1180 | $0.3990 | 18.40% |
| Google_Gemma_4_31B | Venice | $0.1750 | $0.5000 | 12.00% |
| Google_Gemma_4_31B | Parasail | $0.1370 | $0.3990 | 6.40% |
| Google_Gemini_25_Flash_Lite | Google Vertex (EU) | $0.0800 | $0.3990 | 22.70% |
| Google_Gemini_25_Flash_Lite | Google Vertex | $0.0940 | $0.3990 | 7.40% |
| Google_Gemini_25_Flash_Lite | Google AI Studio | $0.0910 | $0.3990 | 12.00% |
| Google_Gemini_35_Flash | Google Vertex | $0.5520 | $9.0080 | 70.30% |
| Google_Gemini_35_Flash | Google AI Studio | $0.6340 | $7.9040 | 63.50% |
| DeepSeek_DeepSeek_V31 | Weights & Biases | $0.5500 | $1.6370 | 31.80% |
| DeepSeek_DeepSeek_V31 | NovitaAI | $0.2620 | $1.0000 | 5.70% |
| DeepSeek_DeepSeek_V31 | DeepInfra | $0.1750 | $0.7880 | 43.20% |
| DeepSeek_DeepSeek_V31 | SiliconFlow | $0.2700 | $1.0000 | 36.80% |
| DeepSeek_DeepSeek_V31 | AtlasCloud | $0.2870 | $0.9470 | 7.30% |
| DeepSeek_DeepSeek_V31 | Google Vertex | $0.6000 | $1.7000 | 19.30% |
| DeepSeek_DeepSeek_V31 | SambaNova | $0.6500 | $1.4990 | 0.00% |
| Google_Gemma_4_26B_A4B | Google Vertex | $0.1500 | $0.5970 | 19.10% |
| Google_Gemma_4_26B_A4B | NovitaAI | $0.1300 | $0.4000 | 16.80% |
| Google_Gemma_4_26B_A4B | NextBit | $0.1180 | $0.3980 | 63.40% |
| Google_Gemma_4_26B_A4B | DeepInfra | $0.0700 | $0.3370 | 0.00% |
| Google_Gemma_4_26B_A4B | Cloudflare | $0.1000 | $0.2970 | 0.00% |
| Google_Gemma_4_26B_A4B | Parasail | $0.0960 | $0.3980 | 41.80% |
| Google_Gemma_4_26B_A4B | DekaLLM | $0.0600 | $0.3270 | 0.00% |
| Google_Gemma_4_26B_A4B | SiliconFlow | $0.1200 | $0.3970 | 0.00% |
| Google_Gemma_4_26B_A4B | Venice | $0.1620 | $0.4990 | 74.60% |
| Google_Gemma_4_26B_A4B | io.net | $0.1500 | $0.5000 | 0.00% |
| OpenAI_gpt-oss-20b | Weights & Biases | $0.0500 | $0.2000 | 69.00% |
| OpenAI_gpt-oss-20b | DeepInfra | $0.0300 | $0.1400 | 0.00% |
| OpenAI_gpt-oss-20b | Amazon Bedrock (2) | $0.0700 | $0.1500 | 0.00% |
| OpenAI_gpt-oss-20b | Amazon Bedrock (1) | $0.0700 | $0.1500 | 0.00% |
| OpenAI_gpt-oss-20b | NovitaAI | $0.0400 | $0.1500 | 0.00% |
| OpenAI_gpt-oss-20b | Google Vertex | $0.0690 | $0.2460 | 63.20% |
| OpenAI_gpt-oss-20b | Groq | $0.0660 | $0.2990 | 22.40% |
| OpenAI_gpt-oss-20b | Parasail | $0.0250 | $0.1990 | 71.40% |
| OpenAI_gpt-oss-20b | SiliconFlow | $0.0400 | $0.1790 | 0.00% |
| OpenAI_gpt-oss-20b | Fireworks | $0.0580 | $0.2990 | 33.70% |
| OpenAI_gpt-oss-20b | NextBit | $0.1000 | $0.4490 | 0.00% |
| OpenAI_gpt-oss-20b | Together | $0.0490 | $0.1990 | 0.00% |
| OpenAI_GPT-41_Mini | OpenAI | $0.2800 | $1.5940 | 40.00% |
| OpenAI_GPT-41_Mini | Azure (2) | $0.2520 | $1.5980 | 49.30% |
| OpenAI_GPT-41_Mini | Azure (1) | $0.2020 | $1.5930 | 66.00% |
| DeepSeek_DeepSeek_V32 | NovitaAI | $0.2490 | $0.3990 | 15.20% |
| DeepSeek_DeepSeek_V32 | Baidu Qianfan | $0.1050 | $0.3760 | 64.90% |
| DeepSeek_DeepSeek_V32 | SiliconFlow | $0.1720 | $0.4190 | 69.80% |
| DeepSeek_DeepSeek_V32 | DeepInfra | $0.1950 | $0.3790 | 49.60% |
| DeepSeek_DeepSeek_V32 | AtlasCloud | $0.2410 | $0.3790 | 14.40% |
| DeepSeek_DeepSeek_V32 | Friendli | $0.3990 | $1.5000 | 40.50% |
| DeepSeek_DeepSeek_V32 | Alibaba Cloud Int. | $0.2730 | $1.1110 | 32.50% |
| DeepSeek_DeepSeek_V32 | Parasail | $0.2540 | $0.4470 | 17.20% |
| DeepSeek_DeepSeek_V32 | Google Vertex | $0.5600 | $1.6790 | 8.00% |
| Google_Gemini_20_Flash_Lite | Google Vertex | $0.0750 | $0.2980 | 0.00% |
| Google_Gemini_20_Flash_Lite | Google AI Studio | $0.0750 | $0.2950 | 0.00% |
| Anthropic_Claude_Sonnet_4 | Amazon Bedrock (1) | $1.6300 | $15.0000 | 53.70% |
| Anthropic_Claude_Sonnet_4 | Anthropic | $1.9410 | $15.0000 | 41.60% |
| Anthropic_Claude_Sonnet_4 | Amazon Bedrock (2) | $2.1510 | $15.0000 | 35.10% |
| Anthropic_Claude_Sonnet_4 | Google Vertex (Global) | $2.3880 | $15.0000 | 29.50% |
| Anthropic_Claude_Sonnet_4 | Google Vertex (Europe) | $1.7360 | $15.0000 | 58.40% |
| Anthropic_Claude_Haiku_45 | Amazon Bedrock (Global) | $0.3670 | $5.0000 | 72.40% |
| Anthropic_Claude_Haiku_45 | Anthropic | $0.5420 | $5.0000 | 54.80% |
| Anthropic_Claude_Haiku_45 | Google Vertex (Europe) | $0.9760 | $5.0000 | 2.80% |
| Anthropic_Claude_Haiku_45 | Amazon Bedrock | $0.5780 | $5.0000 | 48.70% |
| Anthropic_Claude_Haiku_45 | Google Vertex | $0.4890 | $5.0000 | 58.20% |
| OpenAI_GPT-54 | Azure | $1.8150 | $15.0000 | 30.40% |
| OpenAI_GPT-54 | OpenAI | $0.8980 | $15.1190 | 74.90% |
| OpenAI_GPT-5 | OpenAI | $0.4550 | $10.0000 | 70.60% |
| OpenAI_GPT-5 | Azure (1) | $1.2500 | $10.0000 | 0.00% |
| Xiaomi_MiMo-V2-Flash | Xiaomi | $0.0500 | $0.2980 | 55.00% |
| Xiaomi_MiMo-V2-Flash | NovitaAI | $0.0530 | $0.2990 | 58.70% |
| Qwen_Qwen35-35B-A3B | Parasail | $0.0910 | $1.0000 | 59.00% |
| Qwen_Qwen35-35B-A3B | Alibaba Cloud Int. | $0.1620 | $1.2990 | 0.00% |
| Qwen_Qwen35-35B-A3B | AkashML | $0.1600 | $1.2000 | 0.00% |
| Qwen_Qwen35-35B-A3B | Ambient | $0.1020 | $1.0000 | 41.60% |
| Qwen_Qwen35-35B-A3B | Venice | $0.1970 | $1.2500 | 73.90% |
| Qwen_Qwen35-35B-A3B | DeepInfra | $0.1400 | $1.0000 | 0.00% |
| Qwen_Qwen35-35B-A3B | DekaLLM | $0.1390 | $1.0000 | 0.00% |
| Qwen_Qwen35-35B-A3B | AtlasCloud | $0.2250 | $1.8000 | 0.00% |
| Qwen_Qwen35-35B-A3B | NextBit | $0.3000 | $1.8000 | 0.00% |
| Qwen_Qwen35-35B-A3B | SiliconFlow | $0.2400 | $1.8000 | 0.00% |
| Google_Gemini_25_Pro | Google Vertex (Global) | $1.0410 | $10.0650 | 30.10% |
| Google_Gemini_25_Pro | Google Vertex (EU) | $0.9680 | $10.0000 | 26.40% |
| Google_Gemini_25_Pro | Google AI Studio | $0.9680 | $10.0000 | 35.40% |
| Google_Gemini_25_Pro | Google Vertex (US) | $1.2200 | $10.0000 | 2.60% |
| Zai_GLM_51 | StreamLake | $0.3230 | $3.9590 | 91.30% |
| Zai_GLM_51 | Friendli | $0.6440 | $4.4000 | 66.30% |
| Zai_GLM_51 | Z.ai | $0.5660 | $4.3990 | 73.10% |
| Zai_GLM_51 | Chutes | $0.9960 | $4.0000 | 34.00% |
| Zai_GLM_51 | AtlasCloud | $0.5020 | $4.4000 | 78.50% |
| Zai_GLM_51 | DeepInfra | $0.4450 | $3.5000 | 71.60% |
| Zai_GLM_51 | SiliconFlow | $0.5070 | $4.3990 | 78.30% |
| Zai_GLM_51 | NovitaAI | $0.5120 | $4.4000 | 77.50% |
| Zai_GLM_51 | Baidu Qianfan | $0.5880 | $3.0790 | 49.10% |
| Zai_GLM_51 | Baseten | $1.3000 | $4.3000 | 45.60% |
| Zai_GLM_51 | Inceptron | $1.0810 | $4.4000 | 28.00% |
| Zai_GLM_51 | Together | $1.4000 | $4.4000 | 6.50% |
| Zai_GLM_51 | Parasail | $1.1110 | $4.4000 | 25.30% |
| Zai_GLM_51 | Ambient | $1.4000 | $4.4000 | 14.00% |
| Zai_GLM_51 | Phala | $1.2100 | $4.2000 | 0.70% |
| Zai_GLM_51 | io.net | $1.2900 | $4.4800 | 0.00% |
| Zai_GLM_51 | Fireworks | $1.2730 | $4.3990 | 11.10% |
| Zai_GLM_51 | Venice | $1.7470 | $5.5000 | 0.20% |
| Zai_GLM_51 | GMICloud | $0.0000 | $0.0000 | 0.00% |
| Zai_GLM_47 | Z.ai | $0.1170 | $2.1990 | 98.60% |
| Zai_GLM_47 | Google Vertex | $0.6000 | $2.2000 | 36.50% |
| Zai_GLM_47 | SiliconFlow | $0.2040 | $2.2000 | 72.30% |
| Zai_GLM_47 | DeepInfra | $0.2180 | $1.7500 | 56.90% |
| Zai_GLM_47 | Cerebras | $2.2500 | $2.7500 | 45.60% |
| Zai_GLM_47 | AtlasCloud | $0.3220 | $1.8500 | 49.60% |
| Zai_GLM_47 | Phala | $0.8500 | $3.2990 | 3.50% |
| Zai_GLM_47 | NovitaAI | $0.3700 | $2.0050 | 40.80% |
| Zai_GLM_47 | Parasail | $0.4270 | $2.1000 | 6.70% |
| Zai_GLM_47 | Venice | $0.5490 | $2.6490 | 0.30% |
| OpenAI_GPT-55 | OpenAI | $1.1170 | $30.6400 | 92.70% |
| OpenAI_GPT-55 | Azure | $2.3150 | $30.0820 | 62.00% |
| Google_Gemini_31_Flash_Lite | Google Vertex | $0.2100 | $1.4980 | 19.20% |
| Google_Gemini_31_Flash_Lite | Google AI Studio | $0.1860 | $1.4680 | 27.40% |
| Qwen_Qwen3_Coder_480B_A35B | DeepInfra (Turbo) | $0.1210 | $1.0000 | 89.30% |
| Qwen_Qwen3_Coder_480B_A35B | Together | $2.0000 | $2.0000 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | Google Vertex | $0.2200 | $1.7990 | 49.20% |
| Qwen_Qwen3_Coder_480B_A35B | AtlasCloud | $0.7800 | $3.7980 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | Weights & Biases | $1.0000 | $1.4980 | 70.50% |
| Qwen_Qwen3_Coder_480B_A35B | Alibaba OpenSource | $1.3730 | $6.8700 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | NovitaAI | $0.3800 | $1.5490 | 0.00% |
| Qwen_Qwen3_Coder_480B_A35B | Venice | $0.3500 | $1.5000 | 0.00% |
| Google_Gemini_3_Flash_Preview | Google Vertex | $0.3600 | $2.9970 | 32.00% |
| Google_Gemini_3_Flash_Preview | Google AI Studio | $0.3480 | $2.9980 | 34.50% |
| Qwen_Qwen36_Plus | Alibaba Cloud Int. | $0.6690 | $1.9820 | 6.00% |
| Mistral_Mistral_Small_4 | Mistral | $0.1280 | $0.5990 | 16.10% |
| Mistral_Mistral_Small_4 | Venice | $0.1870 | $0.7490 | 11.20% |
| MoonshotAI_Kimi_K26 | SiliconFlow | $0.2920 | $4.0000 | 83.90% |
| MoonshotAI_Kimi_K26 | Moonshot AI | $0.3430 | $4.0000 | 76.80% |
| MoonshotAI_Kimi_K26 | Cloudflare | $0.6870 | $3.5000 | 13.60% |
| MoonshotAI_Kimi_K26 | Inceptron | $0.3360 | $3.5000 | 76.60% |
| MoonshotAI_Kimi_K26 | Weights & Biases | $0.5520 | $4.0000 | 50.30% |
| MoonshotAI_Kimi_K26 | NovitaAI | $0.2850 | $3.4000 | 80.50% |
| MoonshotAI_Kimi_K26 | Chutes | $0.4410 | $3.5000 | 80.80% |
| MoonshotAI_Kimi_K26 | Fireworks | $0.4160 | $4.0000 | 67.60% |
| MoonshotAI_Kimi_K26 | Together | $0.4250 | $4.5000 | 77.40% |
| MoonshotAI_Kimi_K26 | Parasail | $0.3120 | $3.5000 | 74.20% |
| MoonshotAI_Kimi_K26 | io.net | $0.7300 | $3.4900 | 0.00% |
| MoonshotAI_Kimi_K26 | AkashML | $0.9500 | $4.0000 | 0.00% |
| MoonshotAI_Kimi_K26 | DeepInfra | $0.3220 | $3.5000 | 71.30% |
| MoonshotAI_Kimi_K26 | AtlasCloud | $0.6370 | $4.0000 | 39.70% |
| MoonshotAI_Kimi_K26 | Nebius Token Factory | $0.9500 | $4.0000 | 0.00% |
| MoonshotAI_Kimi_K26 | StreamLake | $0.2760 | $3.8000 | 83.50% |
| MoonshotAI_Kimi_K26 | Phala | $1.0900 | $4.6000 | 6.10% |
| MoonshotAI_Kimi_K26 | Venice | $0.7270 | $4.6550 | 19.50% |
| Qwen_Qwen35-Flash | Alibaba Cloud Int. | $0.0650 | $0.2590 | 0.00% |
| OpenAI_GPT-5_Nano | OpenAI | $0.0380 | $0.3960 | 25.10% |
| OpenAI_GPT-5_Nano | Azure (1) | $0.0350 | $0.3990 | 36.50% |
| DeepSeek_DeepSeek_V4_Flash | DeepSeek | $0.0220 | $0.2800 | 86.10% |
| DeepSeek_DeepSeek_V4_Flash | SiliconFlow | $0.0890 | $0.2790 | 45.60% |
| DeepSeek_DeepSeek_V4_Flash | Alibaba Cloud Int. | $0.0720 | $0.2790 | 61.00% |
| DeepSeek_DeepSeek_V4_Flash | NovitaAI | $0.0720 | $0.2790 | 60.60% |
| DeepSeek_DeepSeek_V4_Flash | Parasail | $0.1300 | $0.2790 | 14.40% |
| DeepSeek_DeepSeek_V4_Flash | AtlasCloud | $0.0870 | $0.2790 | 47.40% |
| DeepSeek_DeepSeek_V4_Flash | DeepInfra | $0.0790 | $0.1990 | 25.70% |
| DeepSeek_DeepSeek_V4_Flash | GMICloud | $0.0690 | $0.2230 | 48.20% |
| DeepSeek_DeepSeek_V4_Flash | Baidu Qianfan | $0.0810 | $0.2510 | 44.40% |
| DeepSeek_DeepSeek_V4_Flash | AkashML | $0.1400 | $0.2790 | 0.00% |
| DeepSeek_DeepSeek_V4_Flash | Venice | $0.1510 | $0.3490 | 13.60% |
| OpenAI_GPT-52 | OpenAI | $0.9490 | $14.0000 | 50.90% |
| OpenAI_GPT-52 | Azure | $1.0480 | $14.0000 | 44.60% |
| Google_Gemini_25_Flash | Google Vertex (EU) | $0.2350 | $2.4990 | 25.30% |
| Google_Gemini_25_Flash | Google Vertex (Global) | $0.2330 | $2.4990 | 27.50% |
| Google_Gemini_25_Flash | Google AI Studio | $0.1770 | $2.4990 | 47.80% |
| Google_Gemini_25_Flash | Google Vertex | $0.1780 | $2.5000 | 45.10% |
| Google_Gemini_31_Pro_Preview | Google Vertex | $1.4470 | $12.0080 | 37.30% |
| Google_Gemini_31_Pro_Preview | Google AI Studio | $1.5670 | $11.9550 | 24.20% |
| OpenAI_GPT-5_Mini | OpenAI | $0.1330 | $1.9960 | 50.70% |
| OpenAI_GPT-5_Mini | Azure (1) | $0.0900 | $2.0000 | 72.80% |
| Xiaomi_MiMo-V25 | Xiaomi | $0.1830 | $2.0570 | 74.20% |
| Owl_Alpha | Stealth | $0.0000 | $0.0000 | 62.60% |
| DeepSeek_DeepSeek_V4_Pro | SiliconFlow | $0.7450 | $3.4790 | 59.30% |
| DeepSeek_DeepSeek_V4_Pro | DeepSeek | $0.0560 | $0.8690 | 87.90% |
| DeepSeek_DeepSeek_V4_Pro | NovitaAI | $0.4060 | $3.3790 | 82.00% |
| DeepSeek_DeepSeek_V4_Pro | Alibaba Cloud Int. | $0.9230 | $3.3600 | 49.10% |
| DeepSeek_DeepSeek_V4_Pro | GMICloud | $0.2460 | $2.7830 | 89.80% |
| DeepSeek_DeepSeek_V4_Pro | DeepInfra | $0.7350 | $2.5990 | 47.10% |
| DeepSeek_DeepSeek_V4_Pro | Baidu Qianfan | $0.8920 | $3.0410 | 45.10% |
| DeepSeek_DeepSeek_V4_Pro | AtlasCloud | $1.1450 | $3.3800 | 34.50% |
| DeepSeek_DeepSeek_V4_Pro | Parasail | $1.7220 | $3.4780 | 2.10% |
| DeepSeek_DeepSeek_V4_Pro | Fireworks | $1.3190 | $3.4800 | 26.40% |
| DeepSeek_DeepSeek_V4_Pro | Together | $1.4570 | $4.4000 | 33.80% |
| DeepSeek_DeepSeek_V4_Pro | Venice | $0.7990 | $3.7950 | 66.50% |
| OpenAI_GPT-4o-mini | OpenAI | $0.1440 | $0.5970 | 8.00% |
| OpenAI_GPT-4o-mini | Azure (1) | $0.1260 | $0.5980 | 32.00% |
| OpenAI_GPT-4o-mini | Azure (2) | $0.1250 | $0.5980 | 33.50% |
| Google_Gemini_31_Flash_Lite_Preview | Google AI Studio | $0.1640 | $1.4990 | 38.30% |
| Google_Gemini_31_Flash_Lite_Preview | Google Vertex | $0.2300 | $1.4990 | 10.40% |
| Zai_GLM_45_Air | Z.ai | $0.0750 | $1.0960 | 73.50% |
| Zai_GLM_45_Air | NovitaAI | $0.0540 | $0.8460 | 72.30% |
| Zai_GLM_45_Air | SiliconFlow | $0.1400 | $0.8580 | 42.40% |
| Anthropic_Claude_Sonnet_45 | Amazon Bedrock (1) | $2.2850 | $15.0010 | 30.40% |
| Anthropic_Claude_Sonnet_45 | Google Vertex (Global) | $1.6100 | $15.0000 | 55.10% |
| Anthropic_Claude_Sonnet_45 | Claude Platform on AWS | $0.9770 | $15.0000 | 77.80% |
| Anthropic_Claude_Sonnet_45 | Anthropic | $3.0210 | $15.0000 | 5.40% |
| Anthropic_Claude_Sonnet_45 | Google Vertex | $0.9840 | $15.0000 | 78.40% |
| Anthropic_Claude_Sonnet_45 | Amazon Bedrock (2) | $1.7740 | $15.0000 | 48.70% |
| MiniMax_MiniMax_M25 | DeepInfra | $0.0750 | $1.1480 | 62.50% |
| MiniMax_MiniMax_M25 | MiniMax Highspeed | $0.1800 | $2.3990 | 77.80% |
| MiniMax_MiniMax_M25 | Inceptron | $0.0740 | $0.8980 | 78.80% |
| MiniMax_MiniMax_M25 | MiniMax | $0.0700 | $1.1990 | 85.30% |
| MiniMax_MiniMax_M25 | AtlasCloud | $0.1110 | $1.1980 | 78.20% |
| MiniMax_MiniMax_M25 | NovitaAI | $0.0710 | $1.1970 | 84.90% |
| MiniMax_MiniMax_M25 | Chutes | $0.1080 | $1.1990 | 56.30% |
| MiniMax_MiniMax_M25 | Baidu Qianfan | $0.1170 | $1.0790 | 62.90% |
| MiniMax_MiniMax_M25 | Friendli | $0.1090 | $1.1990 | 79.50% |
| MiniMax_MiniMax_M25 | AkashML | $0.1500 | $1.1490 | 0.00% |
| MiniMax_MiniMax_M25 | Parasail | $0.1310 | $1.1990 | 62.60% |
| MiniMax_MiniMax_M25 | MARA | $0.3000 | $1.2000 | 0.00% |
| MiniMax_MiniMax_M25 | SiliconFlow | $0.1580 | $1.1990 | 52.60% |
| MiniMax_MiniMax_M25 | Phala | $0.2000 | $1.3790 | 32.90% |
| MiniMax_MiniMax_M25 | Weights & Biases | $0.2990 | $1.1960 | 24.20% |
| MiniMax_MiniMax_M25 | StreamLake | $0.3050 | $1.1990 | 6.80% |
| MiniMax_MiniMax_M25 | Venice | $0.1480 | $1.1880 | 63.80% |
| Anthropic_Claude_Opus_45 | Google Vertex | $4.0740 | $25.0000 | 36.70% |
| Anthropic_Claude_Opus_45 | Amazon Bedrock (1) | $0.9350 | $25.0000 | 91.90% |
| Anthropic_Claude_Opus_45 | Amazon Bedrock (2) | $3.3230 | $25.0000 | 41.30% |
| Anthropic_Claude_Opus_45 | Anthropic | $4.4120 | $25.0000 | 23.00% |
| Anthropic_Claude_Opus_45 | Claude Platform on AWS | $1.7950 | $25.0000 | 74.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Weights & Biases | $0.1000 | $0.0950 | 32.30% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | NovitaAI | $0.0900 | $0.5770 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | DeepInfra | $0.0710 | $0.0970 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Parasail | $0.0780 | $0.5990 | 43.10% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Alibaba Cloud Int. | $0.1490 | $0.5630 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Google Vertex (2) | $0.2200 | $0.8790 | 28.40% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Cerebras | $0.6000 | $1.1950 | 69.90% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Google Vertex (1) | $0.2500 | $1.0000 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Friendli | $0.2000 | $0.7980 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | Together | $0.2000 | $0.5980 | 0.00% |
| Qwen_Qwen3_235B_A22B_Instruct_2507 | AtlasCloud | $0.2000 | $0.8760 | 0.00% |
| OpenAI_GPT-51_Chat | OpenAI | $0.6280 | $10.0000 | 55.30% |
| Google_Gemini_20_Flash | Google Vertex | $0.0980 | $0.3990 | 5.40% |
| Google_Gemini_20_Flash | Google AI Studio | $0.1010 | $0.3990 | 0.60% |

Opinion

The "cheap" providers aren't always cheap. Besides that, one thing we did not discuss in this essay is the constant price increases from almost every provider. The numbers look even worse once you consider that inference itself has been getting more expensive. For coding agents, it increasingly makes sense to move to a hybrid setup first (with a smaller local model) and eventually to a fully local one.
