Cache hit rates of inference are more meaningful than the headline costs

An analysis of 60+ AI inference providers using 398 data points shows that cache hit rates, not headline input costs, are the primary driver of real-world pricing for agentic workflows. Chinese labs including DeepSeek (87%), StepFun (86.1%), and Moonshot AI (84.8%) achieve the highest cache hit rates, while providers like io.net, AkashML, SambaNova, and Nebius record 0.0% cache hit rates, making them significantly more expensive for multi-turn conversations. Google's own Gemini models on Vertex AI also show lower cache hit rates (37.30%) than competitor Claude models hosted on the same platform (65.30%), which suggests architectural inefficiencies in Google's approach.

TL;DR: Agents push the full conversation history into context on every turn, so over a large number of turns they are extremely read-heavy, which is why cache hit rates matter so much. This post analyzes 60+ providers and their cache hit rates using 398 data points. All data is sourced from openrouter.ai model pages. The post assumes the reader is familiar with prefix caching; every mention of caching here refers to prefix caching.

Agentic workflows differ from most human-LLM conversations in one key characteristic: the average number of turns is far higher.

Context processing over a multi-turn conversation grows quadratically. Every turn passes the full conversation up to that point into context along with its own input, e.g. turn 100 pushes everything up to turn 99 into the context window again. The provider, on its end, tries to match the longest prefix it can against the caches it has available and processes the rest of the conversation as fresh input tokens, typically at 10x the price of cached ones.
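To make the quadratic growth concrete, here is a minimal sketch of the arithmetic. It assumes every turn adds the same number of tokens and that the hit rate is simply the fraction of all input tokens billed at the cached price; both are simplifications of mine, and the $5.00 / $0.50 prices are illustrative values chosen to match the 10x ratio mentioned above, not any provider's real rates.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens processed when every turn resends the full history."""
    # Turn k resends k * tokens_per_turn tokens, so the total is the
    # arithmetic series 1 + 2 + ... + turns, scaled by tokens_per_turn.
    return tokens_per_turn * turns * (turns + 1) // 2


def conversation_input_cost(turns, tokens_per_turn, fresh_price, cached_price, hit_rate):
    """Input cost in dollars; prices are per million tokens (assumed unit)."""
    total = cumulative_input_tokens(turns, tokens_per_turn)
    cached = total * hit_rate   # tokens billed at the cached price
    fresh = total - cached      # tokens billed at the full input price
    return (fresh * fresh_price + cached * cached_price) / 1_000_000


# Hypothetical 100-turn agent run that adds 2,000 tokens per turn.
for rate in (0.0, 0.5, 0.87):
    cost = conversation_input_cost(100, 2_000, 5.00, 0.50, rate)
    print(f"hit rate {rate:.0%}: ${cost:.2f}")
```

Doubling the number of turns roughly quadruples the total in `cumulative_input_tokens`, which is why the share of those tokens served from cache ends up dominating the bill.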
So on a long 200k-context conversation (which, by the way, is a bad idea capability-wise even if you don't care about costs), if a model that costs $5 per million input tokens fails to hit any cache, you'll be charged $1 per turn just for input processing. Two things determine what you actually pay:

1. Cached input pricing: the headline metric everyone looks at.
2. Cache hit rate: the hidden variable that nobody talks about.

I recently spent hours thinking there was a bug in Dirac that caused caches to break, only to find in the end that it was entirely due to Gemini 3 Flash's cache hit rate. While trying to look up data on this, I found that OpenRouter fortunately publishes it (go to a model's page and look for the 'Effective Pricing' section). Since the data is hourly, we have to assume that it doesn't change too much from hour to hour.

Provider Cache-hit Tier list

Providers with multiple endpoints (e.g. Amazon Bedrock US, Bedrock Global, Bedrock 1) are listed separately; each entry reflects the hit rate of that specific endpoint as observed.

DeepSeek remains the gold standard of caching, which probably doesn't surprise anyone who has used their official API. In fact, all S-tier entries, at roughly 75% or higher, are Chinese labs: DeepSeek (87%), StepFun (86.1%), Moonshot AI (84.8%), MiniMax (75.4%), and Xiaomi (74.7%). The mainstream US labs place somewhere in the middle but, as we will see in the next section, the variance is huge and rather interesting.

On the flip side, we have the "F-Tier". Providers like io.net, AkashML, SambaNova, and Nebius are clocking in at exactly 0.0% cache hit rates across their models.

US closed-source big 3

The most interesting thing to me in the chart above is that older models from the same provider tend to get lower cache hit rates. If I had to explain it non-cynically, I would guess that, from a systems engineering point of view, it comes down to the cache-pool sizes allocated to each model.
Google does worse than the two other providers across the board, which is surprising given that they own the full stack on TPUs. This gets full-on clowny when you look at the Vertex AI numbers (see table below): Opus 4.7 on Vertex AI has a 65.30% cache hit rate while Google's own Gemini 3.1 Pro Preview has 37.30%, and the same pattern shows up across most Claude vs. Gemini pairings hosted on Vertex. How do you manage to get a lower cache hit rate on your own hardware, with your own model trained on that hardware, than a competitor's model? If I were to speculate, I'd guess that the whole 'thought signature' architecture is just not working out.

OSS Models Cost Comparison

With OSS models, the provider you pick makes a huge difference in cost, mostly due to cache hit rates.

| Model | Cheapest effectiveInputPrice (Provider) | Most Expensive effectiveInputPrice (Provider) | Difference (Percentage) |
|---|---|---|---|
| Kimi K2.6 | $0.2760 (StreamLake) | $1.0900 (Phala) | $0.8140 (294.93%) |
| MiMo-V2.5-Pro | $0.3720 (Xiaomi) | $0.9060 (DeepInfra) | $0.5340 (143.55%) |
| DeepSeek V4 Pro Max | $0.0560 (DeepSeek) | $1.7220 (Parasail) | $1.6660 (2975.00%) |
| GLM-5.1 | $0.3230 (StreamLake) | $1.7470 (Venice) | $1.4240 (440.87%) |
| MiniMax-M2.7 | $0.1430 (MiniMax) | $0.6000 (SambaNova) | $0.4570 (319.58%) |

Small Model Grift

Now onto smaller models that seem instinctively cheaper. Below is the average effective pricing for 4 of the most popular local models (hi r/localLlama).

| Model Name | Total Providers | Avg Eff. Input | Avg Eff. Output | Avg Cache Hit |
|---|---|---|---|---|
| Google Gemma 4 26B A4B | 10 | $0.1156 | $0.4150 | 21.57% |
| Google Gemma 4 31B | 10 | $0.1729 | $0.5049 | 7.44% |
| Qwen Qwen36 35B A3B | 7 | $0.1643 | $1.1450 | 11.54% |
| Qwen Qwen36 27B | 8 | $0.4096 | $2.9433 | 7.66% |

Compare that to:

| Model Name (Official API) | Eff. Input Price | Eff. Output Price | Cache Hit Rate |
|---|---|---|---|
| DeepSeek DeepSeek V4 Pro | $0.0560 | $0.8690 | 87.90% |
| DeepSeek DeepSeek V4 Flash | $0.0220 | $0.2800 | 86.10% |

Yup, you can use DeepSeek V4 Pro, a 1.6 trillion parameter model whose 49B active parameters exceed the total parameter count of any of these small models, for cheaper than you can use either of the Qwen3.6 models. That is thanks to providers like io.net and DeepInfra offering $0.32/$3.20 input/output pricing with zero caching.

The full table

| Model Name | Provider | Eff. Input Price | Eff. Output Price | Cache Hit Rate |
|---|---|---|---|---|
| Zai GLM 5 | SiliconFlow | $0.3100 | $2.5490 | 85.30% |
| Zai GLM 5 | Baidu Qianfan | $0.3930 | $2.2390 | 54.70% |
| Zai GLM 5 | GMICloud | $0.4200 | $1.9200 | 37.50% |
| Zai GLM 5 | DeepInfra | $0.3410 | $2.0790 | 54.00% |
| Zai GLM 5 | Z.ai | $0.4710 | $3.1990 | 66.10% |
| Zai GLM 5 | Amazon Bedrock | $1.0000 | $3.1990 | 0.10% |
| Zai GLM 5 | Friendli | $0.8700 | $3.2000 | 26.00% |
| Zai GLM 5 | StreamLake | $0.4650 | $2.0790 | 35.60% |
| Zai GLM 5 | NovitaAI | $0.7340 | $3.2000 | 33.20% |
| Zai GLM 5 | AtlasCloud | $0.7830 | $3.1500 | 22.00% |
| Zai GLM 5 | Parasail | $0.7720 | $3.2000 | 28.60% |
| Zai GLM 5 | Together | $1.0000 | $3.2000 | 0.80% |
| Zai GLM 5 | Chutes | $0.9410 | $2.5500 | 1.80% |
| Zai GLM 5 | Phala | $1.2000 | $3.5000 | 0.10% |
| Qwen Qwen3 VL 32B Instruct | Alibaba Cloud Int. | $0.1040 | $0.4140 | 0.00% |
| Qwen Qwen36 35B A3B | Parasail | $0.1020 | $1.0000 | 47.60% |
| Qwen Qwen36 35B A3B | Ambient | $0.1170 | $1.0000 | 33.20% |
| Qwen Qwen36 35B A3B | io.net | $0.1500 | $1.0000 | 0.00% |
| Qwen Qwen36 35B A3B | AkashML | $0.1700 | $1.2000 | 0.00% |
| Qwen Qwen36 35B A3B | AtlasCloud | $0.1610 | $0.9650 | 0.00% |
| Qwen Qwen36 35B A3B | Weights & Biases | $0.2500 | $1.2500 | 0.00% |
| Qwen Qwen36 35B A3B | SiliconFlow | $0.2000 | $1.6000 | 0.00% |
| OpenAI GPT-41 Nano | OpenAI | $0.0920 | $0.3990 | 11.00% |
| OpenAI GPT-41 Nano | Azure 1 | $0.0710 | $0.3990 | 41.30% |
| OpenAI GPT-41 Nano | Azure 2 | $0.1000 | $0.3980 | 0.00% |
| xAI Grok 43 | xAI | $0.7490 | $2.5000 | 47.80% |
| Anthropic Claude Sonnet 46 | Claude Platform on AWS | $0.9370 | $15.0000 | 79.30% |
| Anthropic Claude Sonnet 46 | Anthropic | $0.6070 | $15.0000 | 89.90% |
| Anthropic Claude Sonnet 46 | Google Vertex US East | $2.6200 | $15.0000 | 21.80% |
| Anthropic Claude Sonnet 46 | Amazon Bedrock Global | $1.3370 | $15.0000 | 64.00% |
| Anthropic Claude Sonnet 46 | Amazon Bedrock | $0.9400 | $15.0000 | 78.40% |
| Anthropic Claude Sonnet 46 | Google Vertex Global | $1.8020 | $15.0000 | 52.70% |
| Anthropic Claude Sonnet 46 | Google Vertex Europe | $2.4390 | $15.0000 | 28.70% |
| Xiaomi MiMo-V25-Pro | Xiaomi | $0.3720 | $3.1670 | 94.80% |
| Xiaomi MiMo-V25-Pro | DeepInfra | $0.9060 | $3.0000 | 11.70% |
| Qwen Qwen3 Coder Next | Ionstream | $0.0860 | $0.7990 | 61.10% |
| Qwen Qwen3 Coder Next | Parasail | $0.0860 | $0.7990 | 68.40% |
| Qwen Qwen3 Coder Next | AtlasCloud | $0.1800 | $1.3490 | 0.00% |
| Qwen Qwen3 Coder Next | NovitaAI | $0.2000 | $1.4990 | 0.00% |
| Anthropic Claude Opus 46 | Claude Platform on AWS | $2.3320 | $25.0000 | 63.50% |
| Anthropic Claude Opus 46 | Amazon Bedrock | $1.4750 | $25.0000 | 81.40% |
| Anthropic Claude Opus 46 | Anthropic | $1.6520 | $25.0000 | 79.00% |
| Anthropic Claude Opus 46 | Google Vertex | $1.9690 | $25.0000 | 71.10% |
| Anthropic Claude Opus 46 | Google Vertex Europe | $3.0870 | $25.0000 | 44.30% |
| Anthropic Claude Opus 46 | Azure | $6.2500 | $25.0000 | 0.00% |
| Anthropic Claude Opus 47 | Claude Platform on AWS | $1.8440 | $25.0000 | 72.40% |
| Anthropic Claude Opus 47 | Google Vertex | $2.4580 | $25.0000 | 65.30% |
| Anthropic Claude Opus 47 | Amazon Bedrock US | $3.9520 | $25.0000 | 23.70% |
| Anthropic Claude Opus 47 | Amazon Bedrock | $4.9530 | $25.0000 | 1.20% |
| Anthropic Claude Opus 47 | Google Vertex Europe | $2.3000 | $25.0000 | 68.20% |
| Anthropic Claude Opus 47 | Anthropic | $1.5920 | $25.0000 | 79.10% |
| MiniMax MiniMax M27 | MiniMax | $0.1430 | $1.2000 | 65.60% |
| MiniMax MiniMax M27 | Together | $0.2010 | $1.1990 | 41.20% |
| MiniMax MiniMax M27 | Morph | $0.2790 | $1.1990 | 73.90% |
| MiniMax MiniMax M27 | Fireworks | $0.2050 | $1.1990 | 39.20% |
| MiniMax MiniMax M27 | MiniMax Highspeed | $0.2510 | $2.3990 | 64.70% |
| MiniMax MiniMax M27 | SambaNova | $0.6000 | $2.3990 | 0.00% |
| Qwen Qwen36 27B | DeepInfra | $0.3200 | $3.1990 | 0.00% |
| Qwen Qwen36 27B | Alibaba Cloud Int. | $0.4500 | $2.6990 | 0.00% |
| Qwen Qwen36 27B | Ambient | $0.2670 | $3.2000 | 32.80% |
| Qwen Qwen36 27B | Weights & Biases | $0.6000 | $3.6000 | 0.00% |
| Qwen Qwen36 27B | io.net | $0.3170 | $3.1990 | 0.00% |
| Qwen Qwen36 27B | Morph | $0.4980 | $2.3990 | 28.50% |
| Qwen Qwen36 27B | Chutes | $0.5000 | $2.0000 | 0.00% |
| Qwen Qwen36 27B | Venice | $0.3250 | $3.2500 | 0.00% |
| OpenAI gpt-oss-120b | Google Vertex | $0.0900 | $0.3590 | 4.10% |
| OpenAI gpt-oss-120b | DeepInfra | $0.0390 | $0.1890 | 0.00% |
| OpenAI gpt-oss-120b | Groq | $0.1200 | $0.5990 | 40.00% |
| OpenAI gpt-oss-120b | Cerebras | $0.3500 | $0.7490 | 48.60% |
| OpenAI gpt-oss-120b | DekaLLM | $0.0390 | $0.1770 | 0.00% |
| OpenAI gpt-oss-120b | Baseten | $0.1000 | $0.4990 | 52.10% |
| OpenAI gpt-oss-120b | NovitaAI | $0.0500 | $0.2490 | 2.50% |
| OpenAI gpt-oss-120b | Ambient | $0.1070 | $0.6000 | 57.00% |
| OpenAI gpt-oss-120b | DeepInfra Turbo | $0.1500 | $0.5990 | 0.00% |
| OpenAI gpt-oss-120b | Parasail | $0.0890 | $0.7490 | 23.60% |
| OpenAI gpt-oss-120b | SiliconFlow | $0.0500 | $0.4490 | 0.00% |
| OpenAI gpt-oss-120b | Amazon Bedrock 1 | $0.1500 | $0.5990 | 0.00% |
| OpenAI gpt-oss-120b | Nebius Token Factory | $0.1500 | $0.5990 | 0.00% |
| OpenAI gpt-oss-120b | SambaNova Dedicated | $0.1200 | $0.8990 | 0.00% |
| OpenAI gpt-oss-120b | SambaNova | $0.1400 | $0.9490 | 0.00% |
| OpenAI gpt-oss-120b | Together | $0.1500 | $0.6000 | 0.00% |
| OpenAI gpt-oss-120b | Phala | $0.1000 | $0.4890 | 14.00% |
| OpenAI gpt-oss-120b | MARA | $0.1500 | $0.7490 | 0.00% |
| OpenAI gpt-oss-120b | Weights & Biases | $0.1500 | $0.5990 | 12.70% |
| OpenAI gpt-oss-120b | Amazon Bedrock 2 | $0.1500 | $0.5990 | 0.00% |
| Zai GLM 47 Flash | DeepInfra | $0.0230 | $0.3990 | 73.40% |
| Zai GLM 47 Flash | NovitaAI | $0.0610 | $0.3990 | 14.30% |
| Zai GLM 47 Flash | Cloudflare | $0.0600 | $0.3970 | 0.00% |
| Zai GLM 47 Flash | Phala | $0.1000 | $0.4290 | 0.00% |
| Zai GLM 47 Flash | Z.ai | $0.0430 | $0.3990 | 44.20% |
| Zai GLM 47 Flash | Venice | $0.1250 | $0.4960 | 14.40% |
| OpenAI GPT-51 | OpenAI | $0.8220 | $10.0000 | 38.10% |
| OpenAI GPT-51 | Azure 1 | $0.9550 | $10.0000 | 26.30% |
| OpenAI GPT-54 Mini | OpenAI | $0.3370 | $4.5110 | 61.50% |
| OpenAI GPT-54 Mini | Azure | $0.6840 | $4.4990 | 9.80% |
| Meta Llama 31 8B Instruct | Groq | $0.0330 | $0.0750 | 67.80% |
| Meta Llama 31 8B Instruct | DeepInfra | $0.0200 | $0.0480 | 0.00% |
| Meta Llama 31 8B Instruct | NovitaAI | $0.0200 | $0.0480 | 0.00% |
| Meta Llama 31 8B Instruct | Cerebras | $0.0990 | $0.0950 | 87.50% |
| Meta Llama 31 8B Instruct | Cloudflare | $0.1520 | $0.2850 | 0.00% |
| Qwen Qwen35-9B | DeepInfra | $0.0400 | $0.1480 | 0.00% |
| Qwen Qwen35-9B | Together | $0.1000 | $0.1490 | 0.00% |
| Qwen Qwen35-9B | SiliconFlow | $0.1000 | $0.1490 | 0.00% |
| Qwen Qwen35-9B | Venice | $0.1000 | $0.1490 | 35.50% |
| MoonshotAI Kimi K25 | DeepInfra | $0.1660 | $2.2490 | 74.80% |
| MoonshotAI Kimi K25 | NovitaAI | $0.2320 | $2.8490 | 71.10% |
| MoonshotAI Kimi K25 | ModelRun | $0.1610 | $1.9000 | 77.10% |
| MoonshotAI Kimi K25 | Moonshot AI | $0.1360 | $3.0000 | 92.80% |
| MoonshotAI Kimi K25 | Fireworks | $0.1960 | $3.0000 | 80.70% |
| MoonshotAI Kimi K25 | Chutes | $0.3790 | $2.0000 | 27.70% |
| MoonshotAI Kimi K25 | AtlasCloud | $0.3790 | $2.4990 | 38.10% |
| MoonshotAI Kimi K25 | SiliconFlow | $0.2590 | $2.2500 | 50.10% |
| MoonshotAI Kimi K25 | Cloudflare | $0.3390 | $3.0000 | 52.20% |
| MoonshotAI Kimi K25 | Parasail | $0.3810 | $2.7990 | 54.80% |
| MoonshotAI Kimi K25 | Phala | $0.6000 | $3.0000 | 2.20% |
| MoonshotAI Kimi K25 | Venice | $0.5300 | $3.5000 | 8.90% |
| Tencent Hy3 preview | SiliconFlow | $0.0350 | $0.2590 | 84.30% |
| OpenAI GPT-53-Codex | OpenAI | $0.2820 | $14.0000 | 93.20% |
| OpenAI GPT-53-Codex | Azure | $0.4470 | $14.0000 | 82.70% |
| StepFun Step 35 Flash | StepFun | $0.0310 | $0.2990 | 86.10% |
| StepFun Step 35 Flash | DeepInfra | $0.0900 | $0.2990 | 0.00% |
| StepFun Step 35 Flash | SiliconFlow | $0.1000 | $0.3000 | 0.00% |
| OpenAI GPT-54 Nano | OpenAI | $0.0870 | $1.2490 | 62.80% |
| OpenAI GPT-54 Nano | Azure | $0.1530 | $1.2490 | 26.10% |
| DeepSeek DeepSeek V3 0324 | NovitaAI | $0.1800 | $1.1180 | 66.30% |
| DeepSeek DeepSeek V3 0324 | DeepInfra | $0.1600 | $0.7670 | 61.90% |
| DeepSeek DeepSeek V3 0324 | ModelRun | $0.1850 | $0.7980 | 50.40% |
| DeepSeek DeepSeek V3 0324 | SiliconFlow | $0.2500 | $1.0000 | 52.60% |
| DeepSeek DeepSeek V3 0324 | AtlasCloud | $0.2140 | $0.8770 | 4.00% |
| DeepSeek DeepSeek V3 0324 | GMICloud | $0.2890 | $1.1350 | 0.60% |
| Qwen Qwen35 397B A17B | Morph | $0.4220 | $3.5000 | 64.70% |
| Qwen Qwen35 397B A17B | Alibaba Cloud Int. | $0.3900 | $2.3400 | 0.00% |
| Qwen Qwen35 397B A17B | Chutes | $0.2710 | $3.0000 | 79.60% |
| Qwen Qwen35 397B A17B | DeepInfra | $0.4900 | $3.5990 | 0.00% |
| Qwen Qwen35 397B A17B | Together | $0.6000 | $3.6000 | 48.40% |
| Qwen Qwen35 397B A17B | NovitaAI | $0.6000 | $3.6000 | 5.10% |
| Qwen Qwen35 397B A17B | Nebius Token Factory | $0.6000 | $3.5980 | 0.00% |
| Qwen Qwen35 397B A17B | AtlasCloud | $0.5500 | $3.5000 | 0.00% |
| Qwen Qwen35 397B A17B | Phala | $0.5500 | $3.5000 | 35.50% |
| Qwen Qwen35 397B A17B | Parasail | $0.4090 | $3.6000 | 45.30% |
| Qwen Qwen35 397B A17B | GMICloud | $0.6000 | $3.6000 | 0.00% |
| Qwen Qwen35 397B A17B | Venice | $0.7500 | $4.5000 | 19.30% |
| Mistral Mistral Small 32 24B | Mistral | $0.0860 | $0.2990 | 15.90% |
| Mistral Mistral Small 32 24B | DeepInfra | $0.0750 | $0.1980 | 0.00% |
| Mistral Mistral Small 32 24B | Parasail | $0.0730 | $0.5980 | 41.90% |
| Mistral Mistral Small 32 24B | Venice | $0.0940 | $0.2490 | 0.00% |
| Meta Llama 4 Maverick | Parasail | $0.3140 | $1.0000 | 19.30% |
| Meta Llama 4 Maverick | DeepInfra | $0.1500 | $0.5970 | 0.00% |
| Meta Llama 4 Maverick | NovitaAI | $0.2700 | $0.8460 | 0.00% |
| Meta Llama 4 Maverick | SambaNova | $0.6300 | $1.7970 | 0.00% |
| Mistral Mistral Medium 35 | Mistral | $1.5000 | $7.4990 | 21.70% |
| Qwen Qwen36 Flash | Alibaba Cloud Int. | $0.1920 | $1.1400 | 0.30% |
| OpenAI GPT-41 | OpenAI | $1.1000 | $8.0000 | 60.00% |
| OpenAI GPT-41 | Azure 1 | $1.0780 | $8.0000 | 61.50% |
| Mistral Mistral Nemo | DeepInfra | $0.0200 | $0.0370 | 0.00% |
| Mistral Mistral Nemo | DekaLLM | $0.0200 | $0.0250 | 0.00% |
| Mistral Mistral Nemo | Mistral | $0.0900 | $0.1430 | 44.40% |
| Mistral Mistral Nemo | NovitaAI | $0.0390 | $0.1640 | 0.00% |
| Google Gemma 4 31B | DeepInfra Turbo | $0.1200 | $0.3690 | 0.00% |
| Google Gemma 4 31B | DeepInfra | $0.1300 | $0.3790 | 0.00% |
| Google Gemma 4 31B | NovitaAI | $0.1400 | $0.3990 | 5.80% |
| Google Gemma 4 31B | Chutes | $0.1090 | $0.3780 | 31.80% |
| Google Gemma 4 31B | Together 2 | $0.3900 | $0.9690 | 0.00% |
| Google Gemma 4 31B | SiliconFlow | $0.1300 | $0.3990 | 0.00% |
| Google Gemma 4 31B | Together 1 | $0.2800 | $0.8580 | 0.00% |
| Google Gemma 4 31B | Ambient | $0.1180 | $0.3990 | 18.40% |
| Google Gemma 4 31B | Venice | $0.1750 | $0.5000 | 12.00% |
| Google Gemma 4 31B | Parasail | $0.1370 | $0.3990 | 6.40% |
| Google Gemini 25 Flash Lite | Google Vertex EU | $0.0800 | $0.3990 | 22.70% |
| Google Gemini 25 Flash Lite | Google Vertex | $0.0940 | $0.3990 | 7.40% |
| Google Gemini 25 Flash Lite | Google AI Studio | $0.0910 | $0.3990 | 12.00% |
| Google Gemini 35 Flash | Google Vertex | $0.5520 | $9.0080 | 70.30% |
| Google Gemini 35 Flash | Google AI Studio | $0.6340 | $7.9040 | 63.50% |
| DeepSeek DeepSeek V31 | Weights & Biases | $0.5500 | $1.6370 | 31.80% |
| DeepSeek DeepSeek V31 | NovitaAI | $0.2620 | $1.0000 | 5.70% |
| DeepSeek DeepSeek V31 | DeepInfra | $0.1750 | $0.7880 | 43.20% |
| DeepSeek DeepSeek V31 | SiliconFlow | $0.2700 | $1.0000 | 36.80% |
| DeepSeek DeepSeek V31 | AtlasCloud | $0.2870 | $0.9470 | 7.30% |
| DeepSeek DeepSeek V31 | Google Vertex | $0.6000 | $1.7000 | 19.30% |
| DeepSeek DeepSeek V31 | SambaNova | $0.6500 | $1.4990 | 0.00% |
| Google Gemma 4 26B A4B | Google Vertex | $0.1500 | $0.5970 | 19.10% |
| Google Gemma 4 26B A4B | NovitaAI | $0.1300 | $0.4000 | 16.80% |
| Google Gemma 4 26B A4B | NextBit | $0.1180 | $0.3980 | 63.40% |
| Google Gemma 4 26B A4B | DeepInfra | $0.0700 | $0.3370 | 0.00% |
| Google Gemma 4 26B A4B | Cloudflare | $0.1000 | $0.2970 | 0.00% |
| Google Gemma 4 26B A4B | Parasail | $0.0960 | $0.3980 | 41.80% |
| Google Gemma 4 26B A4B | DekaLLM | $0.0600 | $0.3270 | 0.00% |
| Google Gemma 4 26B A4B | SiliconFlow | $0.1200 | $0.3970 | 0.00% |
| Google Gemma 4 26B A4B | Venice | $0.1620 | $0.4990 | 74.60% |
| Google Gemma 4 26B A4B | io.net | $0.1500 | $0.5000 | 0.00% |
| OpenAI gpt-oss-20b | Weights & Biases | $0.0500 | $0.2000 | 69.00% |
| OpenAI gpt-oss-20b | DeepInfra | $0.0300 | $0.1400 | 0.00% |
| OpenAI gpt-oss-20b | Amazon Bedrock 2 | $0.0700 | $0.1500 | 0.00% |
| OpenAI gpt-oss-20b | Amazon Bedrock 1 | $0.0700 | $0.1500 | 0.00% |
| OpenAI gpt-oss-20b | NovitaAI | $0.0400 | $0.1500 | 0.00% |
| OpenAI gpt-oss-20b | Google Vertex | $0.0690 | $0.2460 | 63.20% |
| OpenAI gpt-oss-20b | Groq | $0.0660 | $0.2990 | 22.40% |
| OpenAI gpt-oss-20b | Parasail | $0.0250 | $0.1990 | 71.40% |
| OpenAI gpt-oss-20b | SiliconFlow | $0.0400 | $0.1790 | 0.00% |
| OpenAI gpt-oss-20b | Fireworks | $0.0580 | $0.2990 | 33.70% |
| OpenAI gpt-oss-20b | NextBit | $0.1000 | $0.4490 | 0.00% |
| OpenAI gpt-oss-20b | Together | $0.0490 | $0.1990 | 0.00% |
| OpenAI GPT-41 Mini | OpenAI | $0.2800 | $1.5940 | 40.00% |
| OpenAI GPT-41 Mini | Azure 2 | $0.2520 | $1.5980 | 49.30% |
| OpenAI GPT-41 Mini | Azure 1 | $0.2020 | $1.5930 | 66.00% |
| DeepSeek DeepSeek V32 | NovitaAI | $0.2490 | $0.3990 | 15.20% |
| DeepSeek DeepSeek V32 | Baidu Qianfan | $0.1050 | $0.3760 | 64.90% |
| DeepSeek DeepSeek V32 | SiliconFlow | $0.1720 | $0.4190 | 69.80% |
| DeepSeek DeepSeek V32 | DeepInfra | $0.1950 | $0.3790 | 49.60% |
| DeepSeek DeepSeek V32 | AtlasCloud | $0.2410 | $0.3790 | 14.40% |
| DeepSeek DeepSeek V32 | Friendli | $0.3990 | $1.5000 | 40.50% |
| DeepSeek DeepSeek V32 | Alibaba Cloud Int. | $0.2730 | $1.1110 | 32.50% |
| DeepSeek DeepSeek V32 | Parasail | $0.2540 | $0.4470 | 17.20% |
| DeepSeek DeepSeek V32 | Google Vertex | $0.5600 | $1.6790 | 8.00% |
| Google Gemini 20 Flash Lite | Google Vertex | $0.0750 | $0.2980 | 0.00% |
| Google Gemini 20 Flash Lite | Google AI Studio | $0.0750 | $0.2950 | 0.00% |
| Anthropic Claude Sonnet 4 | Amazon Bedrock 1 | $1.6300 | $15.0000 | 53.70% |
| Anthropic Claude Sonnet 4 | Anthropic | $1.9410 | $15.0000 | 41.60% |
| Anthropic Claude Sonnet 4 | Amazon Bedrock 2 | $2.1510 | $15.0000 | 35.10% |
| Anthropic Claude Sonnet 4 | Google Vertex Global | $2.3880 | $15.0000 | 29.50% |
| Anthropic Claude Sonnet 4 | Google Vertex Europe | $1.7360 | $15.0000 | 58.40% |
| Anthropic Claude Haiku 45 | Amazon Bedrock Global | $0.3670 | $5.0000 | 72.40% |
| Anthropic Claude Haiku 45 | Anthropic | $0.5420 | $5.0000 | 54.80% |
| Anthropic Claude Haiku 45 | Google Vertex Europe | $0.9760 | $5.0000 | 2.80% |
| Anthropic Claude Haiku 45 | Amazon Bedrock | $0.5780 | $5.0000 | 48.70% |
| Anthropic Claude Haiku 45 | Google Vertex | $0.4890 | $5.0000 | 58.20% |
| OpenAI GPT-54 | Azure | $1.8150 | $15.0000 | 30.40% |
| OpenAI GPT-54 | OpenAI | $0.8980 | $15.1190 | 74.90% |
| OpenAI GPT-5 | OpenAI | $0.4550 | $10.0000 | 70.60% |
| OpenAI GPT-5 | Azure 1 | $1.2500 | $10.0000 | 0.00% |
| Xiaomi MiMo-V2-Flash | Xiaomi | $0.0500 | $0.2980 | 55.00% |
| Xiaomi MiMo-V2-Flash | NovitaAI | $0.0530 | $0.2990 | 58.70% |
| Qwen Qwen35-35B-A3B | Parasail | $0.0910 | $1.0000 | 59.00% |
| Qwen Qwen35-35B-A3B | Alibaba Cloud Int. | $0.1620 | $1.2990 | 0.00% |
| Qwen Qwen35-35B-A3B | AkashML | $0.1600 | $1.2000 | 0.00% |
| Qwen Qwen35-35B-A3B | Ambient | $0.1020 | $1.0000 | 41.60% |
| Qwen Qwen35-35B-A3B | Venice | $0.1970 | $1.2500 | 73.90% |
| Qwen Qwen35-35B-A3B | DeepInfra | $0.1400 | $1.0000 | 0.00% |
| Qwen Qwen35-35B-A3B | DekaLLM | $0.1390 | $1.0000 | 0.00% |
| Qwen Qwen35-35B-A3B | AtlasCloud | $0.2250 | $1.8000 | 0.00% |
| Qwen Qwen35-35B-A3B | NextBit | $0.3000 | $1.8000 | 0.00% |
| Qwen Qwen35-35B-A3B | SiliconFlow | $0.2400 | $1.8000 | 0.00% |
| Google Gemini 25 Pro | Google Vertex Global | $1.0410 | $10.0650 | 30.10% |
| Google Gemini 25 Pro | Google Vertex EU | $0.9680 | $10.0000 | 26.40% |
| Google Gemini 25 Pro | Google AI Studio | $0.9680 | $10.0000 | 35.40% |
| Google Gemini 25 Pro | Google Vertex US | $1.2200 | $10.0000 | 2.60% |
| Zai GLM 51 | StreamLake | $0.3230 | $3.9590 | 91.30% |
| Zai GLM 51 | Friendli | $0.6440 | $4.4000 | 66.30% |
| Zai GLM 51 | Z.ai | $0.5660 | $4.3990 | 73.10% |
| Zai GLM 51 | Chutes | $0.9960 | $4.0000 | 34.00% |
| Zai GLM 51 | AtlasCloud | $0.5020 | $4.4000 | 78.50% |
| Zai GLM 51 | DeepInfra | $0.4450 | $3.5000 | 71.60% |
| Zai GLM 51 | SiliconFlow | $0.5070 | $4.3990 | 78.30% |
| Zai GLM 51 | NovitaAI | $0.5120 | $4.4000 | 77.50% |
| Zai GLM 51 | Baidu Qianfan | $0.5880 | $3.0790 | 49.10% |
| Zai GLM 51 | Baseten | $1.3000 | $4.3000 | 45.60% |
| Zai GLM 51 | Inceptron | $1.0810 | $4.4000 | 28.00% |
| Zai GLM 51 | Together | $1.4000 | $4.4000 | 6.50% |
| Zai GLM 51 | Parasail | $1.1110 | $4.4000 | 25.30% |
| Zai GLM 51 | Ambient | $1.4000 | $4.4000 | 14.00% |
| Zai GLM 51 | Phala | $1.2100 | $4.2000 | 0.70% |
| Zai GLM 51 | io.net | $1.2900 | $4.4800 | 0.00% |
| Zai GLM 51 | Fireworks | $1.2730 | $4.3990 | 11.10% |
| Zai GLM 51 | Venice | $1.7470 | $5.5000 | 0.20% |
| Zai GLM 51 | GMICloud | $0.0000 | $0.0000 | 0.00% |
| Zai GLM 47 | Z.ai | $0.1170 | $2.1990 | 98.60% |
| Zai GLM 47 | Google Vertex | $0.6000 | $2.2000 | 36.50% |
| Zai GLM 47 | SiliconFlow | $0.2040 | $2.2000 | 72.30% |
| Zai GLM 47 | DeepInfra | $0.2180 | $1.7500 | 56.90% |
| Zai GLM 47 | Cerebras | $2.2500 | $2.7500 | 45.60% |
| Zai GLM 47 | AtlasCloud | $0.3220 | $1.8500 | 49.60% |
| Zai GLM 47 | Phala | $0.8500 | $3.2990 | 3.50% |
| Zai GLM 47 | NovitaAI | $0.3700 | $2.0050 | 40.80% |
| Zai GLM 47 | Parasail | $0.4270 | $2.1000 | 6.70% |
| Zai GLM 47 | Venice | $0.5490 | $2.6490 | 0.30% |
| OpenAI GPT-55 | OpenAI | $1.1170 | $30.6400 | 92.70% |
| OpenAI GPT-55 | Azure | $2.3150 | $30.0820 | 62.00% |
| Google Gemini 31 Flash Lite | Google Vertex | $0.2100 | $1.4980 | 19.20% |
| Google Gemini 31 Flash Lite | Google AI Studio | $0.1860 | $1.4680 | 27.40% |
| Qwen Qwen3 Coder 480B A35B | DeepInfra Turbo | $0.1210 | $1.0000 | 89.30% |
| Qwen Qwen3 Coder 480B A35B | Together | $2.0000 | $2.0000 | 0.00% |
| Qwen Qwen3 Coder 480B A35B | Google Vertex | $0.2200 | $1.7990 | 49.20% |
| Qwen Qwen3 Coder 480B A35B | AtlasCloud | $0.7800 | $3.7980 | 0.00% |
| Qwen Qwen3 Coder 480B A35B | Weights & Biases | $1.0000 | $1.4980 | 70.50% |
| Qwen Qwen3 Coder 480B A35B | Alibaba OpenSource | $1.3730 | $6.8700 | 0.00% |
| Qwen Qwen3 Coder 480B A35B | NovitaAI | $0.3800 | $1.5490 | 0.00% |
| Qwen Qwen3 Coder 480B A35B | Venice | $0.3500 | $1.5000 | 0.00% |
| Google Gemini 3 Flash Preview | Google Vertex | $0.3600 | $2.9970 | 32.00% |
| Google Gemini 3 Flash Preview | Google AI Studio | $0.3480 | $2.9980 | 34.50% |
| Qwen Qwen36 Plus | Alibaba Cloud Int. | $0.6690 | $1.9820 | 6.00% |
| Mistral Mistral Small 4 | Mistral | $0.1280 | $0.5990 | 16.10% |
| Mistral Mistral Small 4 | Venice | $0.1870 | $0.7490 | 11.20% |
| MoonshotAI Kimi K26 | SiliconFlow | $0.2920 | $4.0000 | 83.90% |
| MoonshotAI Kimi K26 | Moonshot AI | $0.3430 | $4.0000 | 76.80% |
| MoonshotAI Kimi K26 | Cloudflare | $0.6870 | $3.5000 | 13.60% |
| MoonshotAI Kimi K26 | Inceptron | $0.3360 | $3.5000 | 76.60% |
| MoonshotAI Kimi K26 | Weights & Biases | $0.5520 | $4.0000 | 50.30% |
| MoonshotAI Kimi K26 | NovitaAI | $0.2850 | $3.4000 | 80.50% |
| MoonshotAI Kimi K26 | Chutes | $0.4410 | $3.5000 | 80.80% |
| MoonshotAI Kimi K26 | Fireworks | $0.4160 | $4.0000 | 67.60% |
| MoonshotAI Kimi K26 | Together | $0.4250 | $4.5000 | 77.40% |
| MoonshotAI Kimi K26 | Parasail | $0.3120 | $3.5000 | 74.20% |
| MoonshotAI Kimi K26 | io.net | $0.7300 | $3.4900 | 0.00% |
| MoonshotAI Kimi K26 | AkashML | $0.9500 | $4.0000 | 0.00% |
| MoonshotAI Kimi K26 | DeepInfra | $0.3220 | $3.5000 | 71.30% |
| MoonshotAI Kimi K26 | AtlasCloud | $0.6370 | $4.0000 | 39.70% |
| MoonshotAI Kimi K26 | Nebius Token Factory | $0.9500 | $4.0000 | 0.00% |
| MoonshotAI Kimi K26 | StreamLake | $0.2760 | $3.8000 | 83.50% |
| MoonshotAI Kimi K26 | Phala | $1.0900 | $4.6000 | 6.10% |
| MoonshotAI Kimi K26 | Venice | $0.7270 | $4.6550 | 19.50% |
| Qwen Qwen35-Flash | Alibaba Cloud Int. | $0.0650 | $0.2590 | 0.00% |
| OpenAI GPT-5 Nano | OpenAI | $0.0380 | $0.3960 | 25.10% |
| OpenAI GPT-5 Nano | Azure 1 | $0.0350 | $0.3990 | 36.50% |
| DeepSeek DeepSeek V4 Flash | DeepSeek | $0.0220 | $0.2800 | 86.10% |
| DeepSeek DeepSeek V4 Flash | SiliconFlow | $0.0890 | $0.2790 | 45.60% |
| DeepSeek DeepSeek V4 Flash | Alibaba Cloud Int. | $0.0720 | $0.2790 | 61.00% |
| DeepSeek DeepSeek V4 Flash | NovitaAI | $0.0720 | $0.2790 | 60.60% |
| DeepSeek DeepSeek V4 Flash | Parasail | $0.1300 | $0.2790 | 14.40% |
| DeepSeek DeepSeek V4 Flash | AtlasCloud | $0.0870 | $0.2790 | 47.40% |
| DeepSeek DeepSeek V4 Flash | DeepInfra | $0.0790 | $0.1990 | 25.70% |
| DeepSeek DeepSeek V4 Flash | GMICloud | $0.0690 | $0.2230 | 48.20% |
| DeepSeek DeepSeek V4 Flash | Baidu Qianfan | $0.0810 | $0.2510 | 44.40% |
| DeepSeek DeepSeek V4 Flash | AkashML | $0.1400 | $0.2790 | 0.00% |
| DeepSeek DeepSeek V4 Flash | Venice | $0.1510 | $0.3490 | 13.60% |
| OpenAI GPT-52 | OpenAI | $0.9490 | $14.0000 | 50.90% |
| OpenAI GPT-52 | Azure | $1.0480 | $14.0000 | 44.60% |
| Google Gemini 25 Flash | Google Vertex EU | $0.2350 | $2.4990 | 25.30% |
| Google Gemini 25 Flash | Google Vertex Global | $0.2330 | $2.4990 | 27.50% |
| Google Gemini 25 Flash | Google AI Studio | $0.1770 | $2.4990 | 47.80% |
| Google Gemini 25 Flash | Google Vertex | $0.1780 | $2.5000 | 45.10% |
| Google Gemini 31 Pro Preview | Google Vertex | $1.4470 | $12.0080 | 37.30% |
| Google Gemini 31 Pro Preview | Google AI Studio | $1.5670 | $11.9550 | 24.20% |
| OpenAI GPT-5 Mini | OpenAI | $0.1330 | $1.9960 | 50.70% |
| OpenAI GPT-5 Mini | Azure 1 | $0.0900 | $2.0000 | 72.80% |
| Xiaomi MiMo-V25 | Xiaomi | $0.1830 | $2.0570 | 74.20% |
| Owl Alpha | Stealth | $0.0000 | $0.0000 | 62.60% |
| DeepSeek DeepSeek V4 Pro | SiliconFlow | $0.7450 | $3.4790 | 59.30% |
| DeepSeek DeepSeek V4 Pro | DeepSeek | $0.0560 | $0.8690 | 87.90% |
| DeepSeek DeepSeek V4 Pro | NovitaAI | $0.4060 | $3.3790 | 82.00% |
| DeepSeek DeepSeek V4 Pro | Alibaba Cloud Int. | $0.9230 | $3.3600 | 49.10% |
| DeepSeek DeepSeek V4 Pro | GMICloud | $0.2460 | $2.7830 | 89.80% |
| DeepSeek DeepSeek V4 Pro | DeepInfra | $0.7350 | $2.5990 | 47.10% |
| DeepSeek DeepSeek V4 Pro | Baidu Qianfan | $0.8920 | $3.0410 | 45.10% |
| DeepSeek DeepSeek V4 Pro | AtlasCloud | $1.1450 | $3.3800 | 34.50% |
| DeepSeek DeepSeek V4 Pro | Parasail | $1.7220 | $3.4780 | 2.10% |
| DeepSeek DeepSeek V4 Pro | Fireworks | $1.3190 | $3.4800 | 26.40% |
| DeepSeek DeepSeek V4 Pro | Together | $1.4570 | $4.4000 | 33.80% |
| DeepSeek DeepSeek V4 Pro | Venice | $0.7990 | $3.7950 | 66.50% |
| OpenAI GPT-4o-mini | OpenAI | $0.1440 | $0.5970 | 8.00% |
| OpenAI GPT-4o-mini | Azure 1 | $0.1260 | $0.5980 | 32.00% |
| OpenAI GPT-4o-mini | Azure 2 | $0.1250 | $0.5980 | 33.50% |
| Google Gemini 31 Flash Lite Preview | Google AI Studio | $0.1640 | $1.4990 | 38.30% |
| Google Gemini 31 Flash Lite Preview | Google Vertex | $0.2300 | $1.4990 | 10.40% |
| Zai GLM 45 Air | Z.ai | $0.0750 | $1.0960 | 73.50% |
| Zai GLM 45 Air | NovitaAI | $0.0540 | $0.8460 | 72.30% |
| Zai GLM 45 Air | SiliconFlow | $0.1400 | $0.8580 | 42.40% |
| Anthropic Claude Sonnet 45 | Amazon Bedrock 1 | $2.2850 | $15.0010 | 30.40% |
| Anthropic Claude Sonnet 45 | Google Vertex Global | $1.6100 | $15.0000 | 55.10% |
| Anthropic Claude Sonnet 45 | Claude Platform on AWS | $0.9770 | $15.0000 | 77.80% |
| Anthropic Claude Sonnet 45 | Anthropic | $3.0210 | $15.0000 | 5.40% |
| Anthropic Claude Sonnet 45 | Google Vertex | $0.9840 | $15.0000 | 78.40% |
| Anthropic Claude Sonnet 45 | Amazon Bedrock 2 | $1.7740 | $15.0000 | 48.70% |
| MiniMax MiniMax M25 | DeepInfra | $0.0750 | $1.1480 | 62.50% |
| MiniMax MiniMax M25 | MiniMax Highspeed | $0.1800 | $2.3990 | 77.80% |
| MiniMax MiniMax M25 | Inceptron | $0.0740 | $0.8980 | 78.80% |
| MiniMax MiniMax M25 | MiniMax | $0.0700 | $1.1990 | 85.30% |
| MiniMax MiniMax M25 | AtlasCloud | $0.1110 | $1.1980 | 78.20% |
| MiniMax MiniMax M25 | NovitaAI | $0.0710 | $1.1970 | 84.90% |
| MiniMax MiniMax M25 | Chutes | $0.1080 | $1.1990 | 56.30% |
| MiniMax MiniMax M25 | Baidu Qianfan | $0.1170 | $1.0790 | 62.90% |
| MiniMax MiniMax M25 | Friendli | $0.1090 | $1.1990 | 79.50% |
| MiniMax MiniMax M25 | AkashML | $0.1500 | $1.1490 | 0.00% |
| MiniMax MiniMax M25 | Parasail | $0.1310 | $1.1990 | 62.60% |
| MiniMax MiniMax M25 | MARA | $0.3000 | $1.2000 | 0.00% |
| MiniMax MiniMax M25 | SiliconFlow | $0.1580 | $1.1990 | 52.60% |
| MiniMax MiniMax M25 | Phala | $0.2000 | $1.3790 | 32.90% |
| MiniMax MiniMax M25 | Weights & Biases | $0.2990 | $1.1960 | 24.20% |
| MiniMax MiniMax M25 | StreamLake | $0.3050 | $1.1990 | 6.80% |
| MiniMax MiniMax M25 | Venice | $0.1480 | $1.1880 | 63.80% |
| Anthropic Claude Opus 45 | Google Vertex | $4.0740 | $25.0000 | 36.70% |
| Anthropic Claude Opus 45 | Amazon Bedrock 1 | $0.9350 | $25.0000 | 91.90% |
| Anthropic Claude Opus 45 | Amazon Bedrock 2 | $3.3230 | $25.0000 | 41.30% |
| Anthropic Claude Opus 45 | Anthropic | $4.4120 | $25.0000 | 23.00% |
| Anthropic Claude Opus 45 | Claude Platform on AWS | $1.7950 | $25.0000 | 74.00% |
| Qwen Qwen3 235B A22B Instruct 2507 | Weights & Biases | $0.1000 | $0.0950 | 32.30% |
| Qwen Qwen3 235B A22B Instruct 2507 | NovitaAI | $0.0900 | $0.5770 | 0.00% |
| Qwen Qwen3 235B A22B Instruct 2507 | DeepInfra | $0.0710 | $0.0970 | 0.00% |
| Qwen Qwen3 235B A22B Instruct 2507 | Parasail | $0.0780 | $0.5990 | 43.10% |
| Qwen Qwen3 235B A22B Instruct 2507 | Alibaba Cloud Int. | $0.1490 | $0.5630 | 0.00% |
| Qwen Qwen3 235B A22B Instruct 2507 | Google Vertex 2 | $0.2200 | $0.8790 | 28.40% |
| Qwen Qwen3 235B A22B Instruct 2507 | Cerebras | $0.6000 | $1.1950 | 69.90% |
| Qwen Qwen3 235B A22B Instruct 2507 | Google Vertex 1 | $0.2500 | $1.0000 | 0.00% |
| Qwen Qwen3 235B A22B Instruct 2507 | Friendli | $0.2000 | $0.7980 | 0.00% |
| Qwen Qwen3 235B A22B Instruct 2507 | Together | $0.2000 | $0.5980 | 0.00% |
| Qwen Qwen3 235B A22B Instruct 2507 | AtlasCloud | $0.2000 | $0.8760 | 0.00% |
| OpenAI GPT-51 Chat | OpenAI | $0.6280 | $10.0000 | 55.30% |
| Google Gemini 20 Flash | Google Vertex | $0.0980 | $0.3990 | 5.40% |
| Google Gemini 20 Flash | Google AI Studio | $0.1010 | $0.3990 | 0.60% |

Opinion

The "cheap" providers aren't always cheap. Besides that, one thing we did not discuss in this essay is the constant price increases from almost every provider. The numbers look worse when you consider that inference has been getting more expensive too. It makes increasingly more sense to move to a hybrid setup first, with a smaller local model, and then eventually to a fully local setup for coding agents.
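For anyone who wants to redo the cheapest vs. most expensive comparison on a fresh snapshot, here is a small sketch of how the spread in the OSS cost comparison table can be computed. The function name is hypothetical, and skipping $0.0000 rows as missing data is my assumption; the three sample rows are Kimi K26 entries copied from the full table.

```python
def price_spread(rows):
    """Return (cheapest, priciest, difference, percent) for (provider, price) rows."""
    # Assumption: a price of 0 means missing data, so those rows are skipped.
    priced = [(provider, price) for provider, price in rows if price > 0]
    cheapest = min(priced, key=lambda row: row[1])
    priciest = max(priced, key=lambda row: row[1])
    difference = priciest[1] - cheapest[1]
    # Percentage by which the priciest provider exceeds the cheapest one.
    percent = difference / cheapest[1] * 100.0
    return cheapest, priciest, difference, percent


# Effective input prices for Kimi K26, taken from the full table above.
kimi_k26 = [("StreamLake", 0.2760), ("Moonshot AI", 0.3430), ("Phala", 1.0900)]
cheapest, priciest, difference, percent = price_spread(kimi_k26)
print(f"{cheapest[0]} -> {priciest[0]}: +${difference:.4f} ({percent:.2f}%)")
```

With these three rows it reproduces the $0.8140 (294.93%) gap listed for Kimi K2.6, which indicates the percentage column is measured relative to the cheapest provider.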