Estimate your monthly API costs across every major provider. Compare models side by side with batch and caching discounts.
| Provider | Model | In $/1M | Out $/1M | Daily | Monthly |
|---|---|---|---|---|---|

Pricing updated May 25, 2026. Prices are sourced from each provider's official documentation when available; otherwise from the public OpenRouter model list. Always verify against your provider's pricing page before committing to spend.

Batch = 50% off standard rates (most providers). Cached = 90% off input token cost (prompt caching). Actual discounts vary by provider. Meta/Llama pricing is representative of major cloud providers (AWS Bedrock, Together AI).
How to Estimate Your AI API Budget
AI API costs depend on three main factors: the number of requests you make, how much text goes into each request (input tokens), and how much the model generates (output tokens). This calculator multiplies your usage pattern against every major model's pricing to show you the full cost picture.
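The multiplication described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name `estimate_cost`, the 30-day month, and the example prices are all assumptions chosen for the example.

```python
def estimate_cost(requests_per_day, input_tokens, output_tokens,
                  input_price_per_m, output_price_per_m, days_per_month=30):
    """Return (daily, monthly) cost in dollars for one model.

    Prices are quoted per 1M tokens, so token counts are divided by 1,000,000.
    days_per_month=30 is an assumption; adjust it to match your billing cycle.
    """
    input_cost = requests_per_day * input_tokens * input_price_per_m / 1_000_000
    output_cost = requests_per_day * output_tokens * output_price_per_m / 1_000_000
    daily = input_cost + output_cost
    return daily, daily * days_per_month


# Hypothetical prices ($3 in / $15 out per 1M tokens), not taken from any price list.
daily, monthly = estimate_cost(1_000, 500, 200,
                               input_price_per_m=3.00, output_price_per_m=15.00)
print(f"daily ${daily:.2f}, monthly ${monthly:.2f}")
```

Running the same function once per model, with that model's input and output prices, gives one row of a side-by-side comparison.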
Understanding the Inputs
- Requests per day: How many API calls your application makes daily. A chatbot might handle 500-5,000 conversations per day. A batch processing pipeline might run 10,000-100,000.
- Average input tokens: The typical size of your prompt. A simple question is ~50 tokens. A prompt with context or instructions is 200-500. RAG with document chunks can be 2,000-8,000+.
- Average output tokens: How much the model generates per request. A short answer is ~50 tokens. A paragraph is 100-200. A full article or code generation is 500-2,000+.
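To make the three inputs concrete, the sketch below turns a few illustrative workloads into daily token volumes. The `UsageProfile` class and the specific numbers are assumptions picked from inside the ranges listed above, not measurements from a real application.

```python
from dataclasses import dataclass


@dataclass
class UsageProfile:
    """One workload, described by the three calculator inputs."""
    name: str
    requests_per_day: int
    avg_input_tokens: int
    avg_output_tokens: int

    def daily_tokens(self):
        """Return (input, output) token totals per day."""
        return (self.requests_per_day * self.avg_input_tokens,
                self.requests_per_day * self.avg_output_tokens)


# Illustrative values only, chosen from within the ranges quoted in the text.
profiles = [
    UsageProfile("chatbot", 2_000, 300, 150),
    UsageProfile("rag_search", 2_000, 4_000, 150),
    UsageProfile("batch_pipeline", 50_000, 300, 50),
]

for p in profiles:
    tokens_in, tokens_out = p.daily_tokens()
    print(f"{p.name}: {tokens_in:,} input / {tokens_out:,} output tokens per day")
```

Note how the RAG profile has the same request count as the chatbot but more than ten times the input volume, which is why prompt size deserves as much attention as traffic.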
Saving Money with Batch Processing
Most providers offer batch processing at 50% off standard rates. Instead of real-time responses, you submit requests in bulk and get results within 24 hours. This suits data labeling, content generation, document processing, and any other workflow where latency doesn't matter.
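A rough way to see the effect is to split your spend into a real-time share and a batch share and apply the discount only to the latter. The function below is a sketch under that assumption; `batch_share` (the fraction of spend that can tolerate a delayed response) is a hypothetical parameter, and the 50% default mirrors the typical discount quoted above.

```python
def cost_with_batch(standard_monthly_cost, batch_share, batch_discount=0.50):
    """Return the monthly cost when `batch_share` of the spend moves to batch.

    batch_share: fraction (0.0-1.0) of spend that does not need real-time replies.
    batch_discount: 0.50 reflects the typical 50% discount; check your provider.
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    realtime_cost = standard_monthly_cost * (1 - batch_share)
    batch_cost = standard_monthly_cost * batch_share * (1 - batch_discount)
    return realtime_cost + batch_cost


# Example: a $1,000/month workload where 60% of requests can wait.
new_cost = cost_with_batch(1_000.00, batch_share=0.6)
print(f"${new_cost:.2f} per month instead of $1000.00")
```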
Saving Money with Prompt Caching
Prompt caching (available on Anthropic, OpenAI, and Google) stores the static prefix of your prompt, typically the system prompt, and reuses it across requests. Cached input tokens cost ~90% less than uncached, though the exact discount varies by provider. This is most effective when you have a large, static system prompt (instructions, examples, documents) that stays the same across many requests.
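The savings depend on how much of each prompt is static and how often the cache is actually hit. The sketch below models that with two hypothetical parameters, `static_tokens` and `hit_rate`, and uses the ~90% discount quoted above. It deliberately ignores any separate charge a provider may apply when writing to the cache, so treat the result as an optimistic estimate.

```python
def input_cost_with_cache(requests, static_tokens, dynamic_tokens,
                          input_price_per_m, cache_discount=0.90, hit_rate=1.0):
    """Estimate input-token cost in dollars when the static prefix is cached.

    static_tokens: tokens in the unchanging prefix (system prompt, examples).
    dynamic_tokens: tokens that differ per request and are always full price.
    hit_rate: assumed fraction of requests that find the prefix in the cache.
    Cache-write fees, if your provider charges them, are not modeled here.
    """
    # Average billable weight of the static prefix: discounted on a hit,
    # full price on a miss.
    static_weight = hit_rate * (1 - cache_discount) + (1 - hit_rate)
    billable_tokens = static_tokens * static_weight + dynamic_tokens
    return requests * billable_tokens * input_price_per_m / 1_000_000


# Hypothetical workload: 10,000 requests, 4,000-token system prompt, $3 per 1M input.
uncached = input_cost_with_cache(10_000, 4_000, 200, 3.00, hit_rate=0.0)
cached = input_cost_with_cache(10_000, 4_000, 200, 3.00, hit_rate=0.9)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
```

With a short system prompt the static share is small and the discount barely registers, which is why caching pays off mainly for long, stable prefixes.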
Input vs. Output Token Pricing
Output tokens are typically 3-6x more expensive than input tokens. This is because generating text requires more compute than reading it. When optimizing costs, reducing output length (shorter responses, structured output formats) often has more impact than reducing input length.
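A quick comparison shows why trimming output usually matters more. The prices below are hypothetical, with output set at 5x the input price to sit inside the 3-6x range mentioned above; the helper `request_cost` is likewise just for illustration.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the cost in dollars of a single request."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


IN_PRICE, OUT_PRICE = 3.00, 15.00  # hypothetical $/1M tokens, output = 5x input

baseline = request_cost(1_000, 500, IN_PRICE, OUT_PRICE)
trim_input = request_cost(800, 500, IN_PRICE, OUT_PRICE)     # 200 fewer input tokens
trim_output = request_cost(1_000, 300, IN_PRICE, OUT_PRICE)  # 200 fewer output tokens

for label, cost in [("trim input", trim_input), ("trim output", trim_output)]:
    saved = (baseline - cost) / baseline
    print(f"{label}: saves {saved:.1%} per request")
```

Removing the same number of tokens from the output saves five times as much as removing them from the input, matching the price ratio assumed here.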
Hidden Costs to Watch For
- Reasoning tokens: OpenAI's o-series and some other reasoning models bill internal "thinking" tokens at output rates, which can multiply costs 3-10x.
- Long context surcharges: Google Gemini charges 2x for prompts over 200K tokens.
- Tool use / function calls: Tool definitions count as input tokens and add up across requests.
- Retries and errors: Failed requests with partial responses may still be billed.
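These extras can be folded into a per-request estimate. The sketch below is one possible model, with several assumptions worth stating: all parameter names are hypothetical, the long-context multiplier is applied to the input price only, and retried requests are treated as billed in full (a worst case). Check your provider's billing rules before relying on any of these.

```python
def effective_request_cost(input_tokens, output_tokens,
                           input_price_per_m, output_price_per_m,
                           reasoning_tokens=0, tool_definition_tokens=0,
                           long_context_threshold=None, long_context_multiplier=2.0,
                           retry_rate=0.0):
    """Estimate the cost in dollars of one request including hidden costs."""
    # Tool definitions are sent with the prompt, so they bill as input tokens.
    billed_input = input_tokens + tool_definition_tokens
    # Reasoning ("thinking") tokens bill at the output rate.
    billed_output = output_tokens + reasoning_tokens

    in_price = input_price_per_m
    # Assumption: the surcharge raises only the input price past the threshold.
    if long_context_threshold is not None and billed_input > long_context_threshold:
        in_price *= long_context_multiplier

    cost = (billed_input * in_price + billed_output * output_price_per_m) / 1_000_000
    # Worst-case assumption: every retried request is billed in full.
    return cost * (1 + retry_rate)


# Hypothetical prices and token counts for illustration.
naive = effective_request_cost(1_000, 500, 3.00, 15.00)
realistic = effective_request_cost(1_000, 500, 3.00, 15.00,
                                   reasoning_tokens=2_000,
                                   tool_definition_tokens=600,
                                   retry_rate=0.05)
print(f"naive ${naive:.4f} vs. with hidden costs ${realistic:.4f} per request")
```

In this made-up example the hidden costs roughly quadruple the per-request price, most of it from reasoning tokens billed at the output rate.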