Stop guessing your AI API bill: a quick guide to token cost math

You can ship an LLM feature in an afternoon. Figuring out what it costs to run usually happens later, when the invoice shows up and someone asks why. A few minutes of token math up front avoids most of that. Here is how the pricing works and how to estimate it.

Providers bill per token, not per word or per request. A token is about 4 characters of English, so by that rule of thumb "Hello world" comes out to roughly 3 tokens, and 750 words lands near 1,000 tokens.

Input and output are billed separately, and output is almost always the pricier side. GPT-4o is $2.50 per million input tokens and $10.00 per million output tokens. That 4x gap is the part people underestimate once responses get long.

Per request, the cost is:

cost = (input tokens / 1,000,000) × input price + (output tokens / 1,000,000) × output price

where both prices are quoted per million tokens. Multiply by monthly volume and you have the bill.

Take a support bot: 800 input tokens per request (system prompt plus the user message), 400 output tokens per reply, and 50,000 requests a month on GPT-4o. Each request costs $0.002 for input plus $0.004 for output, or $0.006, which comes to $300 a month.

Run the same workload on GPT-4.1 Mini, at $0.40 per million input tokens and $1.60 per million output tokens, and the bill falls to about $48, roughly a sixth as much. That one comparison is often what decides the model.

Three things bite people repeatedly: …

Setting `max_tokens` sensibly is the cheapest optimization there is.

I got tired of redoing this per model, so I've been using Vortenza's free AI calculators.
The OpenAI API Cost Calculator lets you pick a model and enter your token counts and monthly volume. There's a Claude API Cost Calculator for Anthropic models, and an AI Token Counter for when you want the actual token count of an input instead of a guess. No signup, and everything runs in the browser.

The calculators aren't really the point, though. The point is doing the estimate while you're still designing the feature. Cost is a design constraint, same as latency. Treat it like one and the invoice stops being a surprise.
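For anyone who would rather keep this math in code, here is a minimal Python sketch of the per-request formula and the support-bot example from earlier in the post. Treat it as an illustration under stated assumptions, not a billing tool: the function names and price table are hypothetical, token estimates use the rough 4-characters-per-token rule instead of a real tokenizer, and the prices are assumed list prices (the GPT-4.1 Mini figures in particular should be checked against the provider's pricing page).

```python
import math

# Hypothetical price table: (input, output) in USD per 1M tokens.
# ASSUMPTION: list prices at time of writing; verify before relying on them.
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4.1-mini": (0.40, 1.60),  # assumed list price
}


def estimate_tokens(text: str) -> int:
    """Rough estimate from the ~4 characters per English token rule of thumb."""
    return math.ceil(len(text) / 4)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: each token count over 1M, times its per-1M price."""
    input_price, output_price = PRICES_PER_MILLION[model]
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price
    return input_cost + output_cost


def monthly_cost(model: str, input_tokens: int, output_tokens: int, requests: int) -> float:
    """Per-request cost multiplied by monthly request volume."""
    return request_cost(model, input_tokens, output_tokens) * requests


if __name__ == "__main__":
    # Heuristic estimate only; a real tokenizer may count differently.
    print(f'"Hello world" ~ {estimate_tokens("Hello world")} tokens')

    # Support bot from the post: 800 input and 400 output tokens, 50,000 requests/month.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${monthly_cost(model, 800, 400, 50_000):,.2f}/month")

    # max_tokens caps output length, so for this 800-token input it also caps
    # the per-request spend. The 1,000-token cap is a hypothetical value.
    print(f"gpt-4o, 1,000-token cap: at most ${request_cost('gpt-4o', 800, 1_000):.4f}/request")
```

Run as a script, it reports $300.00/month for GPT-4o and $48.00/month for GPT-4.1 Mini under those assumed prices. Keeping prices in one table makes adding a model a one-line change; for exact counts, swap `estimate_tokens` for a real tokenizer such as OpenAI's tiktoken.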