I Compared the Real Cost of Claude Code, OpenRouter, and Image APIs

A developer built a budget model comparing the real cost of Claude Code, OpenRouter, and image APIs, revealing that production costs often exceed listed prices due to retries, planning buffers, and failed jobs. For a sample app with 100 users, the base cost of $81 rose to $95.94 after adding a 3% retry rate and 15% planning buffer. The analysis highlights that coding-agent tasks involve multiple model calls and context, while image generation costs per accepted image can be more than double the per-generation price due to multiple attempts.

An API request that looks cheap on a pricing page can become much more expensive inside a real product.

The pricing page normally gives you the unit rate. That is useful, but it is not yet a production budget. A production workflow can also include repeated context, tool results, cache writes, retries, duplicate submissions, failed media jobs, and outputs that users reject and regenerate.

I wanted to see how much those factors change the estimate, so I built the same budget model for three different workloads. This article explains the method. I also published an editable calculator and a downloadable CSV dataset, linked at the end.

Consider a small application with the following usage:

- 100 monthly active users
- 20 active days per month
- 3 requests per user per active day
- 2,000 average input tokens per request
- 500 average output tokens per request
- $3 per 1M input tokens
- $15 per 1M output tokens
- 3% retry rate
- 15% planning buffer

These are editable planning assumptions, not universal usage averages or an official provider quote.

The first step is to estimate request volume:

    monthly requests = 100 users × 20 active days × 3 requests
    monthly requests = 6,000

The input-token cost is:

    6,000 requests × 2,000 input tokens ÷ 1,000,000 × $3
    input cost = $36

The output-token cost is:

    6,000 requests × 500 output tokens ÷ 1,000,000 × $15
    output cost = $45

That produces a listed base cost of:

    $36 + $45 = $81

Adding the editable 3% retry assumption:

    $81 × 1.03 = $83.43

Adding a further 15% planning buffer:

    $83.43 × 1.15 = $95.94

The difference is relatively small in this example, but the same multipliers add a larger absolute amount at higher volume.

The important point is that $95.94 is not an official price quote. It is a derived planning result based on the listed prices, the usage assumptions, the retry rate, and the planning buffer used above. Those categories should remain separate.

A coding-agent task is not equivalent to one chat message. One completed task may involve multiple model calls and a large amount of context. The user may only see a short final response such as:

    Fixed the validation bug and added a regression test.
However, the agent may have processed a much larger amount of context before producing that response.

For coding-agent workflows, I find this cost unit more useful:

    cost per completed task

rather than:

    cost per user message

A simplified task formula is:

    task cost = model calls per task × estimated cost per model call × retry multiplier

Each model call may include:

    input tokens = instructions + conversation history + source files + tool definitions + previous tool results

That is why two tasks with similarly short final answers can have very different costs. A one-file configuration fix and a repository-wide migration should not share the same expected token budget.

Tools do not need a separate flat fee to increase the total cost. The model may need to run commands and read files, and large command outputs and large file reads can become part of later input context.

For a deeper breakdown, see the Claude Code Token Cost Guide (https://aicostplanner.com/claude-code-token-cost/).

Image generation needs a different cost model. Depending on the provider and model, the billing unit varies. The cost of one accepted image is also different from the cost of one submitted API job.

Suppose a product needs 1,000 accepted images per month. If users accept the first result every time, the application submits approximately 1,000 generation jobs. But if one accepted image requires an average of 2.4 attempts:

    1,000 accepted images × 2.4 attempts = 2,400 submitted jobs

At an illustrative price of $0.04 per submitted generation:

    2,400 × $0.04 = $96

The effective cost per accepted image becomes:

    $96 ÷ 1,000 = $0.096

That is more than twice the listed one-generation price.

This does not mean every provider charges for every failed request. Failed-job billing varies by provider, failure type, and processing stage. The useful distinction is between the jobs you submit, the jobs you are billed for, and the images users accept. Those numbers should be reconciled with request IDs and the provider billing dashboard.
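The accepted-image arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a provider's billing logic: the class name `ImageBudget` and its field names are hypothetical, the $0.04 price is the illustrative figure from the example, and the sketch assumes every submitted job is billed, which the paragraph above notes is not always true.

```python
from dataclasses import dataclass


@dataclass
class ImageBudget:
    """Hypothetical planning model for image generation spend."""

    accepted_images: int          # images users actually keep per month
    attempts_per_accepted: float  # average submitted jobs per accepted image
    price_per_job: float          # illustrative USD price per submitted generation

    @property
    def submitted_jobs(self) -> float:
        return self.accepted_images * self.attempts_per_accepted

    @property
    def monthly_cost(self) -> float:
        # Assumption: every submitted job is billed. Real failed-job billing
        # varies by provider, failure type, and processing stage.
        return self.submitted_jobs * self.price_per_job

    @property
    def cost_per_accepted_image(self) -> float:
        return self.monthly_cost / self.accepted_images


image_budget = ImageBudget(
    accepted_images=1000,
    attempts_per_accepted=2.4,
    price_per_job=0.04,
)
print(f"submitted jobs: {image_budget.submitted_jobs:.0f}")
print(f"monthly cost: ${image_budget.monthly_cost:.2f}")
print(f"cost per accepted image: ${image_budget.cost_per_accepted_image:.3f}")
```

Keeping `attempts_per_accepted` as its own field makes it easy to replace the assumed 2.4 with a value measured from real request logs.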
The Image Generation API Cost Guide (https://aicostplanner.com/image-generation-api-cost/) explains the different billing units in more detail.

A marketplace or gateway introduces another layer of cost management. With OpenRouter, it is useful to distinguish between account credits, per-key spending limits, and permission or policy restrictions. These concepts are related, but they are not interchangeable.

For example, a key may have a spending limit even when the account still has credits. A request may also fail because of a permission or policy restriction rather than insufficient balance. OpenRouter currently reserves the right to expire unused credits one year after purchase. An HTTP 402 normally indicates insufficient credits, while 403 generally points to a permission, guardrail, or moderation restriction.

I documented the operational checks separately in OpenRouter Credits (https://aicostplanner.com/openrouter-credits/).

For a text API request, I would record at least:

- request id
- provider
- model
- input tokens
- output tokens
- estimated cost
- status

For a more useful production record, I would add:

- cached input tokens
- retry count
- latency
- provider reported cost
- created at

For media jobs, I would also record:

- job id
- duration or resolution
- submitted at
- completed at
- failure stage
- accepted by user

Without request-level records, it is difficult to answer basic billing questions.

I now separate cost planning into four layers.

The first layer is the listed price. This comes from provider pricing documentation. Examples include:

- price per 1M input tokens
- price per 1M output tokens
- price per generated image
- price per generated video second

The second layer is the usage assumptions. These are specific to the product:

- active users
- requests per user
- average input tokens
- average output tokens
- tool calls per task
- attempts per accepted image
- generated video duration

The third layer is operational overhead. This may include:

- retries
- duplicate submissions
- cache writes
- cache reads
- rejected outputs
- failed jobs

These should be editable assumptions rather than universal facts.

The fourth layer is the planning buffer. A budget buffer can help when usage is uncertain, but it should not hide the underlying estimate.
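As a rough sketch of how these layers can stay separate in code, the following Python reproduces the sample application from the worked example earlier in the article. The class name `TextApiBudget` and its field names are hypothetical, the default values are the article's editable planning assumptions rather than provider quotes, and retries are modeled as a simple multiplier on the base cost.

```python
from dataclasses import dataclass


@dataclass
class TextApiBudget:
    """Hypothetical layered budget model for a text API workload."""

    # Usage assumptions (product-specific, editable)
    monthly_active_users: int = 100
    active_days_per_month: int = 20
    requests_per_user_per_day: int = 3
    avg_input_tokens: int = 2_000
    avg_output_tokens: int = 500
    # Listed prices in USD per 1M tokens (illustrative, not a provider quote)
    input_price_per_million: float = 3.0
    output_price_per_million: float = 15.0
    # Operational overhead and planning buffer (editable assumptions)
    retry_rate: float = 0.03
    planning_buffer: float = 0.15

    def monthly_requests(self) -> int:
        return (
            self.monthly_active_users
            * self.active_days_per_month
            * self.requests_per_user_per_day
        )

    def base_api_cost(self) -> float:
        # Listed price applied to usage only: no retries, no buffer.
        requests = self.monthly_requests()
        input_cost = (
            requests * self.avg_input_tokens / 1_000_000 * self.input_price_per_million
        )
        output_cost = (
            requests * self.avg_output_tokens / 1_000_000 * self.output_price_per_million
        )
        return input_cost + output_cost

    def planned_operational_budget(self) -> float:
        # Retries are applied first, then the planning buffer on top.
        with_retries = self.base_api_cost() * (1 + self.retry_rate)
        return with_retries * (1 + self.planning_buffer)


text_budget = TextApiBudget()
print(f"monthly requests: {text_budget.monthly_requests()}")
print(f"base API cost: ${text_budget.base_api_cost():.2f}")
print(f"planned operational budget: ${text_budget.planned_operational_budget():.2f}")
```

Reporting the two figures through separate methods keeps the buffer from being folded silently into the base estimate.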
The calculator should show both:

- base API cost
- planned operational budget

A cost calculator has important limitations. It cannot determine what a provider will actually bill, and price is not a measure of output quality. A calculator is a planning tool, not a replacement for provider usage records or billing dashboards.

I put the complete model into the 2026 AI API Cost Benchmark (https://aicostplanner.com/ai-api-cost-benchmark/). It includes an editable calculator and a downloadable CSV dataset. The current pricing snapshot was reviewed on June 17, 2026. The calculator is free to use and does not require an API key.

Pricing changes frequently, so verify current provider documentation before launching a production workload and compare the estimate with a small real-world test.

What caused the largest difference between your original estimate and your actual AI bill: output tokens, repeated agent context, retries, or media regeneration?