The Cost Curve of Unchecked LLM Context

wpnews.pro

LLM usage gets expensive when per‑call token consumption goes unchecked.

There is a contradiction in LLM‑assisted engineering: you must provide enough context for quality, yet the workflows that supply this context also tend to inflate token usage beyond what the model can meaningfully use.

Below is a concrete, realistic example for one developer, showing exactly how costs can balloon when normal workflow patterns accumulate more context than the model can effectively handle.

This article is about unchecked token growth in engineering workflows and the cost multipliers that follow.

Unchecked growth could cost you thousands per engineer a month #

Imagine an engineer building an agent to analyse logs. The system has two major subsystems: a log processor with a UI, and a remote runtime LLM that works on the log contents surfaced by the log processor and reports back to the UI what is in the log and what it means.

During development, the engineer is iterating, debugging, and refining the solution. To do this, the engineer is using AI support within their engineering workflow.

In that workflow, the LLM supports the engineer by:

- analysing log fragments
- suggesting new extraction queries
- reflecting on errors
- retrying after failures
- running multistep loops

Initial token cost #

Initially, the token cost for the above might be:

- System prompt: 300 tokens
- User input: 200 tokens
- Expected output: 300 tokens

This is a total of about 800 tokens.

| Model tier | Tokens used | Price per million tokens | Total cost per call |
|---|---|---|---|
| Cheap | 800 | $0.20 | $0.00016 |
| Medium | 800 | $3.50 | $0.0028 |
| High‑end | 800 | $10.00 | $0.008 |

All of this is cheap, predictable, and stable.
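The per‑call figures follow from one multiplication, sketched below. The tier names and prices are the article's illustrative rates, not any vendor's price list, and the function name `cost_per_call` is made up for this example. The sketch also applies one blended price to input and output tokens, as the table does, whereas providers often price the two differently.

```python
# Illustrative per-million-token prices from the table above (not real vendor pricing).
PRICE_PER_MILLION_USD = {"cheap": 0.20, "medium": 3.50, "high-end": 10.00}


def cost_per_call(tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of one call that consumes `tokens` tokens."""
    return tokens * price_per_million / 1_000_000


# System prompt + user input + expected output.
initial_tokens = 300 + 200 + 300

for tier, price in PRICE_PER_MILLION_USD.items():
    print(f"{tier}: ${cost_per_call(initial_tokens, price):.5f} per call")
```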

Unchecked token growth during engineering #

Token usage typically grows unchecked through:

- system prompt growth
- chat history accumulation
- a RAG component dumping too much context
- retries repeating the entire prompt
- additional tools in the engineering environment generating more tokens
- output schemas becoming verbose
- debugging prints being included in context
- logs being pasted directly into prompts
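Chat history accumulation is worth a closer look because it compounds: if every earlier exchange is re‑sent on each turn, the prompt grows linearly per turn and the total billed across the conversation grows roughly quadratically. The sketch below assumes the initial per‑call sizes from earlier (300‑token system prompt, 200‑token input, 300‑token reply) and an eight‑turn conversation. The function name and the turn count are hypothetical.

```python
def prompt_sizes(turns: int, system_tokens: int = 300,
                 user_tokens: int = 200, reply_tokens: int = 300) -> list:
    """Prompt tokens sent on each turn when all earlier exchanges are re-sent."""
    exchange = user_tokens + reply_tokens  # one question plus one answer
    return [system_tokens + user_tokens + turn * exchange for turn in range(turns)]


sizes = prompt_sizes(8)
with_history = sum(sizes)                # every turn re-sends all prior exchanges
without_history = 8 * (300 + 200)        # every turn sends only prompt + new input

print(sizes)
print(f"{with_history:,} vs {without_history:,} prompt tokens over 8 turns")
```

Under these assumptions the eighth turn alone sends a 4,000‑token prompt, and the conversation as a whole sends 4.5 times the prompt tokens it would without history.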

Costs balloon #

Assume that an engineer makes 300 LLM calls per 8-hour day to develop the above log agent. Rounding up, that is 38 per hour.

Over time, unchecked token usage can inflate a single call to:

- System prompt: 2,500 tokens
- Full chat history, kept for LLM context and continuity: 4,000 tokens
- RAG over‑retrieval, dumping six log files: 8,000 tokens
- Output schema: 1,000 tokens
- Safety boilerplate in the prompt: 500 tokens
- Engineering LLM output, as before: 300 tokens

This is a total of 16,300 tokens per LLM call.
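Breaking the total down by component shows where the tokens go. The sketch below only restates the figures from the list above. The dictionary keys are labels chosen for this example.

```python
# Per-call token counts after unchecked growth, taken from the list above.
grown_call = {
    "system_prompt": 2_500,
    "chat_history": 4_000,
    "rag_retrieval": 8_000,
    "output_schema": 1_000,
    "safety_boilerplate": 500,
    "model_output": 300,
}

total = sum(grown_call.values())

# Print the largest contributors first, with each one's share of the call.
for part, tokens in sorted(grown_call.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{part:<20}{tokens:>7,}  {tokens / total:6.1%}")
print(f"{'total':<20}{total:>7,}")
```

In this example RAG over‑retrieval and chat history together account for roughly three quarters of every call.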

For the cheap, medium, and high-end LLM engineering models, this is the cost:

| Tier | Cost per 16,300‑token call | Daily cost (8h) | Monthly cost (20 days) |
|---|---|---|---|
| Cheap | $0.00326 | $0.978 | $19.56 |
| Medium | $0.05705 | $17.115 | $342.30 |
| High‑end | $0.163 | $48.90 | $978.00 |

The cost per call is low. The monthly cost soon adds up.

The cost when token usage is checked:

| Tier | Cost per 800‑token call | Daily cost (8h) | Monthly cost (20 days) |
|---|---|---|---|
| Cheap | $0.00016 | $0.048 | $0.96 |
| Medium | $0.0028 | $0.84 | $16.80 |
| High‑end | $0.008 | $2.40 | $48.00 |

When using more than twenty times the tokens per call, the high-end model cost balloons from $48 per month to $978.

As a chart, the trend becomes quite clear.
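The monthly figures in both tables can be reproduced with a short sketch. It assumes the article's 300 calls per day, 20 working days per month, and the illustrative tier prices. The function name `monthly_cost` is invented for this example.

```python
CALLS_PER_DAY = 300
WORK_DAYS_PER_MONTH = 20
# Illustrative per-million-token prices from the article (not real vendor pricing).
PRICE_PER_MILLION_USD = {"cheap": 0.20, "medium": 3.50, "high-end": 10.00}


def monthly_cost(tokens_per_call: int, price_per_million: float) -> float:
    """Monthly dollar cost for one engineer at a fixed number of tokens per call."""
    per_call = tokens_per_call * price_per_million / 1_000_000
    return per_call * CALLS_PER_DAY * WORK_DAYS_PER_MONTH


checked_tokens, unchecked_tokens = 800, 16_300

for tier, price in PRICE_PER_MILLION_USD.items():
    checked = monthly_cost(checked_tokens, price)
    unchecked = monthly_cost(unchecked_tokens, price)
    print(f"{tier}: ${checked:,.2f} -> ${unchecked:,.2f} per month")

# Cost is linear in tokens, so the multiplier is the same for every tier.
multiplier = unchecked_tokens / checked_tokens
print(f"multiplier: {multiplier:.1f}x")
```

Because cost scales linearly with tokens, the same multiplier of about 20 applies to every tier. Only the absolute dollar amounts differ.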

Are 300 LLM calls per day realistic? #

Yes, depending on the kind of work being done.

Code generation

With an interactive coding assistant, every on‑screen autocompletion, every refactor, and every explanation of what a piece of the implementation does is a call to the LLM. An hour of active coding could generate 40 to 80 LLM calls.

Agentic assistance

Agentic workflows typically operate as a plan, act, observe, revise pipeline. Suppose the engineer instructs an agent to "add logging so that a division by zero error is logged and all tests pass". The LLM first generates a plan: read the code file containing the division, inspect the tests, modify the function to add logging in the right place, run the tests, and revise the change if the tests fail. The plan is then executed, the outcome is observed, and any revisions are identified. This pipeline may involve 3 to 10 calls.
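A minimal sketch of such a loop shows how the calls add up. Nothing here is a real agent framework: `CountingLLM` is a stand‑in that only counts invocations, `run_agent_task` is a hypothetical name, and the test runner is a stub that passes on the second revision.

```python
from typing import Callable, Tuple


class CountingLLM:
    """Stand-in for a model client: returns a dummy draft and counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"draft {self.calls}"


def run_agent_task(llm: Callable[[str], str], run_tests: Callable[[str], bool],
                   goal: str, max_revisions: int = 3) -> Tuple[str, bool]:
    """Plan, act, then observe and revise until tests pass or revisions run out."""
    plan = llm(f"Plan the steps for: {goal}")            # plan
    change = llm(f"Apply this plan: {plan}")             # act
    for _ in range(max_revisions):
        if run_tests(change):                            # observe
            return change, True
        change = llm(f"Tests failed. Revise: {change}")  # revise
    return change, run_tests(change)


llm = CountingLLM()
# Stub test runner: only the fourth draft passes.
change, passed = run_agent_task(llm, lambda c: c == "draft 4",
                                "log division by zero errors")
print(f"passed={passed} after {llm.calls} LLM calls")
```

Even this stripped‑down loop spends four calls on one small task. Each failed test run adds another call, and in a real workflow each call would typically carry the plan, the code, and the test output in its prompt.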

Testing

When running a test suite, if the tests reach out to an LLM, it is easy to reach 100 calls per test suite run.

Teams use LLMs within test suites to:

- evaluate natural‑language behaviour
- judge style, tone, or reasoning
- compare outputs that are not byte‑identical
- classify correctness when rules are fuzzy
- score answers in educational or assessment systems
- validate agent behaviour
- check explanations or rationales

Documentation

When the engineer needs to document or review code, each comment, rewrite or suggestion is an LLM call.

In these environments, 300 calls per day is realistic. Depending on your organisation, it may be on the low side.
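As a rough check, the per‑activity ranges quoted above can be combined into a daily estimate. The workload mix below (four hours of active coding, five agent tasks, one LLM‑backed test suite run) is a hypothetical day, not measured data, and documentation calls are left out because no figure is given for them.

```python
# (low, high) LLM calls per unit of activity, from the ranges quoted above.
CALLS_PER_CODING_HOUR = (40, 80)
CALLS_PER_AGENT_TASK = (3, 10)
CALLS_PER_TEST_SUITE_RUN = (100, 100)  # "easy to reach 100", used as a flat figure


def daily_calls(coding_hours: int, agent_tasks: int, test_runs: int) -> tuple:
    """Return a (low, high) estimate of LLM calls for one working day."""
    mix = [
        (coding_hours, CALLS_PER_CODING_HOUR),
        (agent_tasks, CALLS_PER_AGENT_TASK),
        (test_runs, CALLS_PER_TEST_SUITE_RUN),
    ]
    low = sum(count * rng[0] for count, rng in mix)
    high = sum(count * rng[1] for count, rng in mix)
    return low, high


# Hypothetical day: 4 hours of coding, 5 agent tasks, 1 test suite run.
low, high = daily_calls(coding_hours=4, agent_tasks=5, test_runs=1)
print(f"estimated calls per day: {low} to {high}")
```

For this assumed mix the estimate brackets the 300 calls per day used in the cost tables.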

Why so many calls? #

LLMs are stateless and probabilistic. They have limited context, no persistent memory, and no internal world‑model, so they do not understand your code or the system that will execute it.

This design means that complex tasks — such as adding logging to the right place in code — must be decomposed into many short, repeated calls.

Multiple calls are a necessity to achieve quality collaboration from an LLM.

Not only must there be multiple calls, but each call also needs a minimum amount of information. Because the model lacks a world‑model, omitting this context significantly lowers the quality of results: no one wants an LLM hallucinating code because of its probabilistic nature.

However, passing too much information decreases the quality of LLM output.

This is because of how LLMs are built. An LLM will compress, blur, and mis‑prioritise information when the prompt becomes too large or too dense.

A prompt is too dense when it contains more information than the model can meaningfully separate, prioritise, or reason over. Prompt density is not only about prompt length.

If the prompt contains too many facts, instructions, examples, or files, the LLM cannot assign stable importance weights, so it treats everything as equally relevant. The result will be vague, averaged, or generic output.

If an LLM is presented with multiple coding styles, conventions, or code patterns and they appear close together, the model cannot decide which pattern to follow, so it blends them. The result is inconsistent naming, mixed styles, or contradictory behaviour.

For prompts, less is more.

With an LLM, too little context leads to poor results. With enough context, results will be good. With too much context, you will have poor results and high cost.

Conclusion #

Even though $10 per million tokens does not sound like much, it is the rate of usage that is key. A high rate of usage may come from the type of work your engineers are performing.

A minimum usage rate is required to get good results from an LLM as context needs to be restated. However, this and engineering workflow pressures can lead to an increase in the number of tokens used per call.

Unchecked token growth is both expensive and counter-productive.

Organizations must size token usage appropriately, balancing cost and quality.

A system without a world‑model forces engineers to restate context, but restating too much context degrades quality and inflates cost. The job of engineering leadership is to enforce context discipline and to design workflows that keep teams on the efficient part of the curve.

The goal is not to minimise tokens but to control context so that the model receives only what it can meaningfully use.
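One way to picture context discipline is a per‑call token budget that admits context parts in priority order. The sketch below is an illustration under stated assumptions, not a recommended implementation: `ContextPart`, `select_within_budget`, the priority ranking, and the 5,000‑token budget are all hypothetical, and the token counts reuse the grown‑call example from earlier.

```python
from dataclasses import dataclass


@dataclass
class ContextPart:
    name: str
    tokens: int    # measured with whatever tokenizer the team uses
    priority: int  # lower number = more important (hypothetical ranking)


def select_within_budget(parts, budget):
    """Greedily keep the most important parts whose combined size fits the budget.

    A part that does not fit is skipped whole rather than truncated.
    """
    kept, used = [], 0
    for part in sorted(parts, key=lambda p: p.priority):
        if used + part.tokens <= budget:
            kept.append(part)
            used += part.tokens
    return kept, used


parts = [
    ContextPart("system_prompt", 2_500, priority=1),
    ContextPart("output_schema", 1_000, priority=2),
    ContextPart("safety_boilerplate", 500, priority=3),
    ContextPart("chat_history", 4_000, priority=4),
    ContextPart("rag_retrieval", 8_000, priority=5),
]

kept, used = select_within_budget(parts, budget=5_000)  # assumed budget
print([p.name for p in kept], used)
```

With these assumed priorities the full chat history and the six retrieved log files are left out. In practice they would more likely be summarised or narrowed than dropped, which is a design decision this sketch does not attempt to make.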