The Context Tax: Why Step 12 Costs 42x Step 1 (Measure It in 40 Lines)

A developer created a 40-line Python script called context_tax.py to measure the "context tax" in AI agent sessions, where each step re-sends the entire conversation history as input, causing total token cost to grow quadratically. The script found that in a synthetic debugging session, the last step billed 42.8 times the input of the first step, highlighting that falling token prices do not fix the compounding cost structure.

In short: the context tax is what you pay when every agent step re-sends the whole session transcript as input again, so step N re-bills turns 1..N and total cost grows with n(n+1)/2. Cheaper tokens lower the unit, not the shape. context_tax.py meters the re-bill multiplier offline; one debugging session measured 42.8x.

AI disclosure: I drafted this with an AI writing assistant. The tool, the fixtures, and every number below come from a real local run of the script in this post on tiktoken o200k_base. I reviewed and edited it before publishing.

Token prices have been sliding all year. Your agent bill probably hasn't.

I kept running into the same confusion in my own FinOps notes: per-token rates drop, and the monthly number goes the other way. The usual answers ("you're using a bigger model," "you have more users") didn't explain a single session getting more expensive as it ran. So I wrote a 40-line meter to look at the one thing nobody charts: the session transcript itself.

On a synthetic-but-realistic debugging session, the last step billed 42.8x the input of the first step. Same model. Same task. No new users.

That gap has a boring cause and an annoying consequence. Here's both, plus the script.

TL;DR. Every step of an agent loop re-sends the whole conversation so far (history plus tool outputs) as input. So step N pays for turns 1..N again, and total input grows roughly with n(n+1)/2.
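To make the n(n+1)/2 shape concrete, here is a minimal sketch that is separate from context_tax.py. The helper name rebill_curve and all token counts are hypothetical, chosen only for illustration, and the multiplier is assumed to mean last-step input divided by first-step input, which is how the 42.8x figure is described.

```python
from itertools import accumulate

def rebill_curve(turn_tokens):
    """Billed input per step: step k re-sends turns 1..k, so it bills their running sum."""
    return list(accumulate(turn_tokens))

# Hypothetical session: 12 turns of 500 tokens each (illustrative, not measured).
turn_tokens = [500] * 12
billed = rebill_curve(turn_tokens)

n = len(turn_tokens)
total_billed = sum(billed)             # input billed across the whole session
closed_form = 500 * n * (n + 1) // 2   # t * n(n+1)/2 for equal-sized turns
multiplier = billed[-1] / billed[0]    # last step's input vs first step's

print(billed[0], billed[-1])      # 500 6000
print(total_billed, closed_form)  # 39000 39000
print(multiplier)                 # 12.0

# Uneven turns (a short prompt followed by large tool outputs) push the
# multiplier well past the number of steps.
uneven = rebill_curve([100, 900, 3000])
print(uneven[-1] / uneven[0])     # 40.0
```

With equal-sized turns the multiplier is just the step count. Large tool outputs landing after a short first message are what push it higher.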
Cheaper tokens don't fix the shape; they just lower the unit on a number that's still climbing. context_tax.py (below, keyless, offline) meters three things from a session JSON: the re-bill curve, the re-bill multiplier, and a dead-weight estimate. On my bloated fixture it reported a 42.8x multiplier and 19.3% dead weight, and exited 1 as a CI gate.

Here's the part that trips people up. An LLM call is stateless. The model doesn't "remember" turn 3 when you make turn 12. Your framework re-sends turns 1 through 11 as input so the model can see them. Every. Single. Step.

So the cost of one step isn't the cost of that step's new text. It's the cost of the entire history up to that point. Step 1 bills a short user message. Step 12 bills the user message plus a file dump plus a wide grep plus a stack trace plus every assistant reply in between. The new tokens at step 12 might be tiny. The billed input is not.

Logan Waxell put the shape plainly in "The Compounding Math Your Architecture Is Hiding": "total cost grows roughly with n(n+1)/2," and a turn-10 context can sit at 80,000–200,000 tokens. That post nails the problem and then points you at a proprietary runtime. I wanted the opposite: a tiny script I can run on my own transcript and check into CI. So that's what this is.

And it's why "tokens got cheaper" is the wrong consolation. Edwin Lisowski's "Token Prices Are Falling. So Why Is Your AI Bill Going Up?" lists the drivers: full context re-sent each step, tool schemas eating 30–60% of the window before any user content, retries and sub-agents running around the clock. That schema overhead is a sibling tax worth metering on its own; I did exactly that for MCP servers.

There's a second reason to meter instead of guess. Agents are bad at predicting their own spend. The arXiv paper "How Do AI Agents Spend Your Money?"
(Bai, Huang, Wang, Sun, Mihalcea, Brynjolfsson, Pentland, Pei) measured agentic coding tasks and found three things worth pinning to the wall: agentic runs burn roughly

So the takeaway writes itself: meter the transcript, don't trust the estimate. If the model can't call its own number, your gut can't either.

context_tax.py reads one JSON file: a session transcript as a list of turns (role + content, tool results included). It tokenizes with tiktoken's o200k_base and reports four things.

The exit code is the point. 0 if the multiplier is under threshold (a disciplined session), 1 if it's over (the architecture is compounding, so fail the build), 2 for usage. Drop it in CI and a session that balloons becomes a red check, not a surprise line item.

```python
#!/usr/bin/env python3
"""context_tax.py - meter the re-bill tax on a single agent session's transcript."""
import json, re, sys

THRESHOLD = 12.0     # re-bill multiplier above this = compounding architecture
DEAD_OVERLAP = 0.15  # a turn is dead weight if <15% of its terms resurface later
STOP = set("the a an of to in is it on for and or but with as at by from this that be are was you your i we they it's".split())

try:
    import tiktoken
    enc = tiktoken.get_encoding("o200k_base")
    def count(t): return len(enc.encode(t))
    TOKENIZER = "tiktoken o200k_base (exact)"
except Exception:  # honest fallback, ~+-15% vs real BPE
    def count(t): return max(1, round(len(t) / 4))
    TOKENIZER = "len/4 heuristic (tiktoken not installed; ~+-15%)"

def words(t):
    # Lowercased terms of 4+ alphanumeric characters, minus stop words.
    return {w for w in re.findall(r"[a-z0-9]{4,}", t.lower()) if w not in STOP}

def main(argv):
    if len(argv) < 2:
        print("usage: context_tax.py
```