{"slug": "how-llm-tokens-work-and-why-they-explain-your-ai-bill", "title": "How LLM Tokens Work (And Why They Explain Your AI Bill)", "summary": "An engineer explains that large language models like Claude do not read words but tokens—chunks of text mapped to integers—and that this token-based design is the root cause of AI billing surprises. The post details how tokenization works, why pricing is per token for both input and output, and how costs accumulate from token count multiplied by call count.", "body_md": "Your LLM never reads your words — it reads tokens. And almost every surprise on your AI bill traces back to that one fact. Here's the breakdown 👇\n\nHere's the thing almost nobody internalizes about large language models: **Claude never reads your words.** It reads *tokens* — numbers. Your prompt is chopped into pieces, each piece is mapped to an integer, and the model only ever sees those integers. Every limit you hit, every bill you pay, and half the weird behavior you've seen all trace back to this one fact.\n\nThis article explains what a token actually is, why the model works in tokens instead of words, and how that single design choice explains your AI bill.\n\nThe one-sentence version: text is split into tokens (chunks roughly ¾ of a word on average), each token maps to a number, and you pay per token — in and out — so understanding tokens is understanding cost.\n\nA token is a chunk of text — often a word, but frequently a *piece* of a word, a space, or a punctuation mark. The tokenizer is a fixed dictionary that maps text chunks to integer IDs.\n\nRough intuition:\n\n- Common short words (`the`, `code`, `error`) are usually one token each.\n- Longer or rarer words are split into pieces (`tokenization` → `token` + `ization`).\n\nSo \"How tokens work\" isn't 3 words to the model — it's a sequence of integer IDs like `[4438, 11460, 990]`. The model does math on those numbers. 
The model never sees the English you typed.\n\nTwo extremes, both bad:\n\n- **One token per character:** the vocabulary is tiny, but every prompt becomes a very long sequence of symbols that carry almost no meaning on their own.\n- **One token per whole word:** sequences are short, but the vocabulary explodes, and any unseen word, typo, or identifier has no entry at all.\n\nTokens are the engineered middle: a fixed vocabulary (typically tens to hundreds of thousands of entries) of common chunks that can assemble *any* text — including words the model has never encountered — by gluing pieces together. It's the compression that makes the whole thing tractable.\n\nMost LLM API providers, including Anthropic, **price per token** — and count both directions:\n\n- **Input tokens:** everything you send, including the prompt, conversation history, documents, and tool definitions.\n- **Output tokens:** everything the model generates in reply.\n\nThis is why costs surprise people:\n\n```\nYour bill ≈ (input tokens × input price) + (output tokens × output price)\n            └── prompt + history + docs + tools      └── the model's reply\n```\n\nSay input is priced at $3 per million tokens and output at $15 per million (illustrative — check current rates). You send a 1,000-token prompt and get a 500-token answer:\n\n- Input: 1,000 × $3 / 1,000,000 = $0.003\n- Output: 500 × $15 / 1,000,000 = $0.0075\n- Total: $0.0105, roughly one cent\n\nTiny — until you multiply by thousands of calls, or let conversation history balloon each call's input to 20,000 tokens. *That's* where bills come from: not one expensive call, but token count × call count.\n\n**How many tokens is a typical page of text?**\n\nRoughly 500–800 tokens per page of prose, but it varies with formatting and vocabulary.\n\n**Why do code and JSON sometimes cost more tokens than you'd expect?**\n\nSymbols, indentation, and braces often tokenize separately, so structured text can be token-dense relative to its character count.\n\n**Does the system prompt count?**\n\nYes. The system prompt, tool definitions, and any retrieved context are all input tokens you pay for on every call.\n\n**Is there a way to avoid resending the same big context every time?**\n\nYes — prompt caching lets you reuse a stable prefix at a fraction of the cost.\n\n*I'm doing a whole series taking Claude apart piece by piece — video + written version of each — at The Stack Underflow. 
The full written companion to this one, plus the rest of the series, lives at thestackunderflow.com/tutorials.*", "url": "https://wpnews.pro/news/how-llm-tokens-work-and-why-they-explain-your-ai-bill", "canonical_source": "https://dev.to/thestackunderflow_3b3b3b6/how-llm-tokens-work-and-why-they-explain-your-ai-bill-46b", "published_at": "2026-06-23 22:47:10+00:00", "updated_at": "2026-06-23 23:48:43.031074+00:00", "lang": "en", "topics": ["large-language-models", "ai-infrastructure", "developer-tools"], "entities": ["Claude", "Anthropic", "The Stack Underflow"], "alternates": {"html": "https://wpnews.pro/news/how-llm-tokens-work-and-why-they-explain-your-ai-bill", "markdown": "https://wpnews.pro/news/how-llm-tokens-work-and-why-they-explain-your-ai-bill.md", "text": "https://wpnews.pro/news/how-llm-tokens-work-and-why-they-explain-your-ai-bill.txt", "jsonld": "https://wpnews.pro/news/how-llm-tokens-work-and-why-they-explain-your-ai-bill.jsonld"}}