How LLM Tokens Work (And Why They Explain Your AI Bill)

An engineer explains that large language models like Claude do not read words but tokens—chunks of text mapped to integers—and that this token-based design is the root cause of AI billing surprises. The post details how tokenization works, why pricing is per token for both input and output, and how costs accumulate from token count multiplied by call count.

Here's the thing almost nobody internalizes about large language models: Claude never reads your words. It reads tokens, which are numbers. Your prompt is chopped into pieces, each piece is mapped to an integer, and the model only ever sees those integers. Every limit you hit, every bill you pay, and half the weird behavior you've seen traces back to this one fact.

This article explains what a token actually is, why the model works in tokens instead of words, and how that single design choice explains your AI bill.

The one-sentence version: text is split into tokens (chunks of roughly ¾ of a word on average), each token maps to a number, and you pay per token, in and out, so understanding tokens is understanding cost.

A token is a chunk of text: often a word, but frequently a piece of a word, a space, or a punctuation mark. The tokenizer is a fixed dictionary that maps text chunks to integer IDs. Rough intuition: short, common words like the, code, and error are usually one token each, while a longer word is split into pieces: tokenization → token + ization.

So "How tokens work" isn't 3 words to the model. It's a sequence of integer IDs like 4438, 11460, 990. The model does math on those numbers. The English you typed was never seen.

Two extremes, both bad: one token per character makes every input far longer than it needs to be, and one token per whole word needs an enormous vocabulary that still fails on words it has never seen. Tokens are the engineered middle: a fixed vocabulary (tens of thousands of entries) of common chunks that can assemble any text, including words the model has never encountered, by gluing pieces together. It's the compression that makes the whole thing tractable.

Every API provider, including Anthropic, prices per token and counts both directions: input tokens (everything you send) and output tokens (everything the model generates). This is why costs surprise people: the input side is not just your latest message.

Your bill ≈ (input tokens × input price) + (output tokens × output price)
    input tokens:  prompt + history + docs + tools
    output tokens: the model's reply

Say input is priced at $3 per million tokens and output at $15 per million (illustrative; check current rates).
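The billing formula above can be sketched in a few lines of Python. This is a rough estimator, not any provider's actual billing logic: the function name `estimate_cost`, the constants, and the token counts in the demo are hypothetical, and the prices are the illustrative $3 and $15 per million tokens from the text, not current rates.

```python
# Illustrative prices only (USD per million tokens); check your provider's current rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float = INPUT_PRICE_PER_MTOK,
    output_price: float = OUTPUT_PRICE_PER_MTOK,
) -> float:
    """Estimate the USD cost of a single call.

    input_tokens is everything sent: prompt + history + docs + tools.
    output_tokens is the model's reply.
    """
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical call: 4,000 tokens in, 1,000 tokens out.
    cost = estimate_cost(input_tokens=4_000, output_tokens=1_000)
    print(f"estimated cost: ${cost:.4f}")  # estimated cost: $0.0270
```

At these illustrative rates, the 1,000 output tokens ($0.015) cost more than the 4,000 input tokens ($0.012), which is why both directions matter when estimating a bill.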
You send a 1,000-token prompt and get a 500-token answer:

    input:  1,000 × $3 / 1,000,000  = $0.003
    output:   500 × $15 / 1,000,000 = $0.0075
    total:  $0.0105, about one cent

Tiny, until you multiply by thousands of calls, or let conversation history balloon each call's input to 20,000 tokens (at the same rates, $0.06 per call for input alone). That's where bills come from: not one expensive call, but token count × call count.

How many tokens is a typical page of text? Roughly 500–800 tokens per page of prose, but it varies with formatting and vocabulary.

Why do code and JSON sometimes cost more tokens than they look? Symbols, indentation, and braces often tokenize separately, so structured text can be token-dense relative to its character count.

Does the system prompt count? Yes. The system prompt, tool definitions, and any retrieved context are all input tokens you pay for on every call.

Is there a way to avoid resending the same big context every time? Yes. Prompt caching lets you reuse a stable prefix at a fraction of the cost.

I'm doing a whole series taking Claude apart piece by piece (video + written version of each) at The Stack Underflow. The full written companion to this one, plus the rest of the series, lives at thestackunderflow.com/tutorials.