How LLM Tokens Work (And Why They Explain Your AI Bill)

wpnews.pro


Source: dev.to · Topic: large language models


An engineer explains that large language models like Claude do not read words but tokens—chunks of text mapped to integers—and that this token-based design is the root cause of AI billing surprises. The post details how tokenization works, why pricing is per token for both input and output, and how costs accumulate from token count multiplied by call count.

3 min read · Published Jun 23, 2026

Here's the thing almost nobody internalizes about large language models: Claude never reads your words. It reads tokens, which are just numbers. Your prompt is chopped into pieces, each piece is mapped to an integer, and the model only ever sees those integers. Every limit you hit, every bill you pay, and half the weird behavior you've seen trace back to this one fact.

This article explains what a token actually is, why the model works in tokens instead of words, and how that single design choice explains your AI bill.

The one-sentence version: text is split into tokens (for English, a token is roughly ¾ of a word on average), each token maps to a number, and you pay per token, both in and out, so understanding tokens is understanding cost.
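To make the ¾-word rule of thumb concrete, here is a minimal Python sketch that estimates a token count from a word count. The 0.75 words-per-token ratio is a rough heuristic for English prose, not a property of Claude's tokenizer; code, non-English text, and unusual vocabulary can diverge a lot. `estimate_tokens` is a hypothetical helper, so use a real token counter when the number matters.

```python
import math

# Rough heuristic for English prose: one token is about 0.75 words,
# i.e. roughly 4 tokens for every 3 words. Real counts depend on the tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate tokens from whitespace-separated words (hypothetical helper)."""
    word_count = len(text.split())
    return math.ceil(word_count / WORDS_PER_TOKEN)


prompt = "Explain how tokens work and why they explain my AI bill."
print(estimate_tokens(prompt))  # 11 words -> about 15 tokens
```

By the same heuristic, a 400-word page of prose comes out at about 534 tokens, which sits inside the 500–800 tokens-per-page range this article gives later.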

A token is a chunk of text — often a word, but frequently a piece of a word, a space, or a punctuation mark. The tokenizer is a fixed dictionary that maps text chunks to integer IDs.

Rough intuition:

Common short words (the, code, error) are usually a single token.
Longer or rarer words get split into pieces (tokenization → token + ization).

So "How tokens work" isn't 3 words to the model. It's a sequence of integer IDs like [4438, 11460, 990] (the exact IDs depend on the tokenizer). The model does math on those numbers; the English you typed was never seen.

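Two extremes, both bad:

Whole words as tokens: the vocabulary would have to be enormous, and any word missing from it (a typo, a new product name, a rare term) can't be represented at all.
Single characters as tokens: any text fits, but sequences become several times longer, so every request costs more compute and fills the context window faster.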

Tokens are the engineered middle: a fixed vocabulary (tens of thousands to a couple of hundred thousand entries, depending on the tokenizer) of common chunks that can assemble any text, including words the model has never encountered, by gluing pieces together. It's the compression that makes the whole thing tractable.
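The two extremes can be compared on a single sentence. This Python sketch is illustrative only: the five-word vocabulary is hypothetical, and splitting on spaces stands in for a real word-level scheme.

```python
text = "Tokenization handles unfamiliar words like flibbertigibbet"

# Extreme 1: character-level. Any text is representable with a tiny
# vocabulary, but every character becomes its own step in the sequence.
char_tokens = list(text)

# Extreme 2: word-level with a fixed (hypothetical) word list. Sequences are
# short, but any word missing from the list collapses to <unk> and is lost.
WORD_VOCAB = {"tokenization", "handles", "unfamiliar", "words", "like"}
word_tokens = [w if w in WORD_VOCAB else "<unk>" for w in text.lower().split()]

print(len(char_tokens))  # sequence length at character level
print(word_tokens)       # the rare word is replaced by <unk>
```

Character level needs 58 steps for six words; word level needs only 6 but loses `flibbertigibbet` entirely. Subword tokens sit between the two: a rare word costs a few extra tokens instead of becoming unreadable.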

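Major API providers, including Anthropic, price per token and count both directions: input tokens (everything you send) and output tokens (everything the model writes back), with output usually priced higher.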

This is why costs surprise people: you pay for everything you send on each call, not just the question you typed.

Your bill ≈ (input tokens × input price) + (output tokens × output price)
            └── prompt + history + docs + tools      └── the model's reply
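The formula translates directly into code. Below is a minimal Python sketch with prices expressed in dollars per million tokens, the way providers commonly quote them. `call_cost` is a hypothetical helper, not part of any SDK; the token sizes and the $3/$15 prices are illustrative assumptions.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API call; prices are per million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Input counts everything sent: prompt, history, documents, tool definitions.
system_and_tools = 1_500  # hypothetical sizes, for illustration only
history = 4_000
user_message = 200
reply = 600

cost = call_cost(system_and_tools + history + user_message, reply, 3.0, 15.0)
print(f"${cost:.4f}")
```

Here the 200-token question is a small slice of the 5,700 input tokens actually billed, which is exactly the gap the formula's annotation points at.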

Say input is priced at $3 per million tokens and output at $15 per million (illustrative; check current rates). You send a 1,000-token prompt and get a 500-token answer: 1,000 × $3/1M + 500 × $15/1M = $0.003 + $0.0075 = $0.0105, about one cent.

Tiny, until you multiply by thousands of calls, or let conversation history balloon each call's input to 20,000 tokens (at $3 per million, that input alone is $0.06 per call). That's where bills come from: not one expensive call, but token count × call count.
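The history effect compounds because, in a typical chat setup, the whole conversation is resent on every turn. This Python sketch assumes, purely for illustration, a 1,000-token system prompt, 300 tokens per user message, 500 tokens per reply, and the same $3/$15 per million prices; `conversation_cost` is a hypothetical helper.

```python
# Illustrative assumptions (not real measurements).
SYSTEM_TOKENS = 1_000
USER_TOKENS = 300
REPLY_TOKENS = 500
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0


def conversation_cost(turns: int) -> tuple[int, float]:
    """Return (input tokens on the last call, total dollars) for a chat
    where every call resends the system prompt plus all earlier turns."""
    history = 0
    total = 0.0
    input_tokens = 0
    for _ in range(turns):
        input_tokens = SYSTEM_TOKENS + history + USER_TOKENS
        total += (input_tokens * INPUT_PRICE_PER_M
                  + REPLY_TOKENS * OUTPUT_PRICE_PER_M) / 1_000_000
        # This turn's question and answer become history for the next call.
        history += USER_TOKENS + REPLY_TOKENS
    return input_tokens, total


for turns in (1, 10, 25):
    last_input, total = conversation_cost(turns)
    print(f"{turns:>2} turns: last call input {last_input:,} tokens, total ${total:.4f}")
```

Under these assumptions the 25th call alone sends 20,500 input tokens, and the whole conversation costs about $1.01, nearly 90 times the single-turn case. Per-call input grows linearly with the number of turns, so the total grows roughly quadratically.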

How many tokens is a typical page of text?

Roughly 500–800 tokens per page of prose, but it varies with formatting and vocabulary.

Why do code and JSON sometimes cost more tokens than they look?

Symbols, indentation, and braces each tokenize separately, so structured text can be token-dense relative to its character count.
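One way to see the effect of formatting is to serialize the same data two ways with Python's standard `json` module. This sketch compares character counts only, and the payload is hypothetical; how many tokens you actually save depends on the tokenizer, though indentation and newlines usually are not free.

```python
import json

# Hypothetical payload, e.g. a tool result about to be placed into a prompt.
record = {
    "user_id": 4821,
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-220", "qty": 1}],
    "status": "shipped",
}

pretty = json.dumps(record, indent=2)
# separators=(",", ":") drops the spaces json.dumps adds after , and : by default.
compact = json.dumps(record, separators=(",", ":"))

print(len(pretty), len(compact))
assert json.loads(pretty) == json.loads(compact)  # same data either way
```

Both strings decode to the same object, but the pretty version carries indentation and line breaks that still have to be sent to the model. For structured data that goes into a prompt on every call, the compact form is usually the cheaper choice; keep `indent` for output meant for humans.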

Does the system prompt count?

Yes. The system prompt, tool definitions, and any retrieved context are all input tokens you pay for on every call.
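Because this overhead is resent on every call, it scales with call volume rather than with how much the user typed. A rough Python sketch, with hypothetical token sizes, a hypothetical call volume, and the illustrative $3 per million input price:

```python
INPUT_PRICE_PER_M = 3.0  # illustrative price, dollars per million input tokens

# Hypothetical per-call input, in tokens.
per_call_input = {
    "system prompt": 2_000,
    "tool definitions": 1_500,
    "retrieved context": 3_000,
    "user message": 100,
}

calls_per_day = 10_000
total_input = sum(per_call_input.values())  # tokens sent on every call

daily_cost = total_input * calls_per_day * INPUT_PRICE_PER_M / 1_000_000
user_share = per_call_input["user message"] / total_input

print(f"{total_input:,} input tokens per call, ${daily_cost:,.2f}/day")
print(f"only {user_share:.1%} of input is the user's message")
```

In this made-up setup each call sends 6,600 input tokens, the input side alone comes to $198 a day, and the user's 100-token question is about 1.5% of it.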

Is there a way to avoid resending the same big context every time?

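Yes. Prompt caching lets you reuse a stable prefix (a long system prompt, tool definitions, a reference document) across calls, and reads from the cache are billed at a fraction of the normal input price.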
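The savings can be estimated with a couple of multipliers. This Python sketch assumes a cache write costs 1.25× the base input price and a cache read 0.1×; treat both as placeholders and check your provider's current pricing and cache lifetime. It also assumes every call after the first hits the cache. `prefix_cost` is a hypothetical helper.

```python
INPUT_PRICE_PER_M = 3.0        # illustrative base input price, $ per million tokens
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: the first call pays extra to store the prefix
CACHE_READ_MULTIPLIER = 0.10   # assumption: later calls read the prefix cheaply


def prefix_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    """Dollars spent on a stable prompt prefix across `calls` calls."""
    if calls <= 0:
        return 0.0
    base = prefix_tokens * INPUT_PRICE_PER_M / 1_000_000
    if not cached:
        return base * calls
    # One cache write, then cache reads for the remaining calls
    # (assumes the cache never expires between calls).
    return base * CACHE_WRITE_MULTIPLIER + base * CACHE_READ_MULTIPLIER * (calls - 1)


uncached = prefix_cost(20_000, 1_000, cached=False)
cached = prefix_cost(20_000, 1_000, cached=True)
print(f"uncached ${uncached:.2f} vs cached ${cached:.2f}")
```

Under these assumptions, a 20,000-token prefix sent 1,000 times drops from about $60 to about $6. If the cache expires between calls, each miss pays the write price again, so real savings depend on traffic patterns.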

I'm doing a whole series taking Claude apart piece by piece — video + written version of each — at The Stack Underflow. The full written companion to this one, plus the rest of the series, lives at thestackunderflow.com/tutorials.


Read original on dev.to → dev.to/thestackunderflow_3b3b3b6/how-llm-tokens-…
