Large language models like ChatGPT process text as tokens, not words, and a July 2026 TowardsAI explainer argues that this distinction determines model cost, context-window limits, and prompt behavior. A token can be a whole word, a word fragment, punctuation, a number, or an emoji; according to OpenAI's own token documentation, English text averages roughly four characters per token. For practitioners, this means API billing and context-window budgets are set by token counts, not word counts, and non-English or code-heavy text can consume tokens far faster than plain English. The piece illustrates the idea with a LEGO-brick analogy, showing how a sentence like "Machine learning is amazing." splits into reusable subword pieces so that models can handle rare or unseen words without the vocabulary growing unmanageably large.
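
To make the four-characters-per-token rule of thumb and the subword idea concrete, here is a minimal Python sketch. It is not the tokenizer any real model uses: `estimate_tokens`, `greedy_subword_split`, and `TOY_VOCAB` are hypothetical names invented for illustration, the vocabulary is made up, and production tokenizers typically learn their pieces with algorithms such as byte-pair encoding rather than the simple longest-match rule shown here.

```python
import math

# Rule of thumb for English text (an approximation, not an exact rate).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate based only on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def greedy_subword_split(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest pieces found in a toy vocabulary.

    Falls back to single characters when no known piece matches,
    which is how an unseen word can still be represented.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])
            i += 1
    return pieces


# Hypothetical vocabulary, chosen only to illustrate reusable pieces.
TOY_VOCAB = {"Machine", "learn", "ing", "is", "amaz", "."}

sentence = "Machine learning is amazing."
print(estimate_tokens(sentence))
print(greedy_subword_split("learning", TOY_VOCAB))
print(greedy_subword_split("amazing", TOY_VOCAB))
```

In this toy vocabulary the piece "ing" is reused by both "learning" and "amazing", which is the LEGO-brick point: a small set of pieces covers many words. For real billing or context-window budgeting, counts should come from the provider's own tokenizer rather than a character-based estimate.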
Show HN: Gavio: open-source interceptor pipeline for production LLM applications