How AI Works

Large language models like GPT-5.4 and Claude Opus 4.6 do not think, understand, or learn like humans. They are statistical tools that process language by converting text into numerical tokens and embeddings. These systems generate coherent language by applying learned mathematical patterns, not by possessing intention or comprehension, so it is essential to recognise them as sophisticated pattern matchers rather than miniature minds.

How large language models actually work, and why they are not miniature humans

Large language models such as GPT-5.4, Claude Opus 4.6 and DeepSeek R1 are now everyday tools. Yet the way they work is often misunderstood.

We misunderstand AI because we mistake fluency for thought. When a system produces coherent language, we instinctively assume intention, understanding and agency behind it. This article explains why that instinct misleads us, and why clarity about what these systems are, and are not, is essential for using them wisely.

LLMs do not think, they do not understand, and they do not learn in any human sense. What they do is process language at scale. What follows explains how that works, what is inside these systems, and why their behaviour can look intelligent even when no intelligence is present.

The key to understanding these systems is to see them as statistical tools, not miniature minds.

How an LLM processes what you type

Tokens

An LLM begins by breaking what you type into tokens. A token is a small unit of text. It may be a whole word, part of a word, or a punctuation mark. Tokens are not ideas or concepts. They are fragments chosen because they appear often in text and can be handled efficiently by the model.

Each token has a unique number. The token for "king" might be 99. The token for "queen" might be 24521. At this stage, the same text always produces the same token numbers.

Tokens turn your text into numbers the model can work with.

Tokens on their own do not help the model process language. A token ID like 99 or 24521 is just a label. The model cannot compute with these integers because they carry no information about how the token is used or how it relates to other tokens.

To make computation possible, the model converts each token ID into a list of numbers. This list is called an embedding. It places the token as a point in a space where the model can perform computation.
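The step from text to token numbers can be sketched with a toy tokenizer. This is a minimal illustration under stated assumptions: the vocabulary and every ID except the article's illustrative 99 ("king") and 24521 ("queen") are hypothetical, and the greedy longest-match rule stands in for the learned schemes (such as byte-pair encoding) that real tokenizers use.

```python
# Hypothetical vocabulary. Real LLM vocabularies hold tens of thousands of
# entries learned from data; only "king" -> 99 and "queen" -> 24521 echo the
# article's illustrative numbers, the rest are made up.
VOCAB: dict[str, int] = {
    "king": 99,
    "queen": 24521,
    "s": 7,
    " ": 3,
    "the": 12,
    ".": 5,
}


def tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    """Split text into tokens by greedily taking the longest vocabulary match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then progressively shorter ones.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            raise ValueError(f"no token covers {text[i]!r} at position {i}")
    return tokens


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Map each token to its ID: the only form of the text the model receives."""
    return [vocab[tok] for tok in tokenize(text, vocab)]


print(tokenize("the kings", VOCAB))  # ['the', ' ', 'king', 's']
print(encode("the kings", VOCAB))    # [12, 3, 99, 7]
```

Note that "kings" becomes two tokens, "king" and "s", which is how a token can be part of a word, and that encoding the same text twice always yields the same IDs.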
Think of the points in the space as the rooms of a house. These lists are not chosen by hand. They are learned during training. As the model trains, the lists are adjusted so that tokens used in similar contexts move closer together in this space, like adjacent rooms in a house. They move closer because doing so reduces the model's prediction error.

This proximity is not meaning in a human sense. It is a statistical structure that allows the model to compute relationships between tokens. Two lists that are close together represent statistical similarity in how their tokens were used in the training data.

Lists of numbers represent a point in space

The model uses each token number to look up the list of numbers that represents that token. For the token "king", the list might look like:

(0.12, 0.44, 0.91, ..., 0.03)

This list is a position in a mathematical space. You can think of each number as a step along a corridor. You take the first step and go through door number 12, then the next door (44), and so on until you reach a final position (door 3). That position is the model's internal representation of the token.

For the token "queen", the list might be:

(0.12, 0.44, 0.91, ..., 0.02)

The final step is slightly different, so the final position is close to the position for "king" (door 2 for "queen", door 3 for "king"). This closeness reflects how often the two words appear in similar contexts in the training data.

These lists of numbers are part of the model's parameters. The rest of the parameters determine how these positions influence one another as the model processes text. They shape how patterns combine, how relationships are detected and how the model transforms one set of token positions into the next. These parameters do not add meaning. They provide the machinery that lets the model apply statistical patterns to the text you give it.
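The lookup-and-compare idea above can be sketched in a few lines, assuming a tiny hypothetical embedding table. The four-number lists are made up for illustration (real models learn their values and use hundreds or thousands of numbers per token), and the "banana" entry with ID 31337 is invented as a contrast. Euclidean distance matches the corridor picture; cosine similarity, which compares direction, is another common way to measure closeness.

```python
import math

# Hypothetical embedding table mapping token IDs to points in space. IDs 99
# and 24521 echo the article's "king" and "queen"; ID 31337 ("banana") and
# every number in the lists are made up for illustration.
EMBEDDINGS: dict[int, list[float]] = {
    99: [0.8, 0.3, 0.5, 0.1],     # "king"
    24521: [0.8, 0.3, 0.5, 0.4],  # "queen": differs only in the last step
    31337: [0.1, 0.9, 0.0, 0.6],  # "banana": stands in for an unrelated word
}


def embed(token_id: int) -> list[float]:
    """Look up the point that represents a token ID."""
    return EMBEDDINGS[token_id]


def distance(a: list[float], b: list[float]) -> float:
    """Straight-line (Euclidean) distance between two points."""
    return math.dist(a, b)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of direction: 1.0 means the two lists point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


king, queen, banana = embed(99), embed(24521), embed(31337)
print(f"king-queen:  distance {distance(king, queen):.3f}, cosine {cosine_similarity(king, queen):.3f}")
print(f"king-banana: distance {distance(king, banana):.3f}, cosine {cosine_similarity(king, banana):.3f}")
```

By both measures "king" sits much closer to "queen" than to "banana". The closeness is nothing more than arithmetic on the numbers in the table.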
The model's parameters set up the internal machinery it uses to process and transform text.

Moving about the space

To show how the model captures patterns, imagine a simple three-number space:

king = (10, 7, 3)
man = (6, 2, 1)
queen = (10, 7, 6)
woman = (6, 2, 4)

If we subtract "man" from "king", we get:

\[ (10-6,\ 7-2,\ 3-1) = (4,\ 5,\ 2) \]

This is the direction from "man" to "king". If we then add "woman":

\[ (4,\ 5,\ 2) + (6,\ 2,\ 4) = (10,\ 7,\ 6) \]

This lands us at the position for "queen". The model has captured a pattern: the statistical difference between "king" and "man" resembles the difference between "queen" and "woman". The model does not know why. It has only calculated that these differences behave in similar ways across the training data.

Why this works

This works because "king" and "man" differ in consistent ways across the training data, and "queen" and "woman" differ in similar ways. The model adjusts its internal numbers so that these differences become similar directions in the space. The model has found a pattern and matched it. Humans then interpret this similarity as understanding.

The model reflects these similarities because they appear consistently across the text it was trained on.

It is all in the training data

Text contains stable patterns. These patterns describe roles, relationships, contrasts, categories, analogies and grammatical structure. During training, the model adjusts itself so that tokens used in similar contexts end up near one another, and tokens used in contrasting contexts end up separated in consistent ways. This produces directions, distances, clusters and angles. These geometric features are the model's internal map of the statistical structure of language. Because language has structure, the model can represent it mathematically.

The model can represent these structures only because language itself contains stable patterns.

The human role in meaning

The model's internal space is not a map of concepts.
It is a map of statistical regularities. The structure becomes meaningful only when a human interprets it. We project categories, intentions and explanations onto patterns that were never designed to carry them. The model provides form; we provide significance. This distinction is not only philosophical: it is the boundary between what the system can do and what we imagine it can do.

We supply the intelligence

The distance between "king" and "man" is a statistical outcome. The distance between "queen" and "woman" is another. These two outcomes are similar, and that similarity is the pattern the model has detected.

The model is not reasoning. It does not understand. It does not manipulate ideas. It follows the geometry that training has produced. If a direction has been useful for predicting text in the past, the model will use it again.

The geometry captures statistical qualities of human text. These include:

- similarity of tone
- proximity of commonly associated words
- regular contrasts between categories
- recurring relationships between ideas
- typical structures of phrasing

The model does not reason about these qualities. It only reflects the statistics of its training data. Tokens that appear in similar contexts end up close together. Tokens that contrast end up separated. Groups of related tokens form clusters. Repeated differences become directions. Angles reflect how often patterns co-occur or diverge. For example, words like "cat", "dog" and "hamster" end up near one another because they appear in similar kinds of sentences.

When the model generates text, it moves through this space by following these patterns. Humans then read the output and recognise tone, relatedness, contrast and structure. The model is not producing meaning. It is reproducing geometry. We are the ones interpreting that geometry as meaning. We supply the I in AI.

The model provides structure, but humans provide interpretation.
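The arithmetic from "Moving about the space" can be checked with a short sketch that uses the article's three-number space. The "cat" entry is a hypothetical extra word added as a decoy. In real embedding spaces the result of such arithmetic rarely lands exactly on a word, so analyses typically pick the nearest word, often excluding the words used in the query; the `nearest` helper below is an illustrative assumption of that practice, not any particular library's API.

```python
import math

Vec = tuple[int, ...]

# The three-number space from "Moving about the space". The "cat" entry is a
# hypothetical decoy so the nearest-neighbour search has something to reject.
SPACE: dict[str, Vec] = {
    "king": (10, 7, 3),
    "man": (6, 2, 1),
    "queen": (10, 7, 6),
    "woman": (6, 2, 4),
    "cat": (1, 8, 5),
}


def add(a: Vec, b: Vec) -> Vec:
    """Element-wise sum: take the steps of `b` starting from point `a`."""
    return tuple(x + y for x, y in zip(a, b))


def subtract(a: Vec, b: Vec) -> Vec:
    """Element-wise difference: the direction that leads from `b` to `a`."""
    return tuple(x - y for x, y in zip(a, b))


def nearest(point: Vec, space: dict[str, Vec], exclude: frozenset[str] = frozenset()) -> str:
    """Return the word whose position is closest (Euclidean) to `point`."""
    candidates = [w for w in space if w not in exclude]
    return min(candidates, key=lambda w: math.dist(point, space[w]))


direction = subtract(SPACE["king"], SPACE["man"])  # (4, 5, 2): from "man" to "king"
landing = add(direction, SPACE["woman"])           # (10, 7, 6)
answer = nearest(landing, SPACE, exclude=frozenset({"king", "man", "woman"}))
print(direction, landing, answer)                  # (4, 5, 2) (10, 7, 6) queen
```

Nothing in the code knows what a king or a queen is. The answer comes entirely from subtracting, adding and measuring distances between lists of numbers.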
This geometric structure is simply a way of organising statistical patterns so the model can use them efficiently. To understand how this internal space is created, we need to look at the billions of parameters inside the model.

What is in the billions of parameters

After training, an LLM contains billions of parameters. These parameters are numerical values that shape how the model transforms text. Together they define the structure of the internal space: the directions that matter, the distances between tokens, the clusters that form, and the angles that represent relationships. When the model processes a prompt, it moves through this space by following the statistical structure represented in these parameters.

DeepSeek R1 has 671 billion parameters. GPT-5.4 may have over 2 trillion. More parameters mean greater capacity to represent and combine statistical patterns.

More parameters increase capacity, not understanding.

Parameters do not contain knowledge

The billions of parameters inside an LLM are often described as if they contain knowledge. They do not. They represent statistical consistencies extracted from large amounts of text. During training, the model adjusts its parameters to capture patterns in how language is used. Humans use language in standard ways, shaped by grammar, style, topic associations and the common ways that ideas appear together.

The parameters form a space where patterns that frequently co-occur in text end up close to one another. This allows the model to produce text that resembles human writing. It does not give the model the ability to reason or understand.

For example, if the training data contains mixed statements about a historical date, the model may confidently produce the wrong one because it is reflecting the statistical blend it has seen. Parameters cannot store precise facts.
They store tendencies, associations and relationships. If a fact appears often and consistently in the training data, the model may reproduce it. If the data is mixed or inconsistent, the model reflects that uncertainty. This is why LLMs can produce confident errors. They are not recalling facts. They are replaying patterns.

These parameters are shaped during training, which is the process that gives the model its statistical structure.

The model reflects the patterns in its data, not stored facts or understanding.

What training actually does

Training is repeated, large-scale error correction. The model predicts the next token, checks the prediction against the token that actually comes next in the training text, and adjusts its parameters to reduce the error. This cycle repeats billions of times across vast amounts of text. The result is a system that becomes increasingly accurate at predicting what comes next.

The model does not form concepts. It does not build a picture of the world. It does not develop intentions or goals. It becomes more accurate at predicting the next token.

Fine-tuning and alignment add further adjustments. These make the model follow instructions more reliably and avoid harmful output. They do not create understanding. They refine the statistical patterns the model uses.

Training shapes the parameters so the model becomes better at predicting what comes next.

Why this is not human learning

Human learning draws on perception, memory, experience and intention. Humans form abstractions, build mental models and develop goals. Human learning is grounded in the body and the world.

LLM training is none of these things. It is a mathematical optimisation process. The model does not know what it is doing. It does not know that it is doing anything at all.

The model's improvement is mechanical, not cognitive.

Is the output a simulation of intelligence?

LLM output can appear intelligent because it resembles the writing of people who were thinking when they produced the original text.
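Both the predict, check and adjust cycle from "What training actually does" and the way output mirrors its training text can be made concrete with a toy next-token predictor for a single context. This is a sketch under loud assumptions: the corpus and dates are hypothetical, and the model has one free score per candidate token, whereas a real LLM computes its scores with billions of shared parameters. The update is plain gradient descent on cross-entropy loss, the next-token objective that real models also minimise with gradient-based optimisers.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_next_token(candidates: list[str], observed: list[str],
                     steps: int = 500, lr: float = 0.5) -> dict[str, float]:
    """Fit next-token probabilities for ONE context by repeated error correction.

    `observed` lists the tokens that actually followed this context in a
    hypothetical training corpus.
    """
    index = {tok: i for i, tok in enumerate(candidates)}
    logits = [0.0] * len(candidates)  # the parameters: start with no preference
    for _ in range(steps):
        probs = softmax(logits)  # 1. predict
        grad = [0.0] * len(candidates)
        for tok in observed:  # 2. check the prediction against the data
            for j, p in enumerate(probs):
                target = 1.0 if j == index[tok] else 0.0
                # Gradient of the average cross-entropy loss w.r.t. logit j.
                grad[j] += (p - target) / len(observed)
        for j in range(len(logits)):  # 3. adjust to reduce the error
            logits[j] -= lr * grad[j]
    return dict(zip(candidates, softmax(logits)))


# Hypothetical corpus: after the same context, three sources give 1850 and one
# gives 1851; 1900 never appears.
probs = train_next_token(["1850", "1851", "1900"], ["1850", "1850", "1850", "1851"])
best = max(probs, key=probs.get)
print({tok: round(p, 2) for tok, p in probs.items()}, best)
```

After training, the probabilities settle near the 3:1 split in the data, and always choosing the most probable token picks "1850". If the single dissenting source had been the correct one, this toy model would be confidently wrong: it reproduces the blend it was trained on, not a checked fact.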
If you ask for advice, the model generates text that resembles advice. If you ask for an explanation, it generates text that resembles an explanation. The appearance of reasoning comes from the patterns in the training data, not from any understanding in the model. The model produces sequences that look thoughtful because thoughtful sequences are common in the text it has seen.

The resemblance is superficial. The model does not understand the text it produces. It does not know whether a statement is true or false. It only reflects that certain sequences of tokens tend to follow others.

The appearance of intelligence comes from the patterns in human writing, not from the model itself.

Are humans interpreting the output as intelligent?

Humans are skilled at projecting meaning onto language. When we read coherent text, we assume intention behind it. We assume a mind. We assume agency. This is a natural response, but it can mislead us when dealing with LLMs.

The model does not intend anything. It generates plausible continuations of text. The sense of intelligence comes from the reader, not the machine. The machine provides form. The human provides interpretation.

Our instinct to attribute intention makes the output seem smarter than it is. Keeping this distinction clear prevents us from assuming abilities the model does not have.

What this means for us

An LLM is possible because we can statistically model features of language that matter to humans. LLMs are powerful tools for generating language. They are not thinking machines. Their strengths lie in pattern reproduction. Their weaknesses lie in the absence of understanding. They can assist with tasks that depend on language, but they cannot replace human judgement.

A clear grasp of how these systems work helps avoid confusion. It prevents anthropomorphism. It supports responsible use. It keeps expectations grounded in what the technology can actually do, rather than what it appears to do.
The more plainly we describe these systems, the easier it becomes to use them well and to avoid treating them as something they are not.

In the end, an LLM is a system that maps patterns in language and reproduces them at scale. It does not think or understand. It follows geometry shaped by training, and we interpret that geometry as meaning. Knowing this helps us use these systems effectively, without expecting them to behave like people or to possess abilities they do not have.

All of this leads to a simple conclusion: understanding these limits helps us use LLMs effectively and responsibly.

Why clarity matters

LLMs are powerful because language has structure, not because the systems understand it. They reproduce patterns we find meaningful, and we supply the meaning. When we keep that distinction clear, we avoid treating statistical machinery as a mind, and we avoid outsourcing judgement to a system that has none.

Practical wisdom begins with seeing these systems as they are, not as we are tempted to imagine them.

Related Work

A clear explanation of what AI is, and is not, cutting through hype to define its real capabilities and limits. (what-ai-is.html)

Guidance on using AI safely and effectively, grounded in recent examples of misuse and emerging best practices. (how-to-use.html)

A framework for evaluating claims made about AI systems, focusing on evidence, capability, and verifiable performance. (evaluate-ai-claims.html)

If this piece was useful, you'll appreciate the free Phroneses newsletter: clear thinking on engineering leadership, organisational clarity, and reliable systems. Practical, honest, and built for people who care about doing the work well.

I work with leaders and teams on clarity, capability, and momentum.
Work with me → /pages/services.html