{"slug": "how-ai-works", "title": "How AI Works", "summary": "Large language models like GPT-5.4 and Claude Opus 4.6 do not think, understand, or learn like humans — they are statistical tools that process language by converting text into numerical tokens and embeddings. These systems generate coherent language by applying learned mathematical patterns, not by possessing intention or comprehension, making it essential to recognize them as sophisticated pattern matchers rather than miniature minds.", "body_md": "# How large language models actually work, and why they are not miniature humans\n\nLarge language models such as GPT‑5.4, Claude Opus 4.6, and DeepSeek R1 are now everyday tools. Yet the way they work is often misunderstood.\n\nWe misunderstand AI because we mistake fluency for thought. When a system produces coherent language, we instinctively assume intention, understanding and agency behind it. This article explains why that instinct misleads us, and why clarity about what these systems are — and are not — is essential for using them wisely.\n\nLLMs do not think, they do not understand, and they do not learn in any human sense. What they do is process language at scale.\n\nThis article explains how that works, what is inside these systems, and why their behaviour can look intelligent even when no intelligence is present.\n\nThe key to understanding these systems is to see them as statistical tools, not miniature minds.\n\n# How an LLM processes what you type\n\n## Tokens\n\nAn LLM begins by breaking what you type into tokens. A token is a small unit of text. It may be a whole word, part of a word, or punctuation. Tokens are not ideas or concepts. They are fragments chosen because they appear often in text and can be handled efficiently by the model.\n\nEach token has a unique number. The token for \"king\" might be 99. The token for \"queen\" might be 24521. 
At this stage, your prompt has become a sequence of token numbers. The same text always produces the same numbers.\n\nTokens turn your text into numbers the model can work with.\n\nToken numbers on their own do not help the model process language. A token ID like 99 or 24521 is just a label. The model cannot compute with these integers because they do not contain any information about how the token is used or how it relates to other tokens.\n\nTo make computation possible, the model converts each token ID into a list of numbers. This list is called an embedding. It places the token as a point in a space where the model can perform computation. Think of the points in the space as the rooms of a house.\n\nThese lists are not chosen by hand. They are learned during training. As the model trains, the lists are adjusted so that tokens used in similar contexts move closer together in this space (like adjacent rooms in a house). They move closer because doing so reduces the model’s prediction error. This proximity is not meaning in a human sense. It is a statistical structure that allows the model to compute relationships between tokens.\n\nTwo lists that are close together indicate that the two tokens were used in statistically similar ways in the training data.\n\n## Lists of numbers represent a point in space\n\nThe model uses each token number to look up a list of numbers that represents that token. These lists are learned during training. No one chooses them by hand.\n\nFor the token \"king\", the list might look like:\n\n[0.12, 0.44, 0.91, ..., 0.03]\n\nThis list is a position in a mathematical space. You can think of each number as a step along a corridor. You take the first step and go through door number 12, then the next (door 44), and so on until you reach a final position (door 3). 
That position is the model's internal representation of the token.\n\nFor the token \"queen\", the list might be:\n\n[0.12, 0.44, 0.91, ..., 0.02]\n\nThe final step is slightly different, and the final position is close to the position for \"king\" (door 2 for \"queen\", door 3 for \"king\").\n\nThis closeness reflects how often the two words appear in similar contexts in the training data.\n\nThese lists of numbers are part of the model’s parameters.\n\nThe rest of the parameters determine how these positions influence one another as the model processes text. They shape how patterns combine, how relationships are detected and how the model transforms one set of token positions into the next. These parameters do not add meaning. They provide the machinery that lets the model apply statistical patterns to the text you give it.\n\nThese parameters set up the internal machinery the model uses to process and transform text.\n\n# Moving about the space\n\nTo show how the model captures patterns, imagine a simple three‑number space:\n\nking = [10, 7, 3] man = [ 6, 2, 1]\n\nqueen = [10, 7, 6] woman = [ 6, 2, 4]\n\nIf we subtract man from king, we get:\n\n\\([10−6, 7−2, 3−1] = [4, 5, 2]\\)\n\nThis is the direction from \"man\" to \"king\". If we then add \"woman\":\n\n\\([4, 5, 2] + [6, 2, 4] = [10, 7, 6]\\)\n\nThis lands us at the position for \"queen\".\n\nThe model has captured a pattern. The statistical difference between \"king\" and \"man\" resembles the difference between \"queen\" and \"woman\".\n\nThe model does not know why. The LLM's program has only calculated that these differences behave in similar ways across the training data.\n\n# Why this works\n\nThis works because \"king\" and \"man\" differ in consistent ways across the training data. \"Queen\" and \"woman\" differ in similar ways. The model adjusts its internal numbers so that these differences become similar directions in the space. 
The model has found a pattern and matched it.\n\nHumans then interpret this similarity as understanding.\n\nThe model reflects these similarities because they appear consistently across the text it was trained on.\n\n# It is all in the training data\n\nText contains stable patterns. These patterns describe roles, relationships, contrast, categories, analogies and grammatical structure.\n\nDuring training, the model adjusts itself so that tokens used in similar\ncontexts end up near one another, and tokens used in contrasting contexts end\nup separated in *consistent* ways.\n\nThis produces directions, distances, clusters and angles. These geometric features are the model's internal map of the statistical structure of language. Because language has structure, the model can represent it mathematically.\n\nThe model can represent these structures only because language itself contains stable patterns.\n\n## The human role in meaning\n\nThe model’s internal space is not a map of concepts. It is a map of statistical regularities. The structure becomes meaningful only when a human interprets it. We project categories, intentions and explanations onto patterns that were never designed to carry them. The model provides form; we provide significance. This distinction is not only philosophical, it is the boundary between what the system can do and what we imagine it can do.\n\n# We supply the intelligence\n\nThe distance between \"king\" and \"man\" is a statistical outcome. The distance between \"queen\" and \"woman\" is another. These two outcomes are similar. That similarity is the pattern the model has detected.\n\nThe model is not reasoning. It does not understand. It does not manipulate ideas. It follows the geometry that training has produced. If a direction has been useful for predicting text in the past, the model will use it again.\n\nThe geometry captures statistical qualities of human text. 
These include:\n\n- similarity of tone\n- proximity of commonly associated words\n- regular contrasts between categories\n- recurring relationships between ideas\n- typical structures of phrasing\n\nThe model does not reason about these qualities. It only reflects the statistics of its training data.\n\nTokens that appear in similar contexts end up close together. Tokens that contrast end up separated. Groups of related tokens form clusters. Repeated differences become directions. Angles reflect how often patterns co‑occur or diverge.\n\nFor example, words like \"cat\", \"dog\" and \"hamster\" end up near one another because they appear in similar kinds of sentences.\n\nWhen the model generates text, it moves through this space by following these patterns. Humans then read the output and recognise tone, relatedness, contrast and structure.\n\nThe model is not producing meaning. It is reproducing geometry. We are the ones interpreting that geometry as meaning.\n\nIt is we who supply the I in AI.\n\nThe model provides structure, but humans provide interpretation.\n\nThis geometric structure is simply a way of organising statistical patterns so the model can use them efficiently.\n\nTo understand how this internal space is created, we need to look at the billions of parameters inside the model.\n\n# What is in the billions of parameters\n\nTo understand how the model builds and moves through its geometric space, it helps to look at what that space is built from.\n\nAfter training, an LLM contains billions of parameters. These parameters are numerical values that shape how the model transforms text. Together they define the structure of the internal space: the directions that matter, the distances between tokens, the clusters that form, and the angles that represent relationships.\n\nWhen the model processes a prompt, it moves through this space by following the statistical structure represented in these parameters.\n\nDeepSeek R1 has 671 billion parameters. 
GPT‑5.4 may have over 2 trillion. More parameters mean greater capacity to represent and combine statistical patterns.\n\nMore parameters increase capacity, not understanding.\n\n## Parameters do not contain knowledge\n\nThe billions of parameters inside an LLM are often described as if they contain knowledge. They do not. They represent statistical consistencies extracted from large amounts of text.\n\nDuring training, the model adjusts its parameters to capture patterns in how language is used. Humans use language in standard ways, directed by grammar, style, topic associations and the common ways that ideas appear together.\n\nThe parameters form a space where patterns that frequently co‑occur in text end up close to one another. This allows the model to produce text that resembles human writing. It does not give the model the ability to reason or understand.\n\nFor example, if the training data contains mixed statements about a historical date, the model may confidently produce the wrong one because it is reflecting the statistical blend it has seen.\n\nParameters cannot store precise facts. They store tendencies, associations and relationships. If a fact appears often and consistently in the training data, the model may reproduce it. If the data is mixed or inconsistent, the model reflects that uncertainty. This is why LLMs can produce confident errors. They are not recalling facts. They are replaying patterns.\n\nThese parameters are shaped during training, which is the process that gives the model its statistical structure.\n\nThe model reflects the patterns in its data, not stored facts or understanding.\n\n# What training actually does\n\nTraining is repeated large‑scale error‑correction. The model predicts the next token, checks whether it was right, and adjusts its parameters to reduce the difference. This cycle repeats billions of times across vast amounts of text. 
The result is a system that becomes increasingly accurate at predicting what comes next.\n\nThe model does not form concepts. It does not build a picture of the world. It does not develop intentions or goals. It becomes more accurate at predicting the next token.\n\nFine‑tuning and alignment add further adjustments. These make the model follow instructions more reliably and avoid harmful output. They do not create understanding. They refine the statistical patterns the model uses.\n\nTraining shapes the parameters so the model becomes better at predicting what comes next.\n\n# Why this is not human learning\n\nHuman learning draws on perception, memory, experience and intention. Humans form abstractions, build mental models and develop goals. Human learning is grounded in the body and the world.\n\nLLM training is none of these things. It is a mathematical optimisation process. The model does not know what it is doing. It does not know that it is doing anything at all.\n\nThe model’s improvement is mechanical, not cognitive.\n\n# Is the output a simulation of intelligence?\n\nLLM output can appear intelligent because it resembles the writing of people who were thinking when they produced the original text. If you ask for advice, the model generates text that resembles advice. If you ask for an explanation, it generates text that resembles an explanation. The appearance of reasoning comes from the patterns in the training data, not from any understanding in the model. The model produces sequences that look thoughtful because thoughtful sequences are common in the text it has seen.\n\nThe resemblance is superficial. The model does not understand the text it produces. It does not know whether a statement is true or false. 
It only reflects that certain sequences of tokens tend to follow others.\n\nThe appearance of intelligence comes from the patterns in human writing, not from the model itself.\n\n# Are humans interpreting the output as intelligent?\n\nHumans are skilled at projecting meaning onto language. When we read coherent text, we assume intention behind it. We assume a mind. We assume agency. This is a natural response, but it can mislead us when dealing with LLMs.\n\nThe model does not intend anything. It generates plausible continuations of text. The sense of intelligence comes from the reader, not the machine. The machine provides form. The human provides interpretation.\n\nOur instinct to attribute intention makes the output seem smarter than it is.\n\nThis distinction matters because it prevents us from assuming abilities the model does not have.\n\n# What this means for us\n\nAn LLM is possible because we can statistically model features of language that matter to humans.\n\nLLMs are powerful tools for generating language. They are not thinking machines. Their strengths lie in pattern reproduction. Their weaknesses lie in the absence of understanding. They can assist with tasks that depend on language, but they cannot replace human judgement.\n\nA clear grasp of how these systems work helps avoid confusion. It prevents anthropomorphism. It supports responsible use. It keeps expectations grounded in what the technology can actually do, rather than what it appears to do.\n\nThe more plainly we describe these systems, the easier it becomes to use them well and to avoid treating them as something they are not.\n\nIn the end, an LLM is a system that maps patterns in language and reproduces them at scale. It does not think or understand. It follows geometry shaped by training, and we interpret that geometry as meaning. 
Knowing this helps us use these systems effectively, without expecting them to behave like people or to possess abilities they do not have.\n\nAll of this leads to a simple conclusion: understanding these limits helps us use LLMs effectively and responsibly.\n\n## Why clarity matters\n\nLLMs are powerful because language has structure, not because the systems understand it. They reproduce patterns we find meaningful, and we supply the meaning. When we keep that distinction clear, we avoid treating statistical machinery as a mind, and we avoid outsourcing judgement to a system that has none. Practical wisdom begins with seeing these systems as they are, not as we are tempted to imagine them.\n\n# Related Work\n\n[A clear explanation of what AI is—and is not—cutting through hype to define its real capabilities and limits.](what-ai-is.html)[Guidance on using AI safely and effectively, grounded in recent examples of misuse and emerging best practices.](how-to-use.html)[A framework for evaluating claims made about AI systems, focusing on evidence, capability, and verifiable performance.](evaluate-ai-claims.html)\n\n# Table of Contents\n\n[How large language models actually work, and why they are not miniature humans](#how-large-language-models-actually-work-and-why-they-are-not-miniature-humans)[How an LLM processes what you type](#how-an-llm-processes-what-you-type)[Moving about the space](#moving-about-the-space)[Why this works](#why-this-works)[It is all in the training data](#it-is-all-in-the-training-data)[We supply the intelligence](#we-supply-the-intelligence)[What is in the billions of parameters](#what-is-in-the-billions-of-parameters)[What training actually does](#what-training-actually-does)[Why this is not human learning](#why-this-is-not-human-learning)[Is the output a simulation of intelligence?](#is-the-output-a-simulation-of-intelligence)[Are humans interpreting the output as intelligent](#are-humans-interpreting-the-output-as-intelligent)[What this means for us](#what-this-means-for-us)[Related Work](#related-work)[Table of Contents](#table-of-contents)", "url": "https://wpnews.pro/news/how-ai-works", "canonical_source": "https://phroneses.com/articles/foundations/notes/how-ai-works.html", "published_at": "2026-05-06 00:00:00+00:00", "updated_at": "2026-05-27 14:58:45.694697+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "natural-language-processing", "generative-ai", "ai-ethics"], "entities": ["GPT-5.4", "Claude Opus 4.6", "DeepSeek R1"], "alternates": {"html": "https://wpnews.pro/news/how-ai-works", "markdown": "https://wpnews.pro/news/how-ai-works.md", "text": "https://wpnews.pro/news/how-ai-works.txt", "jsonld": "https://wpnews.pro/news/how-ai-works.jsonld"}}