{"slug": "query-key-values", "title": "Query, Key, Values", "summary": "The transformer attention mechanism uses three learned projections (Query, Key, and Value) to enable each token to selectively gather information from other tokens. The Query determines what the token is looking for, the Key advertises what each token contains, and the Value carries the content to be passed on if selected. This design allows the model to perform soft content-based routing, where attention weights computed from Query-Key similarity determine which Values are blended into the output.", "body_md": "# Query, Key, Values\n\n[As part of my TIL series, building an intuition about Q, K, V]\n\nA good way to understand **QKV** is this:\n\nAttention is a soft lookup operation.\n\nGiven a token, the model asks:\n\n“What information should I pull from the other tokens?”\n\nQ, K and V are just three different learned projections of the same input token embeddings.\n\n**The simplest mental model**\n\nFor each token, the model creates three vectors:\n\n- **Query** -> \"What am I looking for?\"\n- **Key** -> \"What do I contain/advertise?\"\n- **Value** -> \"What information should I pass on if selected?\"\n\nSo attention works like this:\n\n- Compare a token’s **Query** against every token’s **Key** (its own included). 
- Turn those similarities into weights.\n- Use those weights to take a weighted average of the **Values**.\n\nThe formula is:\n\n```\nAttention(Q, K, V) = softmax(QKᵀ / √dₖ) V\n```\n\nMeaning:\n\n```\nsimilarity scores = QKᵀ / √dₖ\nattention weights = softmax(similarity scores)\noutput = attention weights × V\n```\n\nHere `dₖ` is the dimension of the key vectors. Dividing by `√dₖ` keeps the dot products from growing with the vector size, which would otherwise push the softmax towards near one-hot weights.\n\n**Concrete example**\n\nTake the sentence:\n\n```\nThe dog chased the ball because it was excited.\n```\n\nWhen processing the token **“it”**, the model needs to decide what **“it”** refers to.\n\nFor the token **“it”**:\n\n```\nQ_it = “I am looking for the thing this pronoun refers to”\n```\n\nOther tokens expose keys:\n\n```\nK_dog  = “I am an animal / possible subject”\nK_ball = “I am an object / possible noun”\n```\n\nThe model compares:\n\n```\nQ_it · K_dog\nQ_it · K_ball\n```\n\nIf `Q_it · K_dog` is higher, then **“it” attends more strongly to “dog”**.\n\nThen the output for **“it”** becomes a weighted mixture of the value vectors, weighted most heavily towards:\n\n```\nV_dog\n```\n\nSo the model enriches the representation of **“it”** with information from **“dog”**.\n\n**Why separate Q, K and V?**\n\nThis is the key bit.\n\nThe model does **not** use the raw token embedding directly. It learns three different views of each token:\n\n```\nQ = XW_Q\nK = XW_K\nV = XW_V\n```\n\nSame input `X`, different learned matrices.\n\nWhy?\n\nBecause “what I am looking for”, “how I should be matched”, and “what information I should contribute” are different jobs.\n\nFor example, the word **“bank”** might need to:\n\n```\nQ: look for context that disambiguates meaning\nK: advertise that it is a noun, place, institution, river edge, etc.\nV: contribute semantic content once selected\n```\n\nA single embedding cannot do all three jobs cleanly. 
QKV gives the model specialised subspaces for matching and information transfer.\n\n**The database analogy**\n\nThis is probably the most useful analogy:\n\n```\nQuery  = search query\nKey    = index / searchable metadata\nValue  = retrieved content\n```\n\nAttention is like searching a database where every token is a record.\n\n```\nToken = record\nKey   = searchable field\nValue = payload\nQuery = search request from current token\n```\n\nThe attention score says:\n\n```\nHow relevant is this token’s key to my query?\n```\n\nThe output says:\n\n```\nGive me the values from the most relevant tokens.\n```\n\n**The important correction**\n\nPeople often say:\n\n“Q asks a question, K answers it, V stores the answer.”\n\nThat is okay as a beginner analogy, but slightly misleading.\n\nMore accurately:\n\n```\nQ and K decide routing.\nV carries content.\n```\n\nQ and K determine **where to attend**.\n\nV determines **what information gets copied/mixed into the output**.\n\n**One-line understanding**\n\n*QKV attention is learned content-based routing: each token forms a query, matches it against other tokens’ keys, then pulls back a weighted blend of their values.*", "url": "https://wpnews.pro/news/query-key-values", "canonical_source": "https://www.anup.io/query-key-values/", "published_at": "2026-06-01 05:36:35+00:00", "updated_at": "2026-06-21 18:05:22.595114+00:00", "lang": "en", "topics": ["large-language-models", "machine-learning", "neural-networks", "natural-language-processing", "ai-research"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/query-key-values", "markdown": "https://wpnews.pro/news/query-key-values.md", "text": "https://wpnews.pro/news/query-key-values.txt", "jsonld": "https://wpnews.pro/news/query-key-values.jsonld"}}