# Query, Key, Values

> Source: <https://www.anup.io/query-key-values/>
> Published: 2026-06-01 05:36:35+00:00

# Query, Key, Values

[As part of my TIL series, building an intuition about Q, K, V]

A good way to understand **QKV** is this:

Attention is a soft lookup operation.

Given a token, the model asks:

“What information should I pull from the other tokens?”

Q, K and V are just three different projections of the same input token embeddings.

**The simplest mental model**

For each token, the model creates three vectors:

**Query**-> "What am I looking for?"** Key**-> "What do I contain/advertise?"** Value**-> "What information should I pass on if selected?"

So attention works like this:

- Compare a token’s
**Query** against every other token’s**Key**. - Turn those similarities into weights.
- Use those weights to take a weighted average of the
**Values**.

The formula is:

```
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V
```

Meaning:

```
similarity scores = QKᵀ
attention weights = softmax(similarity scores)
output = attention weights × V
```

**Concrete example**

Take the sentence:

```
The dog chased the ball because it was excited.
```

When processing the token **“it”**, the model needs to decide what **“it”** refers to.

For the token **“it”**:

```
Q_it = “I am looking for the thing this pronoun refers to”
```

Other tokens expose keys:

```
K_dog  = “I am an animal / possible subject”
K_ball = “I am an object / possible noun”
```

The model compares:

```
Q_it · K_dog
Q_it · K_ball
```

If `Q_it · K_dog`

is higher, then **“it” attends more strongly to “dog”**.

Then the output for **“it”** becomes a weighted mixture of the value vectors, especially:

```
V_dog
```

So the model enriches the representation of **“it”** with information from **“dog”**.

**Why separate Q, K and V?**

This is the key bit.

The model does **not** use the raw token embedding directly. It learns three different views of each token:

```
Q = XW_Q
K = XW_K
V = XW_V
```

Same input `X`

, different learned matrices.

Why?

Because “what I am looking for”, “how I should be matched”, and “what information I should contribute” are different jobs.

For example, the word **“bank”** might need to:

```
Q: look for context that disambiguates meaning
K: advertise that it is a noun, place, institution, river edge, etc.
V: contribute semantic content once selected
```

One embedding cannot do all of that cleanly. QKV gives the model specialised subspaces for matching and information transfer.

**The database analogy**

This is probably the most useful analogy:

```
Query  = search query
Key    = index / searchable metadata
Value  = retrieved content
```

Attention is like searching a database where every token is a record.

```
Token = record
Key   = searchable field
Value = payload
Query = search request from current token
```

The attention score says:

```
How relevant is this token’s key to my query?
```

The output says:

```
Give me the values from the most relevant tokens.
```

**The important correction**

People often say:

“Q asks a question, K answers it, V stores the answer.”

That is okay as a beginner analogy, but slightly misleading.

More accurately:

```
Q and K decide routing.
V carries content.
```

Q and K determine **where to attend**.

V determines **what information gets copied/mixed into the output**.

**One-line understanding**

*QKV attention is learned content-based routing: each token forms a query, matches it against other tokens’ keys, then pulls back a weighted blend of their values.*

No spam, no sharing to third party. Only you and me.
