
Do Value Vectors in Deep Layers Need Context from the Residual Stream?

Researchers found that transformer-based language models perform better when deeper attention layers learn context-free value vectors that preserve the original token information, rather than computing values from the residual stream. The proposed Bank of Values (BoV) method stores these vectors as a learned lookup table of token-specific values for each of the last third of layers, so they do not need to be recomputed or persistently cached. On 135M and 780M parameter models, BoV improved validation loss over standard attention. At 780M it also improved the average score across 21 benchmarks, matching the previous best method while using less compute and memory.

read 1 min · published Jun 3, 2026

arXiv:2606.02780v1

Abstract: The success of the transformer architecture as the backbone of modern LLMs is in large part due to its use of attention layers. An attention layer follows the standard neural network paradigm: it takes the residual stream as input and thereby produces context-dependent query, key, and value vectors. However, we find that model performance meaningfully improves when deeper layers learn only a context-free value vector to preserve the original token information, without drawing on any context from the residual stream. When the model has access to this context-free value vector, adding back the context-dependent component provides little additional benefit for aggregate benchmark performance. Such context-free value vectors can be stored as sparse model parameters, eliminating the need to recompute or persistently cache these values. Through systematic ablations on the key design choices for such context-free value vectors, we propose Bank of Values (BoV), a new way of computing value vectors in attention by learning a lookup table of token-specific value vectors for each of the last third of layers. Across 135M and 780M models, BoV improves validation loss over standard attention and, at 780M, the average score across 21 benchmarks, matching the previous best method that adds token information to the value vector with less compute and memory.
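The contrast the abstract draws can be sketched in plain Python. This is a minimal single-head illustration under stated assumptions, not the authors' implementation. The names `attention_layer`, `uses_bank`, and `bank`, the rounding rule used for "the last third of layers", and the choice to keep computing queries and keys from the residual stream are all hypothetical. The only thing taken from the abstract is that, in late layers, each value vector is looked up by token id instead of being projected from the residual stream.

```python
import math
import random


def matvec(W, x):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head causal attention; each argument holds one vector per position."""
    d = len(queries[0])
    dv = len(values[0])
    out = []
    for t, q in enumerate(queries):
        # Position t attends only to positions 0..t.
        scores = [sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(w * values[s][j] for s, w in enumerate(weights))
                    for j in range(dv)])
    return out


def uses_bank(layer, n_layers):
    """Assumed rule: the last third of layers (rounded down) read from the bank."""
    return layer >= n_layers - n_layers // 3


def attention_layer(x, token_ids, W_q, W_k, W_v, bank=None):
    """x is the residual stream (one vector per position).

    bank is a hypothetical vocab-sized table of value vectors for this layer,
    or None for a standard layer.
    """
    queries = [matvec(W_q, h) for h in x]
    keys = [matvec(W_k, h) for h in x]
    if bank is None:
        # Standard attention: values are projected from the residual stream.
        values = [matvec(W_v, h) for h in x]
    else:
        # BoV-style: values are context-free, looked up by token id only.
        values = [bank[tok] for tok in token_ids]
    return causal_attention(queries, keys, values)


# Toy sizes and random weights, chosen only for illustration.
rng = random.Random(0)
vocab, d = 5, 4


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


W_q, W_k, W_v = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
bank = rand_matrix(vocab, d)
token_ids = [3, 1, 3]
x = rand_matrix(len(token_ids), d)

out = attention_layer(x, token_ids, W_q, W_k, W_v, bank)
# Position 0 attends only to itself, so out[0] is the bank row for token 3.
```

In this sketch the looked-up values depend only on token ids, so they are ordinary parameters that can be read from the table whenever needed. The keys are still computed from the residual stream and would be cached as usual.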
