04:00
2026-06-04
arxiv.org
machine-learning
Do Transformers Need Three Projections? Systematic Study of QKV Variants
A systematic study of query, key, and value (QKV) projection variants in Transformers found that sharing the key and value projections (Q-K=V) performs on par with or better than the standard three-pr…
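
To make the Q-K=V idea in the summary concrete, here is a minimal single-head sketch in NumPy. It is only an illustration under stated assumptions, not the paper's implementation: the function name `shared_kv_attention`, the weight names `w_q` and `w_kv`, the scaled dot-product form, and the toy dimensions are all hypothetical choices made for this example. The one point it shows is that a single projection produces a tensor used both as keys and as values, so the layer has two projection matrices instead of three.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_kv_attention(x, w_q, w_kv):
    """Single-head attention where keys and values share one projection.

    Hypothetical helper for illustration; names and shapes are assumptions.
    x:    (seq_len, d_model) input token representations
    w_q:  (d_model, d_head)  query projection
    w_kv: (d_model, d_head)  shared key/value projection
    """
    q = x @ w_q            # queries
    kv = x @ w_kv          # one tensor serves as both keys and values
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ kv


# Toy sizes, chosen arbitrarily for the example.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
x = rng.standard_normal((seq_len, d_model))
w_q = rng.standard_normal((d_model, d_head))
w_kv = rng.standard_normal((d_model, d_head))

out = shared_kv_attention(x, w_q, w_kv)
print(out.shape)
```

Because the attention weights in each row are non-negative and sum to 1, every output row is a weighted average of the rows of the shared key/value tensor.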