18:30
2026-06-30
hello-fri-end.github.io
large-language-models
Why do transformers have outliers?
Transformer models develop outlier channels—feature dimensions with unusually large values in weights and activations—due to the softmax normalization in attention layers, which forces tokens to assig…
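As a rough illustration of the softmax constraint mentioned above, the sketch below shows two facts about softmax: each query's attention weights always sum to 1, and pushing almost all of that weight onto a single position requires a logit gap that grows with the logarithm of the number of tokens. It is not the post's own derivation. The helper names (`softmax`, `gap_for_residual_mass`) and the numbers (128 tokens, a residual mass of 0.001) are assumptions chosen for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gap_for_residual_mass(n_tokens, eps):
    """Logit gap g so that all tokens except one share exactly eps of the weight.

    Hypothetical setup: one position has logit g, the other n_tokens - 1 have logit 0.
    Their combined weight is (n - 1) / (exp(g) + n - 1); solving for g gives the log below.
    """
    others = n_tokens - 1
    return math.log(others * (1.0 - eps) / eps)

# Weights sum to 1 even when every logit is identical: a token cannot attend to "nothing".
uniform = softmax([0.0] * 8)
print(sum(uniform))

# To leave only 0.1% of the weight on the other 127 tokens, one logit must be much larger.
g = gap_for_residual_mass(128, 1e-3)
weights = softmax([g] + [0.0] * 127)
residual = sum(weights[1:])
print(g, residual)
```

Under these assumptions the required gap is about 11.75, and it grows as either the token count rises or the tolerated residual mass shrinks, which is one way a sum-to-one constraint can call for unusually large values.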