attention

mentions: 1 · type: Organization

// recent coverage (1 mention)

2026-06-30 18:30

hello-fri-end.github.io

large-language-models

Why do transformers have outliers?

Transformer models develop outlier channels—feature dimensions with unusually large values in weights and activations—due to the softmax normalization in attention layers, which forces tokens to assig…
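The sketch below is a minimal illustration of the softmax constraint mentioned above. It assumes a single attention head over n keys in which one "sink" key has to absorb almost all of the attention mass, because standard softmax weights must sum to 1. The helper names (`softmax`, `logit_gap_for_mass`, `softmax_off_by_one`) are hypothetical. The off-by-one variant is a swapped-in illustration of one proposed remedy and is not necessarily the method described in the linked article.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_gap_for_mass(p, n):
    """Hypothetical helper: the logit gap g that gives one 'sink' key probability p
    when the other n - 1 keys share an equal, lower logit.

    Solves p = e^g / (e^g + (n - 1)) for g, i.e. g = ln(p * (n - 1) / (1 - p)).
    """
    return math.log(p * (n - 1) / (1 - p))


def softmax_off_by_one(logits):
    """Assumed variant for illustration: e^x_i / (1 + sum_j e^x_j).

    Unlike softmax, all weights can approach 0 together, so a head can
    'attend to nothing' without a sink. Shifting by m = max(0, max(logits))
    keeps every exponent <= 0, which prevents overflow.
    """
    m = max(0.0, max(logits))
    exps = [math.exp(x - m) for x in logits]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]


if __name__ == "__main__":
    n = 1024
    for p in (0.99, 0.999):
        # The required gap grows without bound as p -> 1. Pushing attention on
        # the real tokens toward zero therefore needs ever larger logits.
        print(p, f"{logit_gap_for_mass(p, n):.1f}")  # 11.5, then 13.8
    # Standard softmax weights always sum to 1, even when every logit is very negative.
    print(sum(softmax([-50.0] * 4)))
    # The off-by-one variant lets the total mass shrink toward 0 instead.
    print(sum(softmax_off_by_one([-50.0] * 4)))
```

Because the gap only grows logarithmically in p, each further step toward "no attention" costs a larger logit. This is one plausible route by which unusually large values could appear in queries, keys, or the hidden states that produce them.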

// co-occurs with top 3 entities

Transformer (1) · Vision Transformer (1) · softmax (1)