22:35
2026-07-03
chenliu-1996.github.io
large-language-models
Dispersion loss counteracts embedding condensation in small language models
Researchers found that smaller language models suffer from 'embedding condensation,' where token embeddings collapse into a narrow cone, reducing expressivity. They introduced a dispersion loss to cou…