Dispersion loss counteracts embedding condensation in small language models
Researchers found that smaller language models suffer from "embedding condensation," in which token embeddings collapse into a narrow cone and the model loses expressivity. They introduced a dispersion loss to cou…
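To make the "narrow cone" idea concrete, here is a minimal sketch of one way condensation could be measured. It assumes mean pairwise cosine similarity as the indicator, which the summary above does not confirm, and the function names `mean_pairwise_cosine` and `dispersion_penalty` are hypothetical. The penalty shown is a generic stand-in for illustration, not the researchers' actual dispersion loss.

```python
import math
import random


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of vectors."""
    # Scale every vector to unit length so dot products equal cosines.
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])

    total, count = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
            count += 1
    return total / count


def dispersion_penalty(embeddings):
    """Hypothetical penalty (an assumption, not the published loss):
    it grows as the vectors crowd into the same direction."""
    return mean_pairwise_cosine(embeddings)


rng = random.Random(0)
dim, n = 16, 50

# Spread-out case: zero-mean Gaussian vectors point in all directions.
spread = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

# Condensed case: the same noise plus a large shared offset, so every
# vector ends up close to one common direction (a narrow cone).
condensed = [[x + 5.0 for x in vec] for vec in spread]

print("spread   :", round(mean_pairwise_cosine(spread), 3))
print("condensed:", round(mean_pairwise_cosine(condensed), 3))
```

In this toy setup the condensed set scores close to 1 while the spread set scores close to 0, so a term of this kind added to a training objective would push embeddings apart. How the actual dispersion loss is defined and weighted is not stated in the summary.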