04:00
2026-06-19
arxiv.org
machine-learning
Weibull Weight-Scale Parameter Evolution under AdamW Training Dynamics
Researchers derived a three-force decomposition of AdamW training dynamics that explains how the Weibull weight-scale parameter evolves in transformers, finding that the alignment force dominates growth. A spline displ…