{"slug": "spectral-asymptotics-of-neural-network-loss-landscapes-an-exact-decomposition-of", "title": "Spectral Asymptotics of Neural Network Loss Landscapes: An Exact Decomposition of the Curvature Exponent", "summary": "A new study proves the Spectral Alignment Decomposition, which explains why the curvature exponent $\\alpha$, governing how Hessian eigenvalues scale with gradient singular values, varies across neural network layer types: $\\alpha \\approx 2$ for convolutions, $\\approx 1$ for transformer attention, and $< 1$ for MLP up-projections. The decomposition reduces this variation to a geometric question about alignment between Kronecker factor eigenbases and gradient singular directions, and implies the spectral transfer identity $s = \\alpha\\gamma$ linking the curvature exponent $\\alpha$, the effective gradient rank-decay $\\gamma$, and the Hessian decay exponent $s$. The identity itself is algebraic; its empirical test is that $\\alpha$ and $\\gamma$, fit on independent data, recover $s$ to ~2% median error across 93 layers, five architectures, and three datasets, with no free parameters. As a proof of concept, the researchers derive an architecture-adaptive preconditioner and show that Spectral Newton, which implements it in the gradient singular basis, outperforms AdamW on vision benchmarks where $\\alpha \\approx 2$.", "body_md": "arXiv:2606.02596v1 Announce Type: new\nAbstract: The curvature exponent $\\alpha$ in $h_k \\propto \\sigma_k^\\alpha$, which governs how Hessian eigenvalues scale with gradient singular values, varies systematically across layer types ($\\alpha \\approx 2$ for convolutions, $\\approx 1$ for transformer attention, $< 1$ for MLP up-projections). Why? We prove the Spectral Alignment Decomposition: $\\alpha = 2 + d\\log\\Phi_k / d\\log\\sigma_k$, where $\\Phi_k$ measures alignment between Kronecker factor eigenbases and gradient singular directions. This reduces \"why does $\\alpha$ vary?\" to a geometric question we answer for LayerNorm, residual connections, and softmax heads. The decomposition implies a spectral transfer identity $s = \\alpha\\gamma$ linking the curvature exponent $\\alpha$, the effective gradient rank-decay $\\gamma$, and the Hessian decay exponent $s$. 
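
To illustrate why $s = \\alpha\\gamma$ is purely algebraic, here is a minimal Python sketch on synthetic, noise-free power laws. It assumes $\\gamma$ and $s$ are the decay exponents in $\\sigma_k \\propto k^{-\\gamma}$ and $h_k \\propto k^{-s}$, so that $h_k \\propto \\sigma_k^\\alpha \\propto k^{-\\alpha\\gamma}$. The exponent values, the prefactor, and the helper `loglog_slope` are hypothetical choices for illustration only; the paper fits $\\alpha$ and $\\gamma$ on independent measured data (HVPs vs. SVD), which this sketch does not attempt to reproduce.

```python
import numpy as np

# Hypothetical synthetic spectra (not measured data): gradient singular
# values decay as sigma_k ~ k**(-gamma), Hessian eigenvalues follow
# h_k ~ sigma_k**alpha, so h_k ~ k**(-alpha * gamma) by substitution.
TRUE_ALPHA, TRUE_GAMMA = 2.0, 0.75   # illustrative exponent choices
k = np.arange(1, 201, dtype=float)   # spectral index k = 1..200
sigma = k ** (-TRUE_GAMMA)
h = 3.0 * sigma ** TRUE_ALPHA        # arbitrary prefactor 3.0


def loglog_slope(x, y):
    # Least-squares slope of log(y) against log(x) (hypothetical helper).
    slope, _intercept = np.polyfit(np.log(x), np.log(y), 1)
    return slope


# Fit each exponent from a different pair of quantities.
alpha_hat = loglog_slope(sigma, h)   # curvature exponent: h vs sigma
gamma_hat = -loglog_slope(k, sigma)  # rank-decay exponent: sigma vs k
s_hat = -loglog_slope(k, h)          # Hessian decay exponent: h vs k

# On noise-free power laws the three fits agree with s = alpha * gamma
# up to floating-point error.
print(f'alpha={alpha_hat:.4f} gamma={gamma_hat:.4f}')
print(f's={s_hat:.4f} alpha*gamma={alpha_hat * gamma_hat:.4f}')
```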
The identity is algebraic; its empirical content is that $\\alpha$ and $\\gamma$, fit on independent data (HVPs vs. SVD), recover $s$ to ~2% median error across 93 layers, five architectures, and three datasets, with no free parameters. A zeta-function bound on the participation ratio shows that curvature concentrates onto effectively one direction per layer. As a proof of concept, we derive the architecture-adaptive preconditioner $T(\\sigma;\\alpha)$ and show that Spectral Newton, which implements $T$ in the gradient singular basis, outperforms AdamW on vision benchmarks where $\\alpha \\approx 2$.", "url": "https://wpnews.pro/news/spectral-asymptotics-of-neural-network-loss-landscapes-an-exact-decomposition-of", "canonical_source": "https://arxiv.org/abs/2606.02596", "published_at": "2026-06-03 04:00:00+00:00", "updated_at": "2026-06-03 04:02:52.488297+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "ai-research", "computer-vision", "natural-language-processing"], "entities": ["Spectral Alignment Decomposition", "Spectral Newton", "AdamW", "LayerNorm", "Kronecker factor"], "alternates": {"html": "https://wpnews.pro/news/spectral-asymptotics-of-neural-network-loss-landscapes-an-exact-decomposition-of", "markdown": "https://wpnews.pro/news/spectral-asymptotics-of-neural-network-loss-landscapes-an-exact-decomposition-of.md", "text": "https://wpnews.pro/news/spectral-asymptotics-of-neural-network-loss-landscapes-an-exact-decomposition-of.txt", "jsonld": "https://wpnews.pro/news/spectral-asymptotics-of-neural-network-loss-landscapes-an-exact-decomposition-of.jsonld"}}