arXiv:2606.05219v1. Abstract: Recent analyses of multi-pathway Deep Linear Networks use Gradient Flow (GF) to predict a "winner-takes-all" specialization in which path symmetry breaks and each feature concentrates in a single pathway. In this work, we show that discrete Gradient Descent (GD) with a large step size tells a different story. We prove that single-path solutions are sharp minima, whereas distributing signals across pathways reduces sharpness by a factor that decreases with both the number of pathways and depth. Consequently, while early training reproduces the depth-driven symmetry breaking predicted by GF, oscillations at the Edge of Stability subsequently override this tendency and drive the network into a re-balancing phase, where signals redistribute across pathways. Together, these results clarify how depth shapes pathway competition and explain why large-step GD favors shared representations rather than persistent single-pathway dominance.
Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway
A new study shows that discrete Gradient Descent (GD) with a large step size restores symmetry in multi-pathway Deep Linear Networks, counteracting the "winner-takes-all" specialization predicted by Gradient Flow (GF). The authors prove that single-path solutions are sharp minima, whereas distributing signals across pathways reduces sharpness by a factor that decreases with both the number of pathways and depth. As a result, early training still reproduces the depth-driven symmetry breaking that GF predicts, but oscillations at the Edge of Stability later override it and redistribute signals across pathways. These findings explain why large-step GD favors shared representations over persistent single-pathway dominance and clarify how depth shapes pathway competition.
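The sharpness comparison can be illustrated with a toy calculation. The sketch below is not the paper's model or proof. It assumes a scalar network whose output is the sum over pathways of the product of that pathway's weights, a squared loss with target `signal`, and pathways whose weights are internally balanced (all equal). The function names `sharpness_at_minimum` and `sharpness_ratio` are hypothetical. Under these assumptions the Hessian at a zero-loss point is the outer product of the output gradient with itself, so the sharpness (largest Hessian eigenvalue) is the squared norm of that gradient.

```python
import numpy as np


def sharpness_at_minimum(path_products, depth):
    """Largest Hessian eigenvalue of 0.5 * (sum_p prod_l w[p, l] - s)**2
    at a zero-loss point, assuming each pathway's weights are all equal.

    path_products[p] is the end-to-end product carried by pathway p.
    This scalar setup is an illustrative assumption, not the paper's model.
    """
    a = np.asarray(path_products, dtype=float)
    # Balanced pathway: every weight is a_p**(1/depth), so the derivative of
    # the output with respect to any one weight is a_p**((depth - 1)/depth).
    partials = a ** ((depth - 1) / depth)
    # At zero residual the Hessian is g g^T; its only nonzero eigenvalue
    # is ||g||^2, and each pathway contributes `depth` equal partials.
    return depth * np.sum(partials ** 2)


def sharpness_ratio(num_paths, depth, signal=1.0):
    """Sharpness of the evenly shared solution divided by the sharpness of
    the single-path solution, both fitting the same target `signal`."""
    single = sharpness_at_minimum([signal] + [0.0] * (num_paths - 1), depth)
    shared = sharpness_at_minimum([signal / num_paths] * num_paths, depth)
    return shared / single


if __name__ == "__main__":
    for num_paths in (2, 4, 8):
        for depth in (2, 4, 6):
            ratio = sharpness_ratio(num_paths, depth)
            print(f"paths={num_paths} depth={depth} ratio={ratio:.3f}")
```

In this toy setting the ratio works out to `num_paths ** (2 / depth - 1)`: it equals 1 at depth 2 and shrinks as either the number of pathways or the depth grows, which matches the direction of the paper's claim. Since GD on a locally quadratic loss is stable only when the sharpness stays below 2 divided by the step size, a step size chosen between the two thresholds would destabilize the single-path minimum while leaving the shared one stable. This is one plausible reading of the re-balancing mechanism, not a reproduction of the authors' analysis.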