Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

A new study shows that discrete Gradient Descent (GD) with a large step size restores symmetry in multi-pathway Deep Linear Networks, counteracting the "winner-takes-all" specialization predicted by Gradient Flow (GF). The researchers prove that single-path solutions are sharp minima, whereas distributing signals across pathways reduces sharpness. As a result, oscillations at the Edge of Stability override the early symmetry breaking and redistribute signals across pathways. The findings explain why large-step GD favors shared representations over persistent single-pathway dominance, and they clarify how depth shapes pathway competition.

1 min read · published Jun 5, 2026

arXiv:2606.05219v1

Abstract: Recent analyses of multi-pathway Deep Linear Networks use Gradient Flow (GF) to predict a "winner-takes-all" specialization in which path symmetry breaks and each feature concentrates in a single pathway. In this work, we show that discrete Gradient Descent (GD) with a large step size tells a different story. We prove that single-path solutions are sharp minima, whereas distributing signals across pathways reduces sharpness by a factor that decreases with both the number of pathways and depth. Consequently, while early training reproduces the depth-driven symmetry breaking predicted by GF, oscillations at the Edge of Stability subsequently override this tendency and drive the network into a re-balancing phase, where signals redistribute across pathways. Together, these results clarify how depth shapes pathway competition and explain why large-step GD favors shared representations rather than persistent single-pathway dominance.
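The sharpness claim can be illustrated with a toy model. The sketch below is not the paper's setting or proof; it assumes scalar weights, a scalar target strength `s > 0`, a squared loss `0.5 * (f(W) - s)**2` with `f(W) = sum_p prod_l W[p, l]`, and balanced weights within each pathway. All function names are hypothetical. At a zero-loss point the Hessian of this loss equals `g g^T`, where `g` is the gradient of `f`, so the sharpness (largest Hessian eigenvalue) is `||g||^2`.

```python
import numpy as np


def output_gradient(W):
    """Gradient of f(W) = sum_p prod_l W[p, l] with respect to each weight."""
    P, L = W.shape
    grad = np.empty_like(W)
    for p in range(P):
        for l in range(L):
            # d/dW[p, l] of a product is the product of the other factors.
            grad[p, l] = np.prod(np.delete(W[p], l))
    return grad


def sharpness_at_minimum(W):
    """Largest Hessian eigenvalue of 0.5 * (f(W) - s)**2 at a zero-loss point.

    With zero residual the Hessian is the rank-one matrix g g^T,
    whose only nonzero eigenvalue is ||g||^2.
    """
    g = output_gradient(W)
    return float(np.sum(g ** 2))


def single_path(s, P, L):
    """Toy 'winner-takes-all' solution: pathway 0 carries all of s."""
    W = np.zeros((P, L))
    W[0, :] = s ** (1.0 / L)
    return W


def shared_paths(s, P, L):
    """Toy symmetric solution: every pathway carries s / P."""
    return np.full((P, L), (s / P) ** (1.0 / L))


def sharpness_ratio(s, P, L):
    """Sharpness of the shared solution divided by the single-path one."""
    return sharpness_at_minimum(shared_paths(s, P, L)) / sharpness_at_minimum(
        single_path(s, P, L)
    )


if __name__ == "__main__":
    for P, L in [(4, 2), (4, 4), (8, 4), (4, 8)]:
        print(f"P={P} L={L} ratio={sharpness_ratio(1.0, P, L):.3f}")
```

In this toy model the ratio works out to `P ** (2 / L - 1)`: it is 1 at depth 2 and shrinks as either `P` or `L` grows beyond that, which agrees qualitatively with the abstract; the paper's actual factor may differ. Since GD on a quadratic with curvature `lam` is stable only for step sizes below `2 / lam`, a step size between the two thresholds would destabilize the single-path point while leaving the shared one stable in this sketch.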

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.05219
