{"slug": "high-probability-pl-sgd-with-markovian-noise-optimal-mixing-and-tail-dependence", "title": "High-Probability PL-SGD with Markovian Noise: Optimal Mixing and Tail Dependence", "summary": "Researchers closed a gap in high-probability bounds for stochastic gradient descent under the Polyak-Łojasiewicz condition with Markovian noise, proving optimal linear dependence on mixing time. They also extended the framework to heavy-tailed gradients with a clipped block method, achieving matching lower bounds. The work tightly characterizes optimal mixing-time and tail-exponent dependencies for PL-SGD.", "body_md": "arXiv:2606.26316v1 Announce Type: new\nAbstract: We study first-order methods for smooth objectives satisfying the Polyak-\\L{}ojasiewicz (PL) condition when gradient samples are generated by an exogenous Markov chain. In the light-tailed setting, prior uniform-in-time high-probability bounds for ordinary Stochastic Gradient Descent (SGD) under a standard growth envelope scale as $\\widetilde{O}(t_{mix}^2/k)$, leaving a gap with the $\\widetilde{O}(t_{mix}/k)$ expectation bounds. We close this gap using a lag-blocking argument to establish a uniform high-probability guarantee with a leading stochastic term of $\\widetilde{O}(t_{mix}/(k+K_0))$ under geometric mixing. We prove this linear dependence on the mixing time is optimal via a matching $\\Omega(\\sigma^2 t_{mix}/k)$ lower bound on a quadratic objective driven by a persistent two-state chain.\nWe then extend this framework to heavy-tailed Markovian gradients satisfying a stationary finite-$p$-moment condition, $p \\in (1,2]$. We design an all-samples clipped block method that uses every Markov transition while mitigating Markovian bias. Under a transition budget $T$, this algorithm achieves a high-probability stochastic error of $\\widetilde{O}(\\sigma_p^2(t_{mix}/T)^{2(p-1)/p})$. We establish a matching lower bound by reducing PL optimization to heavy-tailed mean estimation for a sticky Markov chain. Ultimately, this work tightly characterizes the optimal polynomial dependence on mixing time for light-tailed PL-SGD, and the optimal heavy-tail exponent and effective-sample-size dependence in the robust regime.", "url": "https://wpnews.pro/news/high-probability-pl-sgd-with-markovian-noise-optimal-mixing-and-tail-dependence", "canonical_source": "https://arxiv.org/abs/2606.26316", "published_at": "2026-06-26 04:00:00+00:00", "updated_at": "2026-06-26 04:18:06.872182+00:00", "lang": "en", "topics": ["machine-learning", "ai-research"], "entities": ["arXiv", "Polyak-Łojasiewicz", "SGD", "Markov chain"], "alternates": {"html": "https://wpnews.pro/news/high-probability-pl-sgd-with-markovian-noise-optimal-mixing-and-tail-dependence", "markdown": "https://wpnews.pro/news/high-probability-pl-sgd-with-markovian-noise-optimal-mixing-and-tail-dependence.md", "text": "https://wpnews.pro/news/high-probability-pl-sgd-with-markovian-noise-optimal-mixing-and-tail-dependence.txt", "jsonld": "https://wpnews.pro/news/high-probability-pl-sgd-with-markovian-noise-optimal-mixing-and-tail-dependence.jsonld"}}