{"slug": "pruning-deep-neural-networks-via-the-marchenko-pastur-distribution", "title": "Pruning Deep Neural Networks via the Marchenko--Pastur Distribution", "summary": "Researchers have developed a Marchenko-Pastur (MP) random-matrix approach for pruning deep neural networks that requires minimal post-pruning fine-tuning. The theory provides deterministic data-path certificates: if a removed component has a small propagated logit effect, pruning preserves samples whose dense margin exceeds twice the perturbation. On ImageNet-1k, after only three distillation epochs, ViT-B/16 with 2:4 sparsity and ToMe reached 83.41% top-1 accuracy (1.70 pp below dense) at 59.81% sparse-execution MAC reduction, with a 1.388x best-observed backend speedup on A40; a separate no-ToMe A100 endpoint reached 2.705x.", "body_md": "arXiv:2606.02608v1 Announce Type: new\nAbstract: We study a Marchenko--Pastur (MP) random-matrix approach to pruning deep neural networks with very small post-pruning fine-tuning budgets. The main practical contribution is accuracy retention under short calibration and fine-tuning schedules, rather than a long post-pruning reoptimization pipeline. The theory gives deterministic data-path certificates: if the removed component $R$ has small propagated logit effect $L_s \\| R \\psi_1(s) \\|_\\infty$, pruning decreases an elastic-net objective and preserves samples whose dense margin exceeds twice the perturbation. The zero-budget case gives perfect pruning; a prune--restore extension models weight restoration inside a fixed sparse-execution pattern; and an additive $L_2$-regularized model shows admissible random-like components vanish at the training limit, with persistent spikes stabilizing as the MP bulk collapses. 
Under iid-Gaussian sufficient conditions, the fitted MP edge $\\sigma_+$ gives a high-probability layerwise budget signal.\nOn ImageNet-1k, after only three distillation epochs, ViT-B/16 $2{:}4{+}$ToMe reaches $83.41\\%$ top-1 ($-1.70$ pp from dense) at $59.81\\%$ sparse-execution MAC reduction, with $1.388\\times$ best-observed A40 native-$2{:}4$ backend speedup for the same checkpoint and ToMe graph; a separate no-ToMe A100 endpoint gives $2.705\\times$. At structured sparsity, ViT-B/16 $6{:}12$ reaches $83.74\\%$, ViT-L/16 $8{:}16$ dense+permutation reaches $85.33\\%$ ($-0.51$ pp), and ConvNeXtV2-Base $12{:}16$ reaches $86.35\\%$ ($-0.37$ pp). For CNNs, ResNet50 $8{:}16$ dense+permutation reaches $75.87\\%$ ($-0.26$ pp), and ResNet152d CAST-conv+permutation reaches $81.33\\%$ ($-1.53$ pp) at ${\\sim}50\\%$ MAC accounting with a $1.62\\times$ A40 im2col$+2{:}4$ sparse-GEMM audit.", "url": "https://wpnews.pro/news/pruning-deep-neural-networks-via-the-marchenko-pastur-distribution", "canonical_source": "https://arxiv.org/abs/2606.02608", "published_at": "2026-06-03 04:00:00+00:00", "updated_at": "2026-06-03 04:03:31.843380+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "computer-vision", "ai-research"], "entities": ["ViT-B/16", "ViT-L/16", "ImageNet-1k", "A40", "A100", "ToMe"], "alternates": {"html": "https://wpnews.pro/news/pruning-deep-neural-networks-via-the-marchenko-pastur-distribution", "markdown": "https://wpnews.pro/news/pruning-deep-neural-networks-via-the-marchenko-pastur-distribution.md", "text": "https://wpnews.pro/news/pruning-deep-neural-networks-via-the-marchenko-pastur-distribution.txt", "jsonld": "https://wpnews.pro/news/pruning-deep-neural-networks-via-the-marchenko-pastur-distribution.jsonld"}}