{"slug": "spectral-dpps-via-nepv-a-scalable-continuous-relaxation-of-determinantal-map-for", "title": "Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection", "summary": "Researchers introduced a continuous relaxation of the determinantal point process (DPP) maximum a posteriori (MAP) problem for diversity-aware data selection. They recast it as an optimization problem on the Stiefel manifold whose first-order optimality conditions form a nonlinear eigenvalue problem with eigenvector dependency (NEPv). The resulting algorithm, Spectral DPPs via NEPv (SDvN), needs only kernel matrix-vector products and runs in near-linear time in the ground-set size. It targets subset selection from pools of millions to billions of candidates in applications such as data curation, active learning, and retrieval diversification.", "body_md": "arXiv:2606.19411v1\nAbstract: Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning: data curation and coreset selection for training and fine-tuning large models, active-learning batch acquisition, prompt and exemplar selection for in-context learning, retrieval diversification, and experimental design. Determinantal Point Processes (DPPs) give a principled, well-calibrated notion of diversity for this task. However, their *MAP* objective, which picks a size-$k$ subset $S$ maximizing $\\log\\det(L_S)$ (where $L_S$ is the principal submatrix of the kernel $L$ indexed by $S$), is NP-hard, and the standard greedy and sampling algorithms scale superlinearly in the ground-set size $n$. This cost is prohibitive precisely in the data-centric regime where diversity matters most, in which $n$ ranges from millions to billions of candidate examples, features, or embeddings. We recast DPP-MAP as a continuous optimization problem over the Stiefel manifold and show that its first-order optimality conditions form a *Nonlinear Eigenvalue Problem with eigenvector dependency* (NEPv) of a previously unstudied form. This NEPv admits a self-consistent field (SCF) iteration with a local contraction guarantee based on a spectral gap. The result is a principled iterative solver in which the diversity objective drives an eigenvector-dependent operator. The resulting algorithm, Spectral DPPs via NEPv (SDvN), requires only matrix-vector products with the kernel and runs in $O\\!\\big((ndk+nk^2)\\,t\\big)$ time, where $t$ is a small number of iterations. Its cost therefore scales near-linearly in $n$, and it integrates directly with the low-rank and feature-map kernels common in ML. This paper focuses on the relaxation, solver, and scaling analysis; full real-data benchmarking is left to a planned empirical study.", "url": "https://wpnews.pro/news/spectral-dpps-via-nepv-a-scalable-continuous-relaxation-of-determinantal-map-for", "canonical_source": "https://arxiv.org/abs/2606.19411", "published_at": "2026-06-19 04:00:00+00:00", "updated_at": "2026-06-19 04:09:34.261695+00:00", "lang": "en", "topics": ["machine-learning", "ai-research", "ai-infrastructure"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/spectral-dpps-via-nepv-a-scalable-continuous-relaxation-of-determinantal-map-for", "markdown": "https://wpnews.pro/news/spectral-dpps-via-nepv-a-scalable-continuous-relaxation-of-determinantal-map-for.md", "text": "https://wpnews.pro/news/spectral-dpps-via-nepv-a-scalable-continuous-relaxation-of-determinantal-map-for.txt", "jsonld": "https://wpnews.pro/news/spectral-dpps-via-nepv-a-scalable-continuous-relaxation-of-determinantal-map-for.jsonld"}}