arXiv:2606.25087v1 Announce Type: new Abstract: Neural network quantization aims to find a discrete representation of parameters that preserves the performance of a full-precision (FP) model as faithfully as possible. Enforcing discrete constraints perturbs parameters away from a well-optimized minimum, generally resulting in performance degradation. Recent studies indicate that low-loss FP solutions are not isolated, but instead belong to connected low-loss subspaces of the loss landscape, where the loss maintains nearly the same minimum value. Models sampled from these subspaces are diverse and retain high accuracy. This raises the question: can a quantized model be constructed to lie within a low-loss subspace of the FP model, thereby automatically preserving performance? We address this question by learning quantization-aware linear paths in weight space optimized to minimize loss. We demonstrate that the midpoint of the resulting subspace is, by design, quantization-friendly and that its direct quantization yields performance comparable to that of quantization-aware training. The proposed procedure offers a novel perspective on weight quantization and, in contrast to conventional methods, neither relies on the straight-through estimator nor involves explicit discretization during training.
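The two ingredients named in the abstract, a linear path in weight space and direct quantization of its midpoint, can be illustrated with a toy sketch. This is not the paper's procedure: the abstract does not specify the quantizer, the loss, or how the path endpoints are learned. The sketch assumes a symmetric round-to-nearest uniform quantizer, a stand-in quadratic loss, and hand-picked endpoints; the names `quantize_uniform`, `path_point`, and `quadratic_loss` are hypothetical.

```python
from typing import List


def quantize_uniform(weights: List[float], num_bits: int = 4) -> List[float]:
    """Symmetric round-to-nearest quantization onto a uniform grid.

    Assumed quantizer for illustration only; the abstract does not say
    which quantization scheme the authors use.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    levels = 2 ** (num_bits - 1) - 1   # number of positive grid levels
    scale = max_abs / levels           # grid step size
    return [round(w / scale) * scale for w in weights]


def path_point(theta_a: List[float], theta_b: List[float], t: float) -> List[float]:
    """Point on the linear path (1 - t) * theta_a + t * theta_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(theta_a, theta_b)]


def quadratic_loss(weights: List[float], target: List[float]) -> float:
    """Stand-in loss: squared distance to a reference weight vector."""
    return sum((w - x) ** 2 for w, x in zip(weights, target))


# Hand-picked endpoints (hypothetical); in the paper they are learned.
theta_a = [0.9, -0.4, 0.1]
theta_b = [1.1, -0.6, 0.3]

midpoint = path_point(theta_a, theta_b, 0.5)
quantized_midpoint = quantize_uniform(midpoint, num_bits=4)

# Loss increase caused purely by snapping the midpoint to the grid.
quantization_gap = quadratic_loss(quantized_midpoint, midpoint)
```

In this toy setting `quantization_gap` measures how far discretization moves the midpoint from its full-precision position. The abstract's claim is that the learned path makes this perturbation harmless by design, which the sketch does not attempt to reproduce.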
Graph-Based Phonetic Error Correction of Noisy ASR