{"slug": "liftquant-continuous-bit-width-llm-via-dimensional-lifting-and-projection", "title": "LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection", "summary": "Researchers have developed LiftQuant, a framework enabling continuous bit-width control for large language models by using a \"lift-then-project\" mechanism that approximates low-dimensional weight vectors from a higher-dimensional space. The method allows a 70-billion-parameter LLM to be compressed to 2.4 bits to precisely fit a 24GB GPU, outperforming existing 2-bit models on the same hardware. This breakthrough eliminates the rigid integer-based bit-width constraints that previously prevented optimal model deployment under specific memory budgets.", "body_md": "arXiv:2606.04050v1 Announce Type: new\nAbstract: Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2-bit or 3-bit), resulting in a \"deployment gap\" where Large Language Models cannot be optimally fitted to specific memory budgets. To bridge this gap, we introduce LiftQuant, a novel framework that enables continuous bit-width control for true Pareto-optimal deployment. The core innovation is a \"lift-then-project\" mechanism which approximates low-dimensional weight vectors by projecting a simple 1-bit lattice from a higher-dimensional \"lifted\" space. Crucially, the effective bit-width is determined simply by the ratio of the lifted dimension to the original dimension, which allows the bit-width to be tuned quasi-continuously, as the dimension is a flexible structural parameter. This projection generates a structured yet non-uniform codebook, capturing the expressive power of Vector Quantization (VQ). While beneficial over VQ, LiftQuant's decoding path relies solely on linear transformations and 1-bit uniform quantizers, retaining its hardware-friendly nature. 
This flexibility is transformative: LiftQuant enables a 70B LLM to be compressed to 2.4 bits to precisely fit a 24GB GPU, where its performance significantly surpasses state-of-the-art 2-bit models fitted on the same device. Our code and checkpoints are available at https://github.com/Heliulu/LiftQuant.", "url": "https://wpnews.pro/news/liftquant-continuous-bit-width-llm-via-dimensional-lifting-and-projection", "canonical_source": "https://arxiv.org/abs/2606.04050", "published_at": "2026-06-04 04:00:00+00:00", "updated_at": "2026-06-04 04:36:54.819839+00:00", "lang": "en", "topics": ["large-language-models", "machine-learning", "artificial-intelligence", "neural-networks", "ai-research"], "entities": ["LiftQuant", "arXiv", "Heliulu"], "alternates": {"html": "https://wpnews.pro/news/liftquant-continuous-bit-width-llm-via-dimensional-lifting-and-projection", "markdown": "https://wpnews.pro/news/liftquant-continuous-bit-width-llm-via-dimensional-lifting-and-projection.md", "text": "https://wpnews.pro/news/liftquant-continuous-bit-width-llm-via-dimensional-lifting-and-projection.txt", "jsonld": "https://wpnews.pro/news/liftquant-continuous-bit-width-llm-via-dimensional-lifting-and-projection.jsonld"}}