LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection

wpnews.pro




Researchers have developed LiftQuant, a quantization framework that gives large language models continuous control over bit-width. It uses a "lift-then-project" mechanism: low-dimensional weight vectors are approximated by projecting a simple 1-bit lattice from a higher-dimensional "lifted" space, and the effective bit-width is set by the ratio of the two dimensions. According to the authors, this lets a 70-billion-parameter LLM be compressed to 2.4 bits so that it fits a 24GB GPU, where it significantly outperforms state-of-the-art 2-bit models on the same hardware. The approach removes the restriction to integer bit-widths (e.g., 2 or 3 bits) that otherwise leaves a gap between a model's size and a given memory budget.
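The 24GB figure can be sanity-checked with simple arithmetic. This is a minimal sketch, not the paper's own budget calculation: it assumes every one of the 70B parameters is stored at the same average bit-width and ignores everything else that occupies GPU memory (KV cache, activations, any layers left unquantized, and whatever side information the quantizer stores), so the numbers are lower bounds. The helper names `weight_gb` and `max_bits` are hypothetical.

```python
PARAMS = 70e9  # 70 billion parameters
GPU_GB = 24.0  # 24 GB card, decimal gigabytes


def weight_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Weight storage in decimal GB; assumes a uniform average bit-width and
    ignores all non-weight memory (KV cache, activations, metadata)."""
    return params * bits_per_weight / 8 / 1e9


def max_bits(budget_gb: float, params: float = PARAMS) -> float:
    """Largest average bit-width whose weights alone fit in budget_gb."""
    return budget_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for bits in (2.0, 2.4, 3.0):
        gb = weight_gb(bits)
        verdict = "fits" if gb <= GPU_GB else "does not fit"
        print(f"{bits:.1f} bits -> {gb:.2f} GB ({verdict} in {GPU_GB:.0f} GB)")
    print(f"upper bound for weights alone: {max_bits(GPU_GB):.2f} bits")
```

Under these assumptions, 3 bits overshoots the card while 2 bits leaves several gigabytes unused; a fractional width such as 2.4 bits lands in between and leaves some headroom for the memory the sketch ignores. That in-between region is what integer-only bit-widths cannot reach.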

1 min read · Published Jun 4, 2026

arXiv:2606.04050v1. Abstract: Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2- or 3-bit), resulting in a "deployment gap" where Large Language Models cannot be optimally fitted to specific memory budgets. To bridge this gap, we introduce LiftQuant, a novel framework that enables continuous bit-width control for true Pareto-optimal deployment. The core innovation is a "lift-then-project" mechanism, which approximates low-dimensional weight vectors by projecting a simple 1-bit lattice from a higher-dimensional "lifted" space. Crucially, the effective bit-width is determined simply by the ratio of the lifted dimension to the original dimension; because the lifted dimension is a flexible structural parameter, the bit-width can be tuned quasi-continuously. This projection generates a structured yet non-uniform codebook, capturing the expressive power of Vector Quantization (VQ). An advantage over VQ is that LiftQuant's decoding path relies solely on linear transformations and 1-bit uniform quantizers, so it remains hardware-friendly. This flexibility is transformative: LiftQuant enables a 70B LLM to be compressed to 2.4 bits to precisely fit a 24GB GPU, where its performance significantly surpasses state-of-the-art 2-bit models fitted on the same device. Our code and checkpoints are available at https://github.com/Heliulu/LiftQuant.
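The abstract names the ingredients without implementation detail: a 1-bit lattice {-1, +1}^D in a lifted dimension D, a linear projection down to the original dimension d, and an effective bit-width of D/d. The sketch below is a toy model built only from those statements, not the LiftQuant implementation. It assumes a single (here random) projection matrix `P` for a group of d weights, no scale factors, and brute-force search over all 2^D codes for encoding. The helper names, and the `encode_lift_sign` heuristic (lift with the pseudo-inverse of `P`, then take signs), are hypothetical and not taken from the authors' code.

```python
import itertools

import numpy as np


def effective_bits(lifted_dim: int, orig_dim: int) -> float:
    """Bits per original weight: one bit per lifted coordinate, shared by orig_dim weights."""
    return lifted_dim / orig_dim


def all_codes(lifted_dim: int) -> np.ndarray:
    """Every point of the 1-bit lattice {-1, +1}^D, shape (2**D, D)."""
    return np.array(list(itertools.product((-1.0, 1.0), repeat=lifted_dim)))


def decode(b: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Decoding is one linear map applied to a +/-1 vector: w_hat = P @ b."""
    return proj @ b


def encode_exhaustive(w: np.ndarray, proj: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Nearest codeword by brute force; only feasible for small lifted_dim."""
    codebook = codes @ proj.T  # (2**D, d): the projected, non-uniform codebook
    idx = int(np.argmin(((codebook - w) ** 2).sum(axis=1)))
    return codes[idx]


def encode_lift_sign(w: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical cheap encoder: lift w with pinv(P), then 1-bit quantize."""
    lifted = np.linalg.pinv(proj) @ w
    return np.where(lifted >= 0, 1.0, -1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, D = 5, 12  # illustrative sizes only: 12 / 5 = 2.4 bits per weight
    # Hypothetical random projection; a real method would optimize P.
    proj = rng.normal(scale=1 / np.sqrt(D), size=(d, D))
    codes = all_codes(D)  # 4096 codewords
    w = rng.normal(size=d)

    b = encode_exhaustive(w, proj, codes)
    print("bits per weight:", effective_bits(D, d))
    print("exhaustive error:", np.linalg.norm(w - decode(b, proj)))
    print("lift+sign error: ", np.linalg.norm(w - decode(encode_lift_sign(w, proj), proj)))
```

Because D and d are free integers, ratios such as 12/5 = 2.4 or 11/4 = 2.75 bits per weight are available, which is the quasi-continuous control the abstract describes. The 2^D projected points P b form a structured but non-uniform codebook in R^d, while decoding needs only a matrix-vector product; the paper's actual group sizes, projection, and encoder are not stated in the abstract.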

source & further reading


Read original on arxiv.org → arxiv.org/abs/2606.04050
