
InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

Researchers have developed InfoQuant, a training-free method that reshapes activation distributions in large language models so they are easier to quantize at low bit widths. The approach combines Peak Suppression Orthogonal Transformation (PSOT) with adaptive outlier-token selection. Under W4A4KV4 quantization (4-bit weights, activations, and KV cache), it preserves 97% of floating-point accuracy on average and reduces the LLaMA-2 13B performance gap by 42% relative to the previous state of the art. The work targets a key bottleneck in deploying LLMs with reduced memory and compute requirements.

1 min read · published May 27, 2026

arXiv:2605.26175v1 · Announce Type: new

Abstract: Low-bit activation quantization remains a major bottleneck in efficient large language model (LLM) deployment. The difficulty is not only that activations contain outliers, but that their distributions are often poorly matched to a low-bit uniform quantizer. Existing post-training quantization (PTQ) methods suppress peaks, balance channels, or minimize reconstruction error, yet they rarely specify what activation distribution is actually easy to discretize. As a result, activations may appear numerically smoother while still incurring large quantization error because the quantization range remains wide or most values collapse into a few levels near the mean. We recast activation transformation as quantizer-facing distribution design and analyze quantization error from an information-theoretic perspective. Our analysis shows that quantization-friendly activations should jointly have a smaller numerical range and sufficient dispersion within that range. Guided by this analysis, we propose InfoQuant, a training-free method that employs Peak Suppression Orthogonal Transformation (PSOT) to shape activations into more quantization-friendly distributions. We further introduce adaptive outlier-token selection to improve the robustness of PSOT during optimization. Across multiple LLM families, InfoQuant consistently outperforms prior PTQ and end-to-end training baselines. Under W4A4KV4, it preserves 97% of floating-point accuracy on average and reduces the LLaMA-2 13B performance gap by 42% over the previous state of the art. Code is available at https://github.com/LLIKKE/InfoQuant
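To make the abstract's point about numerical range concrete, here is a minimal toy sketch in Python. It is not the authors' PSOT, which is optimized per model. It swaps in a fixed, normalized Hadamard rotation as a stand-in orthogonal transform, and the helper names (`hadamard`, `fake_quant`, `mse`) and the four-element activation vector are illustrative assumptions. The sketch quantizes the vector to 4 bits directly, then quantizes it in the rotated basis and rotates back, and compares the range and reconstruction error of the two paths.

```python
import math

def hadamard(n):
    """Normalized Sylvester-Hadamard matrix (n must be a power of two).
    It is orthogonal and symmetric, so it is its own inverse."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return [[v * s for v in row] for row in h]

def matvec(m, x):
    """Plain matrix-vector product on Python lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def fake_quant(x, bits=4):
    """Symmetric per-tensor uniform quantize-dequantize.
    Assumes x contains at least one non-zero value."""
    qmax = 2 ** (bits - 1) - 1                # 7 for 4 bits
    scale = max(abs(v) for v in x) / qmax     # step size set by the range
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in x]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Hypothetical activation vector: one outlier channel dominates the range.
x = [28.0, 3.0, -3.0, 2.0]
H = hadamard(4)

y = matvec(H, x)                        # rotated activations
x_hat_direct = fake_quant(x)            # quantize in the original basis
x_hat_rot = matvec(H, fake_quant(y))    # quantize rotated, then rotate back

range_direct = max(abs(v) for v in x)
range_rot = max(abs(v) for v in y)
mse_direct = mse(x, x_hat_direct)
mse_rot = mse(x, x_hat_rot)

print(f"range: {range_direct} -> {range_rot}")
print(f"4-bit MSE: {mse_direct:.3f} -> {mse_rot:.3f}")
```

On this input the rotation shrinks the range and lowers the 4-bit reconstruction error, because the quantizer step is proportional to the largest magnitude. That outcome is specific to the toy vector: a fixed rotation is not guaranteed to help on every input, which is part of the motivation the abstract gives for designing the transformed distribution explicitly.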
