InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization

wpnews.pro

[ARTICLE · art-14885] src=arxiv.org ↗ pub=2026-05-27T04:00Z topic=machine-learning verified=true sentiment=↑ positive

Researchers have developed InfoQuant, a training-free method that reshapes activation distributions in large language models so they are easier to quantize at low bit widths. The approach combines Peak Suppression Orthogonal Transformation (PSOT) with adaptive outlier-token selection. Under W4A4KV4 quantization (4-bit weights, activations, and KV cache), it preserves 97% of floating-point accuracy on average and reduces the LLaMA-2 13B performance gap by 42% relative to the previous state of the art. The work targets low-bit activation quantization, which the authors describe as a major bottleneck in efficient LLM deployment.

Published May 27, 2026 · 1 min read

arXiv:2605.26175v1. Abstract: Low-bit activation quantization remains a major bottleneck in efficient large language model (LLM) deployment. The difficulty is not only that activations contain outliers, but that their distributions are often poorly matched to a low-bit uniform quantizer. Existing post-training quantization (PTQ) methods suppress peaks, balance channels, or minimize reconstruction error, yet they rarely specify what activation distribution is actually easy to discretize. As a result, activations may appear numerically smoother while still incurring large quantization error, because the quantization range remains wide or most values collapse into a few levels near the mean. We recast activation transformation as quantizer-facing distribution design and analyze quantization error from an information-theoretic perspective. Our analysis shows that quantization-friendly activations should jointly have a smaller numerical range and sufficient dispersion within that range. Guided by this analysis, we propose InfoQuant, a training-free method that employs Peak Suppression Orthogonal Transformation (PSOT) to shape activations into more quantization-friendly distributions. We further introduce adaptive outlier-token selection to improve the robustness of PSOT during optimization. Across multiple LLM families, InfoQuant consistently outperforms prior PTQ and end-to-end training baselines. Under W4A4KV4, it preserves 97% of floating-point accuracy on average and reduces the LLaMA-2 13B performance gap by 42% over the previous state of the art. Code is available at https://github.com/LLIKKE/InfoQuant
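The abstract's two criteria, a smaller numerical range and sufficient dispersion within it, can be illustrated with a toy experiment. The sketch below is not PSOT and is not taken from the InfoQuant code; it substitutes a plain normalized Hadamard rotation (a generic orthogonal transform) and a symmetric per-tensor 4-bit uniform quantizer, both chosen here only for illustration. The synthetic vector, the single injected outlier, and all function names are assumptions.

```python
import math
import random

def quantize_uniform(xs, bits=4):
    """Symmetric per-tensor uniform quantizer (illustrative, not InfoQuant's)."""
    qmax = 2 ** (bits - 1) - 1                       # 7 levels each side for 4 bits
    scale = (max(abs(v) for v in xs) / qmax) or 1.0  # step size is set by the range
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in xs]
    return [c * scale for c in codes], codes

def level_entropy(codes):
    """Shannon entropy (in bits) of how the integer levels are occupied."""
    n = len(codes)
    counts = {}
    for c in codes:
        counts[c] = counts.get(c, 0) + 1
    return -sum(k / n * math.log2(k / n) for k in counts.values())

def hadamard_rotate(xs):
    """Normalized fast Walsh-Hadamard transform; length must be a power of two.
    The normalized transform is orthogonal and is its own inverse."""
    ys = list(xs)
    n = len(ys)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = ys[j], ys[j + h]
                ys[j], ys[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [y / norm for y in ys]

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Synthetic "activation" vector: Gaussian values plus one large outlier (assumption).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(256)]
x[3] = 60.0

# Quantize directly: the outlier stretches the range, so most values share one level.
xq, codes = quantize_uniform(x)

# Rotate, quantize, rotate back: the outlier's energy is spread across all entries.
xr = hadamard_rotate(x)
xrq, codes_r = quantize_uniform(xr)
x_back = hadamard_rotate(xrq)

err_plain = mse(x, xq)
err_rot = mse(x, x_back)

print("range       :", max(abs(v) for v in x), "->", max(abs(v) for v in xr))
print("level entropy:", level_entropy(codes), "->", level_entropy(codes_r))
print("4-bit MSE    :", err_plain, "->", err_rot)
```

In this toy setting the rotation both shrinks the range and spreads values over more quantization levels, which is the combination the abstract argues for. How PSOT chooses its orthogonal transformation, and how outlier tokens are selected, is described in the paper and is not reproduced here.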

source & further reading

Read original on arxiv.org → arxiv.org/abs/2605.26175
