{"slug": "infoquant-shaping-activation-distributions-for-low-bit-llm-quantization", "title": "InfoQuant: Shaping Activation Distributions for Low-Bit LLM Quantization", "summary": "Researchers have developed InfoQuant, a training-free method that reshapes activation distributions in large language models to improve low-bit quantization efficiency. The approach, which uses Peak Suppression Orthogonal Transformation and adaptive outlier-token selection, preserves 97% of floating-point accuracy on average under W4A4KV4 quantization and reduces the LLaMA-2 13B performance gap by 42% over prior state-of-the-art methods. This advance addresses a key bottleneck in deploying LLMs with reduced memory and computational requirements.", "body_md": "arXiv:2605.26175v1\nAbstract: Low-bit activation quantization remains a major bottleneck in efficient large language model (LLM) deployment. The difficulty is not only that activations contain outliers, but that their distributions are often poorly matched to a low-bit uniform quantizer. Existing post-training quantization (PTQ) methods suppress peaks, balance channels, or minimize reconstruction error, yet they rarely specify what activation distribution is actually easy to discretize. As a result, activations may appear numerically smoother while still incurring large quantization error because the quantization range remains wide or most values collapse into a few levels near the mean. We recast activation transformation as quantizer-facing distribution design and analyze quantization error from an information-theoretic perspective. Our analysis shows that quantization-friendly activations should jointly have a smaller numerical range and sufficient dispersion within that range. Guided by this analysis, we propose InfoQuant, a training-free method that employs Peak Suppression Orthogonal Transformation (PSOT) to shape activations into more quantization-friendly distributions. 
We further introduce adaptive outlier-token selection to improve the robustness of PSOT during optimization. Across multiple LLM families, InfoQuant consistently outperforms prior PTQ and end-to-end training baselines. Under W4A4KV4, it preserves 97% of floating-point accuracy on average and reduces the LLaMA-2 13B performance gap by 42% over the previous state of the art. Code is available at [https://github.com/LLIKKE/InfoQuant](https://github.com/LLIKKE/InfoQuant)", "url": "https://wpnews.pro/news/infoquant-shaping-activation-distributions-for-low-bit-llm-quantization", "canonical_source": "https://arxiv.org/abs/2605.26175", "published_at": "2026-05-27 04:00:00+00:00", "updated_at": "2026-05-27 04:29:03.520345+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "neural-networks", "artificial-intelligence", "ai-research"], "entities": ["InfoQuant", "Peak Suppression Orthogonal Transformation", "PSOT", "LLM"], "alternates": {"html": "https://wpnews.pro/news/infoquant-shaping-activation-distributions-for-low-bit-llm-quantization", "markdown": "https://wpnews.pro/news/infoquant-shaping-activation-distributions-for-low-bit-llm-quantization.md", "text": "https://wpnews.pro/news/infoquant-shaping-activation-distributions-for-low-bit-llm-quantization.txt", "jsonld": "https://wpnews.pro/news/infoquant-shaping-activation-distributions-for-low-bit-llm-quantization.jsonld"}}