{"slug": "max-window-scale-estimation-for-near-lossless-hif8-w8a8-quantization-aware", "title": "Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training", "summary": "Researchers identified two failure modes in HiF8 W8A8 quantization-aware training of the OpenPangu-Embedded-1B model: amax saturation, which corrupts knowledge-sensitive representations through forward-pass clipping, and catastrophic forgetting caused by an aggressive learning rate. They mitigated these issues with a max-algorithm DTS strategy over a 64-step history window and a 500-step BF16 warmup followed by QAT at a learning rate of 10⁻⁵. The final configuration showed an accuracy drop below 0.6% on each of MMLU, HellaSwag, and ARC-Challenge relative to a matched BF16 baseline.", "body_md": "arXiv:2605.26189v1 Announce Type: new\nAbstract: Quantization-aware training (QAT) with low-bit floating-point formats enables efficient LLM deployment, yet introduces subtle failure modes invisible to standard training metrics. We present a systematic study of HiF8 W8A8 QAT for OpenPangu-Embedded-1B through the lens of Delayed Tensor Scaling (DTS). Across eight controlled experiments, we identify and disentangle two orthogonal failure modes: (i) amax saturation, where delayed scale estimates silently corrupt knowledge-sensitive representations via forward-pass clipping, and (ii) catastrophic forgetting, where an aggressive learning rate overwrites pretrained commonsense knowledge independently of quantization. Neither is detectable from training loss alone. We address amax saturation with a conservative max-algorithm DTS strategy over a 64-step history window, and mitigate forgetting via a 500-step BF16 warmup followed by QAT at lr=10^{-5}. 
Both fixes are necessary and sufficient: our final configuration achieves 0.43% MMLU drop, 0.58% HellaSwag drop, and 0.22% ARC-Challenge drop versus a matched BF16 baseline, with a training loss APE of only 0.11% over 10,000 steps.", "url": "https://wpnews.pro/news/max-window-scale-estimation-for-near-lossless-hif8-w8a8-quantization-aware", "canonical_source": "https://arxiv.org/abs/2605.26189", "published_at": "2026-05-27 04:00:00+00:00", "updated_at": "2026-05-27 04:29:16.183817+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "neural-networks", "ai-research"], "entities": ["HiF8", "OpenPangu-Embedded-1B", "Delayed Tensor Scaling", "MMLU", "HellaSwag", "ARC-Challenge", "BF16", "QAT"], "alternates": {"html": "https://wpnews.pro/news/max-window-scale-estimation-for-near-lossless-hif8-w8a8-quantization-aware", "markdown": "https://wpnews.pro/news/max-window-scale-estimation-for-near-lossless-hif8-w8a8-quantization-aware.md", "text": "https://wpnews.pro/news/max-window-scale-estimation-for-near-lossless-hif8-w8a8-quantization-aware.txt", "jsonld": "https://wpnews.pro/news/max-window-scale-estimation-for-near-lossless-hif8-w8a8-quantization-aware.jsonld"}}