{"slug": "actquant-sub-4-bit-action-guided-quantization-for-vision-language-action-models", "title": "ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models", "summary": "Researchers introduced ActQuant, a sub-4-bit post-training quantization framework for Vision-Language-Action (VLA) models that reduces their memory and compute demands for edge deployment. On the LIBERO benchmark, the method retained 95.0% performance on OpenVLA-OFT at 3 bits-per-weight. At 2.5 bits-per-weight it retained 90.1% and compressed the OpenVLA-OFT backbone from 14.3 GB to 2.7 GB (5.3x). On a physical UR3 robotic arm, π0.5 quantized with ActQuant maintained the baseline success rate while reducing the memory footprint by 2.5x.", "body_md": "arXiv:2605.24011v1 Announce Type: new\nAbstract: Vision-Language-Action (VLA) models exhibit remarkable action generation for embodied intelligence, but their heavy compute requirements make deployment on edge platforms impractical. Aggressive, sub-4-bit weight quantization is the natural solution, yet existing post-training quantization (PTQ) methods suffer severe performance degradation in this regime. To address this, we introduce ActQuant, an action-guided mixed-precision PTQ framework that operates in two stages: (1) an inter-tensor bit allocator that assigns each weight matrix a single bit-width based on how much it contributes to predicting the agent's actions; (2) an intra-tensor scale optimizer that tunes per-block quantization scales using action-aware curvature, so that dynamic range is concentrated on the weights most influential for control. To deliver the on-device benefits of our aggressive quantization, we further introduce OmniModel.cpp, an agentic conversion pipeline that ports model architectures into a native C/C++ runtime with efficient low-bit kernels. We evaluate ActQuant both in simulation and on a real-world 6-DoF UR3 arm, with all models deployed through OmniModel.cpp.\nOn the LIBERO benchmark, ActQuant is the only method that operates at or below 3 bits-per-weight, retaining 95.0% on OpenVLA-OFT and 94.8% on $\\pi_{0.5}$. Pushed further, ActQuant reaches 2.5 bits-per-weight (bpw) at 90.1% on OpenVLA-OFT, compressing the backbone from 14.3 GB to 2.7 GB (5.3$\\times$). On the physical UR3 arm, $\\pi_{0.5}$ quantized with ActQuant retains the baseline's success rate while reducing the memory footprint by 2.5$\\times$.", "url": "https://wpnews.pro/news/actquant-sub-4-bit-action-guided-quantization-for-vision-language-action-models", "canonical_source": "https://arxiv.org/abs/2605.24011", "published_at": "2026-05-26 04:00:00+00:00", "updated_at": "2026-05-26 04:11:33.856066+00:00", "lang": "en", "topics": ["machine-learning", "robotics", "artificial-intelligence", "neural-networks", "ai-infrastructure"], "entities": ["ActQuant", "OmniModel.cpp", "OpenVLA-OFT", "π0.5", "LIBERO", "UR3"], "alternates": {"html": "https://wpnews.pro/news/actquant-sub-4-bit-action-guided-quantization-for-vision-language-action-models", "markdown": "https://wpnews.pro/news/actquant-sub-4-bit-action-guided-quantization-for-vision-language-action-models.md", "text": "https://wpnews.pro/news/actquant-sub-4-bit-action-guided-quantization-for-vision-language-action-models.txt", "jsonld": "https://wpnews.pro/news/actquant-sub-4-bit-action-guided-quantization-for-vision-language-action-models.jsonld"}}