16:00
2026-06-26
developer.nvidia.com
large-language-models
Creating the NVIDIA Nemotron 3 Ultra NVFP4 Checkpoint with NVIDIA Model Optimizer
NVIDIA released the Nemotron 3 Ultra NVFP4 checkpoint, a quantized model that achieves up to 5.9x higher inference throughput than GLM-5.1 754B FP4 on decode-heavy workloads while matching BF16 accura…