For practitioners: rack-scale co-design moves the central performance trade-off away from per-GPU density and toward sustained, low-latency cross-GPU fabrics and rack-level services, which matters for deploying always-on agentic AI and very long-context models. CoreWeave wrote on June 17, 2026 that it was the first cloud provider to bring up and validate NVIDIA's NVL72 Vera Rubin rack. SiliconANGLE quoted CoreWeave EVP Chen Goldberg describing the platform as "not an incremental upgrade," citing 72 Rubin GPUs, 36 Vera CPUs, and 260 TB/s of NVLink 6 fabric inside a single rack. NVIDIA's technical blog and press release (May 31, 2026) describe the broader Vera Rubin POD and MGX rack architecture, reporting POD-scale numbers such as 1,152 Rubin GPUs, 60 exaflops, and 10 PB/s of bandwidth across five rack-scale systems.
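To put the quoted figures on a per-GPU footing, the short Python sketch below divides the rack and POD totals by the GPU counts. It is back-of-the-envelope arithmetic on the numbers reported above, not a vendor specification. The function name `per_gpu_summary` is illustrative. The rack count assumes every POD GPU sits in a 72-GPU NVL72 rack, and the exaflops figure is used as quoted, without a stated numeric precision.

```python
# Figures quoted in the text; everything derived from them is an estimate.
GPUS_PER_RACK = 72        # Rubin GPUs per NVL72 rack
CPUS_PER_RACK = 36        # Vera CPUs per NVL72 rack
RACK_NVLINK_TBPS = 260    # NVLink 6 fabric bandwidth per rack, in TB/s
POD_GPUS = 1152           # Rubin GPUs per Vera Rubin POD
POD_EXAFLOPS = 60         # POD-scale compute as quoted (precision not stated)


def per_gpu_summary():
    """Return simple ratios derived from the quoted rack and POD totals."""
    return {
        "gpus_per_cpu": GPUS_PER_RACK / CPUS_PER_RACK,
        "nvlink_tbps_per_gpu": RACK_NVLINK_TBPS / GPUS_PER_RACK,
        # Assumption: all POD GPUs are housed in 72-GPU NVL72 racks.
        "nvl72_racks_in_pod": POD_GPUS / GPUS_PER_RACK,
        # 1 exaflop = 1,000 petaflops.
        "petaflops_per_gpu": POD_EXAFLOPS * 1000 / POD_GPUS,
    }


if __name__ == "__main__":
    for name, value in per_gpu_summary().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the ratios come out to two GPUs per CPU, roughly 3.6 TB/s of NVLink bandwidth per GPU, 16 NVL72 racks' worth of GPUs per POD, and about 52 petaflops per GPU.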