Qumulo and Nvidia support

Qumulo is declining to support Nvidia's STX KV caching scheme for now, arguing that the hardware-locked optimization only benefits on-premises clusters and specialized neo-clouds, and ignores the roughly 70 percent of AI workloads running in public clouds where STX is unavailable. The company's Cloud AI Accelerator instead targets the "first 100 miles" of data staging, where Qumulo says data gravity and GPU scarcity leave accelerated compute idle roughly 95 percent of the time.

Virtually all filesystem storage suppliers are supporting Nvidia's STX KV Caching scheme (https://www.blocksandfiles.com/ai-ml/2026/03/30/nvidia-and-its-partners-kv-cache-extenders/5209284) - yet Qumulo is not, saying (https://www.blocksandfiles.com/ai-ml/2026/05/26/qumulo-intros-cloud-ai-accelerator-with-cisco-to-create-gpu-liquidity/5246356) that this last-mile optimization does not fix the overall delays in AI training and inferencing that come from having to stage information to tightly-coupled, all-flash storage at the GPU server. The Cloud AI Accelerator focusses on preventing the staging delay.

Why doesn't Qumulo also support Nvidia's STX KV Caching scheme and get the best of both worlds? We put this point to Brandon Whitelaw, Qumulo SVP and Head of Product, and he made a good case. Here’s what he had to say:

“It is true that Nvidia’s STX reference architecture and Key-Value (KV) caching schemes are brilliant engineering solutions. However, building an AI data strategy solely around STX right now ignores the fundamental reality of where enterprise AI is actually happening.

“When you look at the total global compute capacity, factoring in Nvidia hardware as well as proprietary silicon like Google’s TPUs and AWS Trainium, roughly 70 percent of all AI workloads are running in the Big Four public clouds (AWS, Azure, GCP, and OCI). Qumulo’s product roadmap is dictated by this reality and by our customers, who currently have over 1 Exabyte (EB) of data deployed for AI workloads, and our zero-copy integration with AI-as-a-Service platforms like Microsoft AI Foundry, AWS Bedrock, and Google Vertex AI is the fastest growing use case.

“Here is why Qumulo is focused on the Cloud AI Accelerator today, and why STX support is a future roadmap item rather than a Day 1 priority:

“1. Nvidia STX is Not Available in the Public Cloud:

“Nvidia STX (https://www.blocksandfiles.com/ai-ml/2026/03/30/nvidia-and-its-partners-kv-cache-extenders/5209284) relies on highly specific, tightly coupled hardware configurations, specifically requiring BlueField-4 DPUs, Spectrum-X networking, and dedicated CMX flash tiers.

“Currently, you cannot spin up an STX-backed environment in AWS, Azure, GCP, or OCI. The major hyperscalers rely on their own custom infrastructure acceleration (like AWS Nitro or Azure Boost) and take quarters or years to adopt off-the-shelf reference architectures from hardware vendors, if ever. Because the large majority of our enterprise customers are leveraging the public cloud to scale their AI pipelines, prioritizing a hardware-locked optimization that only works in on-premise clusters or specialized neo-clouds serves only a minority of the market.

“2. The Real Bottleneck is Data Gravity, Not Just KV Cache:

“Nvidia’s KV caching scheme is a fantastic ‘last-mile’ optimization. It makes GPUs more efficient while they are actively generating tokens. But it ignores the ‘first 100 miles’ of the journey: getting the data to the GPUs in the first place.

“In the public cloud, GPU scarcity and data gravity are the actual limiters of AI velocity. Currently, hundreds of billions of dollars' worth of accelerated compute infrastructure sits idle roughly 95 percent of the time because massive, petabyte-scale datasets must be staged, copied, and moved into position before a workload can even begin.

“Other storage vendors are building tightly coupled, all-flash storage islands attached directly to local GPU servers to support STX. This solves the active compute window, but it traps your data in an expensive silo and does absolutely nothing to solve the weeks-long staging delays.

“3. Qumulo Cloud AI Accelerator: Solving the 95 percent Idle Problem:

“Rather than forcing enterprises to move massive datasets to wherever GPUs happen to be, the Qumulo Cloud AI Accelerator (https://www.blocksandfiles.com/ai-ml/2026/05/26/qumulo-intros-cloud-ai-accelerator-with-cisco-to-create-gpu-liquidity/5246356) takes a fundamentally different approach.

“By integrating Cloud Native Qumulo (CNQ) and NeuralCache, it presents distributed enterprise data in real-time to GPU resources across any region or cloud without replication or staging delays. This creates true GPU Liquidity. If compute opens up in an AWS region in Europe one day and US East the next, you can point your existing data at those GPUs instantly. We are prioritizing the elimination of the 95 percent idle time and the crushing costs of data logistics, which is what our exabyte-scale customers are demanding.

“Qumulo is not opposed to Nvidia's STX architecture; in fact, we will support STX KV caching in the future. As the hardware matures, as BlueField-4 becomes more ubiquitous, and as cloud providers begin adapting these architectures, Qumulo will ensure our platform integrates seamlessly with it.

“However, building enterprise AI infrastructure is an exercise in prioritization. Right now, optimizing for the 70 percent of AI workloads happening in the cloud by eliminating massive data-staging delays delivers a far higher, immediate return on investment for our customers than optimizing the final hardware mile of an on-premise rack.”