Qumulo intros Cloud AI Accelerator with Cisco to create GPU liquidity


Source: blocksandfiles.com · Published May 26, 2026


Qumulo launched its Cloud AI Accelerator with Cisco to address what it calls a GPU liquidity crisis, enabling enterprises to deliver data from distributed on-premises and public cloud sites to GPU accelerators without copying or staging it. The company cited a recent analysis showing average enterprise GPU utilization hovers around 5 percent, meaning billions of dollars in accelerated compute infrastructure sits idle 95 percent of the time due to data staging delays. Qumulo CEO Doug Gourlay said the industry's focus on GPU availability misses the deeper utilization problem caused by data gravity, and the new offering aims to solve it by creating an intelligent data fabric that delivers any enterprise dataset in real time to any GPU farm in any cloud.


Qumulo says its Cloud AI Accelerator offering gets data from distributed on-prem and public cloud sites to GPU accelerators without the data having to be copied and staged to all-flash storage closely coupled to the GPU servers.

Qumulo tells us that, according to a recent analysis, average enterprise GPU utilization hovers around a staggering 5 percent. This means hundreds of billions of dollars’ worth of accelerated compute infrastructure sits idle roughly 95 percent of the time, because data must be staged, replicated, and moved into position before a workload can even start. Improving tokenomics, on this view, means considering total creation time, not just the last mile.
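To make the arithmetic concrete, here is a minimal Python sketch that treats utilization as compute time divided by a job's total wall-clock time, staging included. It assumes, for illustration only, that GPUs are reserved and idle while data is staged; the `Job` class and all hour figures are hypothetical, not numbers from Qumulo or the analysis it cites.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A simplified AI job: data is staged first, then the GPUs compute.

    Assumption for illustration: the GPUs are reserved (and idle) during staging.
    """
    staging_hours: float
    compute_hours: float

    @property
    def total_hours(self) -> float:
        # End-to-end ("total creation") time, staging included.
        return self.staging_hours + self.compute_hours

    @property
    def utilization(self) -> float:
        # Share of reserved GPU time spent on useful compute.
        return self.compute_hours / self.total_hours


baseline = Job(staging_hours=19.0, compute_hours=1.0)        # hypothetical numbers
faster_compute = Job(staging_hours=19.0, compute_hours=0.5)  # "last mile" tuning
less_staging = Job(staging_hours=1.0, compute_hours=1.0)     # attack staging instead

for name, job in [("baseline", baseline), ("faster compute", faster_compute),
                  ("less staging", less_staging)]:
    print(f"{name:15} total={job.total_hours:5.1f} h  utilization={job.utilization:.1%}")
```

With 19 hours of staging per hour of compute, utilization lands at 5 percent. Halving the compute time trims the end-to-end job only from 20 to 19.5 hours, while cutting staging to one hour brings it down to 2 hours, which is the "total creation time, not just the last mile" argument in numbers.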

Qumulo CEO Doug Gourlay said: “Every enterprise we talk to is focused on GPU availability, but availability is only half the problem. The deeper issue is utilization, and the culprit is data gravity.”

“The industry's response has been to sell enterprises more tightly-coupled storage attached directly to GPU clusters, which optimizes a tiny window of active compute time while doing nothing about the idle time that surrounds it. This only leads to more expensive tokens and storage islands to maintain. Cloud AI Accelerator was built to solve the actual problem of getting the data to the GPUs instantly, wherever they are, without ever copying it.”

The company says that its Cloud AI Accelerator creates GPU liquidity by building an intelligent data fabric that integrates its Cloud Native Qumulo (CNQ), Cloud Data Fabric, and NeuralCache offerings across on-premises, edge, and multi-cloud environments.

This, it says, allows enterprises to run workloads wherever GPU capacity is available, rather than wherever their data happens to be trapped.

Cloud Data Platform (CDP) is Qumulo’s scale-out, clustered filesystem software running on-premises, and Cloud Native Qumulo (CNQ) is CDP running natively in AWS, Azure, the Google Cloud Platform, and Oracle Cloud Infrastructure. The company announced its Cloud Data Fabric (CDF) in February last year; it has a central file and object data core repository with coherent caches at the edge. The core repository is a distributed file and object data storage cluster that runs on most system vendors’ server hardware or on public cloud infrastructure.

NeuralCache predictive caching was added to the Cloud Data Fabric in April 2025. It uses AI and machine learning models to dynamically optimize read/write caching.

The company actually introduced its Cloud AI Accelerator last November. It is a way of moving data from Qumulo Cloud Data Fabric stores to a GPU server, using NeuralCache technology to predictively cache and reduce GPU data load times by up to 64 percent.
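The article does not describe how NeuralCache's models work, so the sketch below swaps in the simplest possible predictor, sequential read-ahead, purely to show where prediction fits: blocks are pulled from a remote source into an in-memory LRU cache before the reader (standing in for a GPU data loader) asks for them. The `ReadAheadCache` class, its parameters and the `fake_fetch` callback are hypothetical and are not Qumulo's API or algorithm.

```python
from collections import OrderedDict
from typing import Callable, Optional


class ReadAheadCache:
    """In-memory LRU block cache with naive sequential read-ahead.

    Hypothetical stand-in for predictive caching: rather than an ML model,
    it predicts that the next `read_ahead` blocks will be requested soon.
    """

    def __init__(self, fetch: Callable[[int], bytes], capacity: int,
                 read_ahead: int = 2, num_blocks: Optional[int] = None):
        self.fetch = fetch            # pulls one block from the remote source
        self.capacity = capacity      # max blocks held in memory
        self.read_ahead = read_ahead  # blocks to prefetch after each read
        self.num_blocks = num_blocks  # dataset size, so we don't prefetch past the end
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _put(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def read(self, block_id: int) -> bytes:
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)
            data = self.blocks[block_id]
        else:
            self.misses += 1  # the reader has to wait on the remote source
            data = self.fetch(block_id)
            self._put(block_id, data)
        # Predict the next blocks and pull them in before they are requested.
        for nxt in range(block_id + 1, block_id + 1 + self.read_ahead):
            if self.num_blocks is not None and nxt >= self.num_blocks:
                break
            if nxt not in self.blocks:
                self._put(nxt, self.fetch(nxt))
        return data


# Usage: a sequential scan of 100 blocks from a fake remote source.
fetched = []

def fake_fetch(block_id: int) -> bytes:
    fetched.append(block_id)
    return f"block-{block_id}".encode()

cache = ReadAheadCache(fake_fetch, capacity=8, read_ahead=2, num_blocks=100)
for i in range(100):
    cache.read(i)
print(cache.hits, cache.misses, len(fetched))
```

On a purely sequential scan only the first read misses; every later block is already in memory when requested. Real access patterns are rarely this regular, which is presumably why a learned predictor is worth the effort.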

Now it says that the Cloud AI Accelerator’s AI-focused data fabric makes providing data to GPUs “a flexible scheduling operation, delivering any enterprise dataset in real time to any GPU farm in any cloud.” Enterprise customers can:

Connect Without Copying: Seamlessly and securely connect on-premises or cloud-native Qumulo systems to Microsoft AI Foundry, AWS Bedrock, and Google Vertex AI without copying data.
Capture Global GPU Capacity: Run AI workloads wherever and whenever GPU capacity becomes available, across any region, cloud, or availability zone.
Eliminate Staging Delays: Wipe out the weeks-long data-staging delays that keep GPU infrastructure idle before training or inference workloads begin.
Eradicate Storage Islands: Avoid maintaining multiple, isolated, and replicated storage silos across every environment where GPUs might be sourced.
Slash Idle Compute Costs: Drastically reduce idle GPU costs by eliminating the heavy load phase into GPU-attached flash storage.

Qumulo emphasizes that its Cloud AI Accelerator drastically reduces idle GPU costs by eliminating the heavy load phase into GPU-attached flash storage. We understand that, with Qumulo, data streams at block level from source sites (on-premises edge or data center, cloud, cross-region, or S3-backed CNQ) to the Accelerator's cache in CPU DRAM, and then directly to the GPUs. Cloud AI Accelerator shrinks overall AI training and inference time, Qumulo says, not just the last few inches of data movement from a previously loaded all-flash box tightly linked to a GPU server. In effect, it lets GPU resources flow to the data, becoming available wherever that data is: GPU liquidity.
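The claim that streaming shrinks the whole job, not just the last few inches, can be illustrated with a textbook two-stage pipeline model. This is a simplification assumed for illustration (fixed per-block fetch and compute times, perfect overlap, no cache misses or network jitter, GPUs reserved during staging); it is not how Qumulo models or implements the Accelerator, and the function names and numbers are hypothetical.

```python
def staged_time(n_blocks: int, fetch_s: float, compute_s: float) -> float:
    """Stage-then-compute: copy every block to GPU-attached flash, then compute.

    The GPUs are assumed to be reserved, and idle, for the whole staging phase.
    """
    return n_blocks * fetch_s + n_blocks * compute_s


def streamed_time(n_blocks: int, fetch_s: float, compute_s: float) -> float:
    """Two-stage pipeline: block i+1 is fetched while the GPU computes on block i.

    The first fetch fills the pipeline, each later step takes as long as the
    slower stage, and the last compute drains it.
    """
    if n_blocks <= 0:
        return 0.0
    return fetch_s + (n_blocks - 1) * max(fetch_s, compute_s) + compute_s


def utilization(n_blocks: int, compute_s: float, total_s: float) -> float:
    """Share of the job's wall-clock time the GPU spends computing."""
    return n_blocks * compute_s / total_s if total_s > 0 else 0.0


# Hypothetical workload: 1,000 blocks, 0.2 s to fetch and 0.5 s to process each.
n, fetch_s, compute_s = 1000, 0.2, 0.5
for name, total in [("staged", staged_time(n, fetch_s, compute_s)),
                    ("streamed", streamed_time(n, fetch_s, compute_s))]:
    print(f"{name:9} total={total:7.1f} s  GPU busy={utilization(n, compute_s, total):.2%}")
```

With these numbers the staged job takes 700 s with the GPU busy for 500 of them, while the streamed job finishes in about 500.2 s. If fetching is slower than computing (`fetch_s > compute_s`), the streamed pipeline becomes fetch-bound, which is where caching and network bandwidth start to dominate.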

The company does not directly support Nvidia’s STX reference architecture and its KV caching scheme, which targets what Gourlay might call “a tiny window of active compute time.” Such support would entail Qumulo’s CDP running on Nvidia’s BlueField-4 DPUs and supporting the relevant Nvidia software services, such as Dynamo.

Qumulo’s Cloud AI Accelerator uses Cisco networking and security to link and safeguard its CDP sites. Together, Cisco and Qumulo “enable enterprises to build agile AI infrastructure that adapts in minutes to changing GPU availability, providing the operational flexibility that makes GPU liquidity achievable at enterprise scale.”

The Cloud AI Accelerator is available now across AWS, Azure, Google Cloud, and Oracle Cloud Infrastructure (OCI), with hybrid deployment support for Cisco UCS on-premises environments.

Qumulo will be present at the Cisco Live 2026 event in Las Vegas, May 31 to June 4, at booth #4018.

Comment

Our understanding is that, through its partnership with Nvidia, Cisco offers its AI PODs as part of the Secure AI Factory with Nvidia. These are pre-validated, modular designs using Cisco UCS servers, Nexus networking, and third-party storage.

These AI PODs differ from Nvidia’s BasePODs and SuperPODs because they are based on Cisco’s own validated designs (CVDs) rather than on the Nvidia specifications that Dell and Supermicro use.

Read original on blocksandfiles.com → www.blocksandfiles.com/ai-ml/2026/05/26/qumulo-i…
