# Dell and data physics

> Source: <https://www.blocksandfiles.com/ai-ml/2026/06/22/dell-and-data-physics/5259281>
> Published: 2026-06-22 12:43:09+00:00

Dell says single-namespace AI storage architectures are fighting the reality of data gravity in distributed enterprise estates, and that GPUs will end up waiting for data because of constraints these architectures refuse to acknowledge.

Essentially: does the data come to the platform, or does the platform come to the data? Jon Hyde, Dell’s Senior Director for Competitive Intelligence, makes the case for a federated AI data platform over a centralized, storage-embedded stack in three blog posts.

In the first [blog](https://www.dell.com/en-us/blog/it-s-not-a-storage-problem-it-s-data-gravity/) post, he argues that data at scale has an effective gravity that prevents it from being moved from wherever it accumulates: “in core data centers, at the edge, in sovereign regions, in SaaS estates, in warehouses, [and] in object stores owned by business units.” He calls this data physics, and describes “the simplest law of enterprise AI: data lives where it lives, and most of it will never move.”

It won’t move because the enterprise IT estate is full of distortions that prevent a clean data architecture from being imposed: regulatory and sovereign constraints, application coupling, competing ownership, contractual and economic friction, M&A churn, compliance-locked archives, unknown or uncatalogued data, and organizational dynamics.

There are, he says, three distinct forms of data:

- **Data**: the heavy form. Files, records, images, video, telemetry, regulated tables. Massive, slow, expensive to move. It stays where it is for a reason.
- **Metadata**: the descriptive form. Tags, lineage, schema, classification, ownership. Lightweight. Cheap to propagate. It lets AI see every asset without traveling to it.
- **Vectors**: the meaning form. Mathematical representations generated by AI. Locality-sensitive, GPU-adjacent. They carry meaning across the estate without carrying the underlying data.

Hyde says “AI doesn’t need the data to travel. It needs the metadata and the vectors. A metadata catalog lets AI see every asset across every system. A vector index lets AI reason about meaning across every environment. The actual heavy data — the regulated, the owned, the latency-bound — stays where it already is, governed by the teams that already govern it.”
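To make the data, metadata and vector split concrete, here is a minimal Python sketch of a metadata catalog with an attached vector index, assuming toy three-dimensional embeddings. The class names, fields and storage URIs are hypothetical illustrations, not Dell’s API; the point is only that a query returns pointers to where assets live, never the assets themselves.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """Catalog entry for one asset. Only metadata and a vector are stored here;
    the asset itself stays at `location`."""
    asset_id: str
    location: str  # hypothetical URI of where the heavy data lives
    owner: str
    tags: set = field(default_factory=set)
    vector: list = field(default_factory=list)  # embedding of the asset's content


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FederatedCatalog:
    """Toy metadata catalog plus brute-force vector index (hypothetical design)."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.asset_id] = record

    def search(self, query_vector, top_k=2, required_tag=None):
        """Rank assets by similarity and return (asset_id, location) pointers.
        A metadata filter (here, a tag) narrows the candidates before ranking."""
        candidates = [
            r for r in self._records.values()
            if required_tag is None or required_tag in r.tags
        ]
        candidates.sort(key=lambda r: cosine(query_vector, r.vector), reverse=True)
        return [(r.asset_id, r.location) for r in candidates[:top_k]]


catalog = FederatedCatalog()
catalog.register(AssetRecord("claims-2025", "s3://eu-sovereign/claims/", "insurance-bu",
                             {"regulated", "eu"}, [0.9, 0.1, 0.0]))
catalog.register(AssetRecord("contracts", "smb://legal/contracts/", "legal",
                             {"regulated"}, [0.8, 0.3, 0.1]))
catalog.register(AssetRecord("sensor-logs", "nfs://edge-site-7/telemetry/", "ops",
                             {"telemetry"}, [0.0, 0.2, 0.95]))

hits = catalog.search([1.0, 0.2, 0.0])
print(hits)
```

A real deployment would use an approximate nearest-neighbor index rather than a linear scan, but the shape of the answer is the same: the query says where to go, and access to the data itself stays with the owning team.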

The vectors are, he says, locality-sensitive and GPU-adjacent. We take this to imply one of two things: either the source data has already travelled to wherever it was vectorized, or it was vectorized where it lives and the resulting vectors were then moved to a GPU-adjacent location.

Let’s keep this in mind as we follow Hyde's argument.

He is saying, we think, that an enterprise’s data is inherently distributed and that “the two dominant philosophies in the AI data layer diverge” because of this. Here’s what he says:

“Storage-embedded stacks — VAST AI OS being the most visible example — are built on the premise that if you centralize enough data into a vendor-controlled namespace, you can run AI services tightly coupled to it and deliver a simpler operational experience. The idea assumes data will come to the platform because the platform is good enough to justify the move. In a greenfield AI shop, this can be exactly what you want.

“Federated AI data platforms — Dell’s approach — are built on a distinctly different premise: that enterprise data is already distributed and will stay that way, so the platform must meet the data where it lives. Dell’s AI Data Platform pairs [PowerScale](https://www.dell.com/en-us/shop/storage-servers-and-networking-for-business/sf/powerscale) and [ObjectScale](https://www.dell.com/en-us/shop/storage-servers-and-networking-for-business/sf/objectscale) with a federated control plane designed to access and process data across filesystems, object stores, warehouses, SaaS platforms and public clouds — without requiring a centralized copy first. It treats data, metadata and vectors as the three distinct citizens they are, governs all three coherently and uses metadata and vectors to deliver value while the heavy data stays where it already lives.”

In a second [blog](https://www.dell.com/en-us/blog/when-architecture-fights-gravity-operations-pay-the-tax/), Hyde argues that storage-embedded stack architectures such as the VAST AI OS fight the reality of enterprises’ distributed data: “Storage-embedded AI stacks, with VAST AI OS being the most visible example, are built on a single architectural assumption: the AI services will run on data that has already landed in the platform. They have to be in the platform. Anything outside the namespace is, from the platform’s point of view, invisible.

“So, the platform ships with tools to bring data in. It ships with a story about how centralizing data simplifies operations. It ships with a unified UI that, in the demo, makes the whole estate look like one clean thing.”

“Architectures that treat all three [data, metadata and vectors] as the same substance default to one bad answer: move everything. Architectures that treat them as distinct can leave the heavy data governed where it lives, propagate metadata across the estate and let vectors do the cross-environment reasoning AI actually needs.

“A unified namespace is a coherent answer if the only form you recognize is data. It is the wrong answer the moment you accept that metadata and vectors exist as first-class citizens.”

In a federated architecture such as Dell’s:

- The data stays where it lives, governed by the teams that already govern it.
- The metadata propagates everywhere, so every AI you choose sees every asset regardless of location.
- The vectors carry meaning across the estate, so AI can reason about data without first relocating it.

Hyde’s third [blog](https://www.dell.com/en-us/blog/why-expensive-gpus-sit-idle/) argues that Dell’s federated architecture performs better than the VAST AI OS storage-embedded stack at delivering data to GPUs.

Dell published head-to-head testing in October 2025 on a KV cache offload workload, in which cached key/value vectors are moved back to GPUs rather than recomputed, using the Qwen3-32B model. The results, drawn from Dell’s internal testing and a public VAST disclosure, were:

| Platform | Time to First Token (TTFT) |
|---|---|
| PowerScale | 0.82 s |
| ObjectScale | 0.86 s |
| VAST | 1.5 s |
| Standard vLLM without KV cache offloading | 11.8 s |
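The relative gaps are easier to compare as speedups. This short Python sketch only divides the TTFT figures above by one another; it assumes the Dell and VAST numbers are comparable, which holds only as far as the underlying test setups match, since they come from different sources.

```python
# TTFT figures (seconds) as reported in the results above.
ttft = {
    "PowerScale": 0.82,
    "ObjectScale": 0.86,
    "VAST": 1.5,
    "vLLM, no KV cache offload": 11.8,
}
baseline = ttft["vLLM, no KV cache offload"]


def speedup(reference_s, candidate_s):
    """How many times faster the candidate's TTFT is than the reference's."""
    return reference_s / candidate_s


for name, seconds in ttft.items():
    print(f"{name:28s} {seconds:5.2f} s  {speedup(baseline, seconds):5.1f}x vs baseline")

powerscale_vs_vast = speedup(ttft["VAST"], ttft["PowerScale"])
print(f"PowerScale vs VAST: {powerscale_vs_vast:.2f}x")
```

On these figures, both Dell systems are roughly 14x faster to first token than the no-offload baseline, VAST roughly 8x, and PowerScale about 1.8x faster than VAST.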

He says “an architecture built to feed GPUs, treating the heavy data as gravity-bound and the meaning-bearing vectors as portable — compounds the [Dell] advantage with every inference request. An architecture where feeding the GPU depends on data first arriving in a vendor namespace is structurally a step behind.”

Hyde notes that “VAST has continued to publish on KV cache offload — most recently a December 2025 result with Nvidia Dynamo and CoreWeave reporting a roughly 20x TTFT improvement over recompute and a 90 percent gain in GPU efficiency — but on a different workload and baseline, and without a Qwen3-32B head-to-head that improves on the 1.5 second TTFT above.”

He says that enterprises considering data architectures for getting data to GPU servers should ask this question: “What will my GPU utilization look like on this platform under a realistic inference workload, with KV cache offload, against a data estate that includes regulated, sovereign and application-coupled sources?

“That question forces a vendor to confront the structural argument. It forces them to answer for what happens when data can’t land in their namespace, not just what happens when it has. It forces them to publish reproducible TTFT, tokens-per-second and cache hit rates on a current open-source model — methodology and all. And it forces an honest answer to the only economic question that matters: how much GPU idle time is your architecture budgeting for?”
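As a rough illustration of the metrics Hyde asks vendors to publish, the Python sketch below derives mean TTFT, per-request decode throughput and a KV cache hit rate from per-request timing records. The `RequestTrace` record and its field names are hypothetical, and the hit rate is defined here as the share of prompt tokens served from cache, which is only one of several possible definitions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTrace:
    """Hypothetical per-request timing record captured by a load generator."""
    start_s: float             # request submitted
    first_token_s: float       # first output token received
    end_s: float               # last output token received
    output_tokens: int
    prompt_tokens_cached: int  # prompt tokens whose KV entries were loaded, not recomputed
    prompt_tokens_total: int


def ttft_s(t):
    return t.first_token_s - t.start_s


def decode_tokens_per_s(t):
    """Tokens generated after the first one, divided by the time spent generating them."""
    decode_time = t.end_s - t.first_token_s
    return (t.output_tokens - 1) / decode_time if decode_time > 0 else 0.0


def kv_cache_hit_rate(traces):
    """Share of prompt tokens served from the KV cache across all requests."""
    total = sum(t.prompt_tokens_total for t in traces)
    cached = sum(t.prompt_tokens_cached for t in traces)
    return cached / total if total else 0.0


def summarize(traces):
    return {
        "mean_ttft_s": mean(ttft_s(t) for t in traces),
        "mean_decode_tokens_per_s": mean(decode_tokens_per_s(t) for t in traces),
        "kv_cache_hit_rate": kv_cache_hit_rate(traces),
    }


# Two made-up requests, purely to exercise the functions.
traces = [
    RequestTrace(0.0, 0.8, 5.8, 501, 3000, 4000),
    RequestTrace(1.0, 2.2, 6.2, 401, 1000, 4000),
]
print(summarize(traces))
```

Publishing the raw traces alongside the summary is what makes such numbers reproducible, since readers can recompute them under their own definitions.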

##### Comment

As we understand it, vectorized datasets are significantly larger than the raw data they are based on. Vector embeddings are [typically](https://discuss.elastic.co/t/vector-embedding-huge-size-increase/356670) several times larger than the source content they represent, often 3x to 20x or more per item, depending on chunking, dimensionality and storage format. Source data is usually chunked before embedding (documents split into paragraphs or fixed token windows, for example) to improve retrieval quality. A typical text chunk of 250 to 500 words is roughly 0.5 to 2 KB of raw text, and the vector computed from it is often 3 to 10 times larger.
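The size ratio follows from simple arithmetic: one embedding occupies its dimension count times the bytes per element. This Python sketch, assuming a few common dimension and storage-format combinations chosen for illustration, computes the size of a single raw vector relative to chunks in the 0.5 to 2 KB range cited above. It ignores index overhead and any copy of the chunk text stored alongside the vector, both of which would increase the total.

```python
# Bytes per element for common embedding storage formats.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "int8": 1}


def vector_bytes(dims, dtype):
    """Raw size of one embedding vector, with no index or payload overhead."""
    return dims * BYTES_PER_ELEMENT[dtype]


def expansion_ratio(chunk_bytes, dims, dtype):
    """Size of one embedding relative to the text chunk it was computed from."""
    return vector_bytes(dims, dtype) / chunk_bytes


chunk_sizes = [512, 1024, 2048]  # roughly 0.5 KB, 1 KB and 2 KB of text
configs = [(768, "float32"), (1536, "float32"), (3072, "float32"), (1536, "int8")]

for dims, dtype in configs:
    ratios = ", ".join(f"{expansion_ratio(c, dims, dtype):.1f}x" for c in chunk_sizes)
    print(f"{dims:>5}-dim {dtype:<8} {vector_bytes(dims, dtype):>6} bytes  ({ratios})")
```

The ratio swings widely with these choices: a 1,536-dimension float32 vector is 6x a 1 KB chunk, while the same vector quantized to int8 is smaller than a 2 KB chunk.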

Where is the data vectorized and, if that is not adjacent to the GPUs, are the vector datasets moved? They have gravity too, because of their sheer size. Either the data moves to wherever it is vectorized, or the vector dataset moves to wherever the AI processing happens, and both fight data gravity. We may be misunderstanding things here, and have asked Hyde to comment on this issue.
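A back-of-envelope transfer estimate gives a feel for the gravity involved. The Python sketch below assumes a hypothetical corpus of one billion chunks of about 1 KB each, 1,536-dimension float32 vectors, and a network link that sustains 80 percent of its nominal rate; all of these are illustrative assumptions, not figures from Dell or VAST.

```python
def transfer_hours(num_items, bytes_per_item, link_gbps, efficiency=0.8):
    """Hours to copy num_items objects of bytes_per_item over a link_gbps link.

    efficiency is the assumed fraction of nominal line rate actually sustained.
    """
    total_bits = num_items * bytes_per_item * 8
    effective_bits_per_s = link_gbps * 1e9 * efficiency
    return total_bits / effective_bits_per_s / 3600


NUM_CHUNKS = 1_000_000_000  # hypothetical corpus size
RAW_CHUNK_BYTES = 1024      # about 1 KB of text per chunk
VECTOR_BYTES = 1536 * 4     # one 1,536-dimension float32 embedding

for gbps in (10, 100):
    raw_h = transfer_hours(NUM_CHUNKS, RAW_CHUNK_BYTES, gbps)
    vec_h = transfer_hours(NUM_CHUNKS, VECTOR_BYTES, gbps)
    print(f"{gbps:>3} Gbit/s: raw chunks {raw_h:6.2f} h, vectors {vec_h:6.2f} h")
```

Under these assumptions the vectors take six times as long to move as the raw chunks, about 1.7 hours at 10 Gbit/s, before counting index structures or replicas.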
