AMD Stretches Server DRAM With Flash Extended Memory

There is a crisis building in the datacenter, and it is centered around the scarcity of and ridiculously high prices for DRAM main memory. The GenAI boom has caused the hyperscalers, cloud builders, AI model builders, and neoclouds to hog what has become the limited capacity coming out of the memory foundries of Micron Technology, Samsung, and SK Hynix, and the demand shock is causing unprecedented price hikes.

How bad is it? According to Counterpoint Research, which tracks component pricing, 64 GB DIMM memory prices rose by a factor of 3.5X between Q3 2025 and Q1 2026, and it sure looks like they will be up by a factor of 5X by Q3 2026. And according to the most recent financials from Micron, which I reported on here, there is no end in sight to the demand shock out to 2028, which means prices will keep getting higher.

As it is, DRAM main memory has gone from representing around 50 percent of the cost of a server, with CPUs being about a quarter of the bill of materials in 2023, with other peripherals and flash and disk storage comprising the other half, to DRAM being somewhere between 60 percent and 90 percent of the cost of a server here in the middle of 2026, averaging around 75 percent. Those CPUs didn’t get cheaper; even so, with memory prices skyrocketing, the increases in CPU prices look small by comparison.

Something has to give, and the one thing that can be done for sure is that main memory can be used more efficiently and it can even be extended with flash, which has been a kind of Holy Grail for people in the flash business for more than a decade and which was the point of the 3D XPoint storage that Intel and Micron launched with much fanfare back in July 2015.

The promise of 3D XPoint was to have something that had performance somewhere between DRAM and flash, that was byte addressable like main memory, but had a cost more like flash. Because Intel didn’t ramp 3D XPoint fast enough, both in terms of manufacturing and in technology, it ended up being almost as costly as DRAM and only a few multiples faster than flash. Had Intel not decided to try to keep 3D XPoint as something just for its own Xeon server processors, it may have gone mainstream and helped solve the current memory crisis. But alas, Intel lost its mind a little bit and killed off 3D XPoint and sold off its flash business to SK Hynix in a classic “burn the furniture to keep the house warm” maneuver. Imagine the profits Intel would have now if it was still in the flash business. . . .

And now, the three big memory makers are allocating more and more of their DRAM production to pricey HBM stacked memory, which is cutting back on normal DRAM chips used in server DIMM modules, and they are also cutting back on flash output to try to boost DRAM. DRAM and flash production will increase by maybe 20 percent to 25 percent per year, and demand is crazy higher than that. And so DRAM and flash prices go up and up as the Big Three sell capacity to the highest bidders and are getting filthy rich.

This is why AMD just shelled out an undisclosed amount of money to acquire a shiny new startup called MEXT, whose name is meant to invoke the idea of memory extension. The MEXT team has come up with a way of transparently and invisibly extending DRAM main memory to flash storage.

People have been trying to do this since back in the Fusion-io days, and Gary Smerdon, co-founder and chief executive officer at the formerly independent MEXT, which was founded in 2023 and which only dropped out of stealth mode back in early April of this year, knows this full well because he was chief strategy and product officer at Fusion-io back then. (Fusion-io was the first big commercializer of flash storage for servers, and had both Apple and Meta Platforms as its anchor customers more than a decade ago.)

For six years before the Fusion-io gig, Smerdon led the solid state memory efforts at LSI Logic. More recently, Smerdon was co-founder and chief executive officer at TidalScale, which created a HyperKernel hypervisor that allowed companies to create large-scale virtual NUMA servers out of smaller physical NUMA servers. TidalScale raised $70.3 million in multiple rounds and was acquired by Hewlett Packard Enterprise in December 2022. (We have no idea what HPE has done with HyperKernel since then.) There is chatter on the Internet that MEXT had raised $2.4 million in seed funding, but we suspect it was more than that. Clear, DN Capital, Uncorrelated, Raptor, and FJ Labs were all seed money suppliers.

Many of the 39 employees of MEXT hail from TidalScale, as you might expect, but Smerdon brought in some outside expertise on memory management and virtualization in the upper ranks. David Reed is a co-founder and was chief scientist at TidalScale as well as at Lotus Development – remember them? – way back in the day. More recently Reed has been a Fellow at HPE and a vice president at SAP, and he is a long-time professor of computer science at MIT who was instrumental in helping create parts of the Internet stack. Carl Waldspurger, who was the principal engineer at VMware in charge of processor scheduling, memory management, and NUMA scheduling for the ESX hypervisor as well as the architect for VMware's Distributed Resource Scheduler, was tapped to be MEXT’s chief scientist. Importantly, DRS is what controls the live migration of virtual machines, which is all about moving memory state between systems.

All three of these MEXT top brass were there, along with many of us in the late 1980s, when the Intel 80286 CPUs used in our PCs and just starting to be used in LAN servers were butting up against the 640 KB main memory barrier in the X86 architecture of the time. We were all fussing around with HIMEM.SYS extended memory drivers in DOS (and therefore underneath Windows) to make use of memory up to 1 MB in our machines. The 80386 had extended memory built in and busted us out to 4 MB of main memory along with 32-bit processing, and the world changed.

So extending memory is not a new idea, of course, but it is timely given that DRAM memory is so damned expensive right now. And even if flash is getting more expensive almost as fast, it is still 50X less expensive than DRAM, with 30X lower power consumption as well. However, flash is also 500X slower than DRAM. So if you want to use flash as a memory extender, you have to get clever about it.
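To see why that 500X gap demands cleverness, a back-of-envelope model of average access latency helps. This is a minimal sketch, not a measurement: the 100 ns DRAM latency is an assumed placeholder that does not come from MEXT or AMD, the 500X slowdown is the figure quoted above, and the function name and hit rates are made up for illustration.

```python
# Back-of-envelope model: average access latency of a DRAM + flash memory tier.
# ASSUMPTION: 100 ns DRAM latency is an illustrative placeholder, not a measured value.
DRAM_NS = 100.0
FLASH_SLOWDOWN = 500.0  # "flash is also 500X slower than DRAM" (figure from the article)


def avg_latency_ns(dram_hit_rate: float) -> float:
    """Weighted average latency when the given fraction of accesses is served from DRAM."""
    if not 0.0 <= dram_hit_rate <= 1.0:
        raise ValueError("dram_hit_rate must be between 0 and 1")
    flash_ns = DRAM_NS * FLASH_SLOWDOWN
    return dram_hit_rate * DRAM_NS + (1.0 - dram_hit_rate) * flash_ns


# Hypothetical hit rates, chosen only to show how quickly misses dominate.
for rate in (0.90, 0.99, 0.999):
    slowdown = avg_latency_ns(rate) / DRAM_NS
    print(f"hit rate {rate:.1%}: {slowdown:.2f}x DRAM latency")
```

In this toy model, sending even 1 percent of accesses to flash leaves the average access roughly 6X slower than pure DRAM, which is why the data has to be back in DRAM before it is asked for.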

“We came up with three problems that, if we could solve them, would change everything,” Smerdon tells The Next Platform. “One, we have got to increase DRAM utilization. That is so obvious. It's what everybody was trying to do with CXL, increase DRAM utilization through pooling. There are, however, a lot of ways you can do that. The second problem is that we need no hardware or software changes for memory extension to work. I can go through my career with Ethernet on the motherboard, or working at AMD and LSI, we had fast growing products, and all of them had no software changes. In the ideal world, if you are a software company, you don't want hardware changes, either. And nobody that is focused on the memory problem has had this as a part of their core principles. And the third problem was to bring flash into the memory tier. It was 50X cheaper per bit when we started in 2023, maybe it is 100X now, with 30X lower power per bit. This is great. There's just one little problem: Flash is 500X slower, and that isn't going to perform well. And we all know that swap sucks, and so we had to crack these problems.”

The simple answer is to stop putting cold data on DRAM and cram it full of hot data, or data that will need to be hot in a few tens of nanoseconds from now. Pushing pages out from hot to warm to cold onto flash is relatively easy. But the real issue is that data can go from cold to red hot with one instruction running on a CPU, and that happens in a fraction of a nanosecond.

To do this, MEXT created what it calls Predictive Memory, of course using AI algorithms to watch applications and memory access patterns, to get data from that flash back into DRAM before the applications or the operating system asks for them.

“We have developed sophisticated machine learning models that have much better prediction accuracy and coverage than what has been done in the past,” explains Waldspurger. “We were inspired by modern AI techniques based on neural networks like LSTMs and LLM transformers, which are actually really excellent at sequence prediction. Instead of predicting tokens in a natural language conversation, we are applying similar ideas to predict sequences of future memory page accesses. And since our AI models run asynchronously, they can also benefit from richer information and context about longer term trends and leverage hardware counters, software events, and application features that aren't considered by traditional approaches. Our AI engine consists of a family of models that work together, and so we have an ensemble that combines both lightweight heuristic predictors and more powerful neural network models. And we are also actively exploring and having good luck with other AI techniques.”
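To make the idea of predicting page accesses concrete, here is a minimal sketch of the kind of thing a "lightweight heuristic predictor" could be: a first-order Markov model that remembers which page most often follows each page. This is our own illustration and not MEXT's algorithm, which has not been published in detail; the class name, method names, and page trace are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


class NextPagePredictor:
    """Toy first-order Markov predictor (hypothetical, NOT MEXT's actual model)."""

    def __init__(self) -> None:
        # Maps a page number to a Counter of the pages seen immediately after it.
        self._followers = defaultdict(Counter)
        self._last_page: Optional[int] = None

    def observe(self, page: int) -> None:
        """Record one page access from the trace."""
        if self._last_page is not None:
            self._followers[self._last_page][page] += 1
        self._last_page = page

    def predict(self, page: int) -> Optional[int]:
        """Return the page most often seen after `page`, or None if there is no history."""
        followers = self._followers.get(page)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


# Made-up trace of page numbers in which page 2 always follows page 1.
predictor = NextPagePredictor()
for page in [1, 2, 3, 1, 2, 4, 1, 2]:
    predictor.observe(page)

candidate = predictor.predict(1)
print(f"after page 1, prefetch page {candidate}")
```

A real engine would presumably run predictors like this asynchronously, as the quote describes, and hand off to heavier sequence models when simple history is not enough.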

At the moment, the Predictive Memory Engine only works with Linux systems using X86 or Arm CPUs, but there is no technical reason why it could not be ported to RISC/Unix or proprietary systems should there be a need for it. The memory extension works on bare metal machines as well as those that have workloads orchestrated by virtual machine hypervisors or Kubernetes container controllers. The recommended configuration for the MEXT memory expansion is to figure out how much memory you need, and then buy half as DRAM and half as flash cards in the server. Smerdon says that MEXT has tested systems with 25 percent DRAM and 75 percent flash and this works as well. We presume there might be a performance hit at that 1:3 ratio compared to a 1:1 ratio, and we also presume it is entirely workload dependent.

As far as workloads go, in-memory databases that have memory optimizations already baked into them are a perfect fit for the MEXT extended memory, according to Smerdon, but traditional relational databases are not as ideal. Electronic design automation, data analytics, and digital content creation workloads “are screamingly good fits” for this extended memory, according to Smerdon. And there are big banks and hedge funds that are already using it to do heaven knows what. Graph databases also do surprisingly well on the extended memory, which MEXT did not expect.

Before the AMD acquisition, MEXT was charging a flat fee for a subscription to its Predictive Memory Engine at $3.99 per GB per year. We do not know what AMD’s plan is. What we can show you is the comparisons that MEXT did for a Dell server as well as instances on the AWS cloud.

Here is the Dell comparison with and without the MEXT extended memory:

The interesting bit there is not just how much cheaper the server is with flash extended memory, but the fact that Dell is no longer shipping PowerEdge R6725 servers with 3 TB and 6 TB options for main memory. (This was as of February 1.) You can get to 3 TB with a 1:1 ratio of flash and DRAM – 1.5 TB each – and you can get to 6 TB effective capacity with a 1:3 ratio with 1.5 TB of DRAM and 4.5 TB of flash.
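The capacity arithmetic behind those two configurations is simple enough to sketch. The helper name below is ours, not anything from MEXT or Dell, and it assumes effective capacity is just DRAM plus the flash allocated to memory, with no overhead.

```python
def effective_capacity_tb(dram_tb: float, flash_per_dram: float) -> float:
    """Effective capacity for a DRAM:flash ratio of 1:flash_per_dram.

    ASSUMPTION: capacity is the plain sum of DRAM and flash, with no metadata overhead.
    """
    if dram_tb < 0 or flash_per_dram < 0:
        raise ValueError("capacities and ratios must be non-negative")
    return dram_tb * (1.0 + flash_per_dram)


# The two Dell configurations described above, both starting from 1.5 TB of DRAM.
for flash_per_dram in (1, 3):
    total = effective_capacity_tb(1.5, flash_per_dram)
    dram_share = 1.5 / total
    print(f"1:{flash_per_dram} ratio -> {total} TB effective, {dram_share:.0%} DRAM")
```

The 1:3 case works out to 25 percent DRAM and 75 percent flash, which matches the leaner mix Smerdon says MEXT has tested.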

Here is the AWS comparison:

We have no idea what instance type or size was used to make this comparison, but clearly they are comparing ones with local flash.

MEXT has done some performance tests as well on selected workloads, on a machine with an AMD “Zen 5” 128-core Epyc 9755 with a dozen 6.3 GHz memory sticks and a Kioxia NVM-Express flash drive. The machine ran Red Hat Enterprise Linux 8.10, which has a Linux 6.17.1 kernel. The machine had 64 GB of memory.

Running the Redis and Memcached key/value stores and the Neo4j graph database, the extended memory configuration with 32 GB of DRAM plus 32 GB of flash allocated to memory delivered, in every case, 1.7X the bang for the buck of the machine with 64 GB of real DRAM allocated running the same workload. (It is roughly 95 percent of the throughput for a lot less money.)
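Those two numbers imply a cost ratio, which can be worked out directly. This is a sketch that assumes "bang for the buck" means throughput divided by cost; MEXT has not told us exactly how it computed the figure, and the function name is ours.

```python
def implied_cost_ratio(relative_throughput: float, relative_perf_per_dollar: float) -> float:
    """Cost of the extended config relative to the all-DRAM config.

    ASSUMPTION: perf per dollar = throughput / cost, so
    cost_ratio = relative_throughput / relative_perf_per_dollar.
    """
    if relative_perf_per_dollar <= 0:
        raise ValueError("relative_perf_per_dollar must be positive")
    return relative_throughput / relative_perf_per_dollar


# Figures quoted above: about 95 percent of the throughput, 1.7X the bang for the buck.
ratio = implied_cost_ratio(0.95, 1.7)
print(f"extended config costs about {ratio:.0%} of the all-DRAM config")
```

Under that assumption, the extended memory setup comes in at a bit more than half the cost of the all-DRAM machine.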

Here is a drilldown on how Neo4j graph database performance and price/performance works out as you use 1:1 and 1:3 ratios of memory and flash with the MEXT memory extension:

These benchmarks were not done on the AMD Epyc server, but rather on the AWS cloud and also on relatively skinny 64 GB main memory capacities. (We are all going to have to get used to skinnier memory capacities for the remainder of this decade, it looks like.)

As you use more flash, you lose more performance, but the performance per dollar goes up a lot faster. It is a fair tradeoff.

Separately, MEXT did a Linux swap benchmark test using SideFX’s Houdini tool. This test was done on a desktop machine using a 64-core Ryzen Threadripper Pro 5995WX with 64 GB of memory and 1 TB of flash. This swapping happens when any application needs more memory than is physically in the system. On a 64 GB real memory configuration running an unspecified workload, it took around 2,000 seconds, and if you cut the memory back to 32 GB, the swap was bigger and took 3,400 seconds. With MEXT extended memory running on a 32 GB DRAM configuration plus the flash, the swap took 2,700 seconds. If you cut the machine back to 16 GB of real DRAM, the swap takes 22,000 seconds, but with MEXT extended memory with 16 GB of real DRAM plus flash, the swap time is reduced to 5,000 seconds. That is 1.3X better performance on a 32 GB DRAM configuration and a 4.4X better bang on a 16 GB DRAM configuration with extension out to the flash.
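The speedup figures can be checked from the run times quoted above. This is just the arithmetic, using the approximate seconds as reported; the variable names are ours.

```python
# Approximate run times in seconds, as quoted in the article, keyed by GB of real DRAM.
FULL_DRAM_64GB_SECONDS = 2000
linux_swap = {32: 3400, 16: 22000}     # plain Linux swap
extended_memory = {32: 2700, 16: 5000}  # MEXT extended memory plus flash

# Speedup of extended memory over plain swap at the same DRAM size.
speedups = {gb: linux_swap[gb] / extended_memory[gb] for gb in linux_swap}

for gb, speedup in speedups.items():
    versus_full = extended_memory[gb] / FULL_DRAM_64GB_SECONDS
    print(f"{gb} GB DRAM: {speedup:.1f}X faster than swap, "
          f"{versus_full:.2f}X the run time of the 64 GB machine")
```

The same numbers also show what the extension costs against the full 64 GB machine: the run takes 1.35X as long with 32 GB of DRAM and 2.5X as long with 16 GB.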

AMD has not disclosed what it paid to acquire MEXT, and it has not said how it will weave it into its systems – and importantly, whether it will pull a 3D XPoint and not allow it on machines using Intel processors or those running homegrown Arm CPUs. Hopefully, AMD has a more open mind about this than Intel did. We think that is highly likely, in fact.

source & further reading

nextplatform.com (original article)
