Sometimes, Air Is The Only Way For AI Systems To Keep Their Cool

AMD has released the MI350P, a PCI-Express version of its Instinct MI350X GPU that is intentionally built with half the components to enable air cooling for on-premise AI inference and small model training. The new card, available now, targets financial services firms and other enterprises that cannot support liquid-cooled, rackscale systems in their existing datacenters. By offering a half-package design rather than bin-sorted defective chips, AMD aims to provide the CDNA 4 architecture's benefits—including support for FP8, MXFP6, and MXFP4 data formats—in standard server form factors without requiring specialized cooling infrastructure.

compute Sometimes, Air Is The Only Way For AI Systems To Keep Their Cool Not everybody has a datacenter that supports liquid cooling, and not every company, particularly those with datacenters inside of major metropolitan areas who legally or practically need their systems all in one place, will ever separate their AI systems from the production systems upon which they want to build GenAI models and infer against. Moreover, many companies are doing classical machine learning as well as GenAI, and in many cases they do not need rackscale compute nodes to do their inference. And if they need rackscale nodes for training, they can – and often do – rent them from a big cloud builder or a neocloud. This need to stay on premise for production inference work and small model training is particularly keen for hedge funds, algorithmic trading companies, and other kinds of financial services firms who are using machine learning and relatively small GenAI models to help them make money by analyzing what is going on in markets and making split-second decisions that humans do not have the processing speed and reflexes to do. What holds true for FSI firms is equally true of those in manufacturing, distribution, life sciences, and other industries. They do not have datacenters where you can drop down a single rack that burns 145 kilowatts and keep it cool. They have to spread their AI systems out much as they have to do for their general purpose infrastructure, despite the inefficiencies this entails. All of this boils down to there still being a need for air-cooled GPU systems, something I talked about in detail last summer here https://www.nextplatform.com/compute/2025/07/24/financial-services-firms-will-bank-on-homegrown-ai-training/100819 and there https://www.nextplatform.com/ai/2025/07/31/for-financial-services-firms-ai-inference-is-as-challenging-as-training/100825 . And to that end, AMD has searched through the bins of its Instinct MI350 series GPUs and cooked up a half-capacity version of the MI350X that comes in a retro PCI-Express form factor that can plug into standard server form factors. The new card is called the MI350P, as you might expect, and it is available now. We assumed when we first heard about a future PCI-Express variant of the MI350 series that this would be a recycling from the bins of finished MI350X and MI355X parts that are clocked at a uniform 2.2 GHz and had only half their HBM3E memory stacks working. If AMD was doing a bin sort, you would expect a distribution of compute cores and HBM memory capacity being sold, not just one configuration. But the capacity of the MI350P is precisely half of what the MI350X offers. And that is because the MI350P is a chip package that has half the components in a smaller socket, and this is done absolutely intentionally so that the device can be air cooled and still offer all of the benefits of the CDNA 4 architecture that debuted back in June 2025 with the MI350 Series launch https://www.nextplatform.com/compute/2025/06/12/amd-plots-interception-course-with-nvidia-gpu-and-system-roadmaps/1648000 . So it’s a half package, not a half dud. It looks like this: The important MI350 Series features that are common across the lineup include using twelve high stacks of HBM3E memory as well as the CDNA 4 compute complexes that support OCP-FP8, MXFP6, and MXFP4 data formats for boosting their effective throughput of GPUs for both training and inference. Here are the specs for the MI350P: What is interesting about these specifications and in contrast to what we see for the MI300 Series and MI350 Series devices that use the OAM form factor and that also offer memory coherency across interconnected GPUs and CPUs is that AMD is showing delivered actual flops as well as peak theoretical flops at each precision. We don’t know what benchmark test AMD is using, but the company is being honest about what to expect from these trimmed down MI350P cards. As you can see, on whatever these tests were, the MI350P was able to deliver 90 percent of the 4 TB/sec of peak bandwidth. As for compute, on the 16-bit and 8-bit math, somewhere between 58 percent and 66 percent of peak performance is being delivered on this test, and MXFP6 is delivering 58 percent as well, but MXFP4 is only delivering 50 percent of peak. What is also interesting in the presentation for the MI350P is that AMD is being perfectly honest about how it is positioned against not only the air-cooled system boards based on the MI350X and MI355X GPUs that are the flagship GPUs from the company, but how you position the MI350P against running classical machine learning and GenAI inference on Epyc GPUs and Radeon AI Pro cards, including suggested AI model size limits based on the memory and compute: The sweet spot for the MI350P is for models that have around 200 billion to 250 billion parameters, which is a perfectly reasonable size model commonly used in the enterprise to augment all kinds of data processing and transaction processing. I like to take a broad historical view of compute engines, and to that end below is a monster table that shows the entire Instinct GPU line since the MI25 was launched way back in the summer of 2017 https://www.nextplatform.com/ai/2017/08/08/the-shape-of-amd-hpc-and-ai-iron-to-come/1652551 . On interesting thing about the MI350P is that it can be geared down for environments or server enclosures that cannot take a lot of heat. The peak performance specs shown for the MI350P in the tables above assume that you are running the GPU at 2.2 GHz and that the system can dissipate a maximum of 600 watts of heat. But there is a way to throttle the MI350P back to 450 watts, which is a 25 percent reduction in power, and I think that is probably only a reduction of 10 percent to 15 percent in performance, which means it is probably only dropping down to 1.9 GHz to 2 GHz on the clock speed. On workloads that are memory bandwidth sensitive, the reduction in actual, real-world performance could be less than 10 percent since the memory speed is not being geared down as far as I can tell and the capacity is not being capped, either. Given the need for better price/performance per watt, it is reasonable to expect many of the customers that the MI350P is aimed at to go for the 450 watt downgrade – and it is also reasonable to expect them to want to pay around 10 percent less than their negotiated price off of whatever list price is for these devices. The usual suspects of OEMs and ODMs are lined up to make systems based on the MI350P, presumably with one or two “Genoa” Epyc 9004 or a “Turin” Epyc 9005 CPU acting as host processor for four or eight of these MI350Ps in a single node. Dell is putting together PowerEdge XE7745 and PowerEdge R7725 rack servers with the MI350Ps, and Hewlett Packard Enterprise is adding them to the ProLiant DL385 and 385a Gen 11 servers as well as the ProLiant DL345 Gen 12 servers. Lenovo is adding them to the ThinkSystem SR675/I v3 machines, and Cisco Systems is putting them into its C845a M8, X Series 580p, and UC245 M8 servers. Supermicro is rounding out the OEMs with its AS -5126GS-TNRT, AS -5126GS-TNRT2, AS -2026HS-TN, and AS -2116CS-TN machines. I strongly suspect that these will be as scarce as hound’s teeth, just like all GPUs are today. Pricing was not announced for the MI350P, but it should be a little less than half of whatever the price is for the MI350X, which supports memory coherency across the GPUs and into the CPUs while the MI350P cannot even have two-way coherency across GPUs. The MI350P is absolutely and only standalone, no matter how many you put into the machine.