Hi all.
I have two questions for you. They came out of a webinar series on High Performance Computing (HPC) I took part in (the Italy–Germany HPC webinars, organised on the Italian side through CNR). I raised a concern there, and my impression was that the other speakers did not share it to the same degree: the room leaned more optimistic than I am. That is exactly why I want to put it to a wider audience: I may be wrong, and I would like to hear where others land.
The concern is precision. Most scientific HPC needs double precision (FP64). In computational fluid dynamics, which is my field, we resolve physical scales spanning many orders of magnitude, and to do that correctly (often with very high-order methods) we need 64-bit floating point.
AI computing does not need this. Training and inference work well at 8-bit (now even at 4-bit). So, the two workloads require different hardware: AI needs many low-precision cores, while science requires strong FP64 capabilities.
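To make the precision gap concrete, here is a minimal sketch in plain Python. It is only an illustration, not taken from any CFD code: the helper names `to_fp32` and `machine_epsilon` and the 1e-9 fluctuation are my own assumptions. It emulates FP32 by rounding through the standard `struct` module and checks whether a small fluctuation survives being added to a mean value of order one.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (IEEE FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def machine_epsilon(round_fn) -> float:
    """Gap between 1.0 and the next representable number under round_fn."""
    eps = 1.0
    while round_fn(1.0 + eps / 2.0) != 1.0:
        eps /= 2.0
    return eps


eps32 = machine_epsilon(to_fp32)        # FP32, emulated by rounding
eps64 = machine_epsilon(lambda x: x)    # native Python float is FP64

# Hypothetical case: a fluctuation of relative size 1e-9 on a mean of 1.0.
# The sum is formed in FP64 and then rounded to FP32, which is enough to
# show whether FP32 can represent the perturbed value at all.
mean, fluctuation = 1.0, 1.0e-9
survives_fp32 = to_fp32(mean + fluctuation) != mean
survives_fp64 = (mean + fluctuation) != mean

print(f"FP32 epsilon: {eps32:.3e}")
print(f"FP64 epsilon: {eps64:.3e}")
print(f"1e-9 fluctuation survives in FP32: {survives_fp32}")
print(f"1e-9 fluctuation survives in FP64: {survives_fp64}")
```

The FP32 epsilon comes out at about 1.2e-7 and the FP64 one at about 2.2e-16, so the fluctuation is lost entirely in single precision and kept in double. Real solvers fail in subtler ways (accumulated round-off, ill-conditioned linear systems), but the basic limit is the same.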
The problem is that the vendors follow the AI market because that is where the money is. Comparing on vector FP64 (peak, dense), the recent trend is to hold it flat or lower it, and spend the transistors on low-precision math instead:
- NVIDIA H100: 34 TFLOP/s vector FP64, or 67 with the FP64 tensor-core path. The newer B200 does about 40 vector FP64. Blackwell dropped the dedicated FP64 tensor-core path that Hopper had, and gained around 20 PFLOP/s of FP4 for AI. The Rubin roadmap reportedly cuts FP64 further.
- AMD MI300X: 81.7 TFLOP/s FP64. The newer MI355X does 78.6, below its own predecessor, with the gains all in FP8/FP4 for AI inference.
- Intel has stepped back from a dedicated HPC GPU. Its current HPC silicon, the Max-series (Ponte Vecchio) in Aurora, has no standalone successor. Intel cancelled Falcon Shores as a product in early 2025 and folded its HPC and AI lines into one chip, Jaguar Shores, due around 2026/2027. Intel describes it as serving both AI and HPC, but says it will compete on total cost of ownership rather than peak FLOPS, and has published no FP64 figure.
Consumer silicon makes the direction plainest. NVIDIA’s N1X, the new Blackwell laptop chip, publishes only AI-precision figures (NVFP4, around 1000 TOPS) and quotes no FP64 at all. Double precision is simply not a design goal there.
So across all three vendors the direction looks the same. The new chips are built for AI, and double precision gets quietly de-prioritized along the way.
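For anyone who wants the generation-to-generation change as a percentage, here is a small sketch using only the peak figures quoted in this post (vendor numbers as I have them, not measurements). The helper `pct_change` and the choice of pairs are my own.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Peak FP64 in TFLOP/s, as quoted in this post (assumed, not measured).
comparisons = {
    "NVIDIA vector FP64, H100 -> B200": (34.0, 40.0),
    "NVIDIA best FP64 path, H100 tensor -> B200": (67.0, 40.0),
    "AMD FP64, MI300X -> MI355X": (81.7, 78.6),
}

changes = {name: pct_change(old, new) for name, (old, new) in comparisons.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

On these numbers NVIDIA's vector FP64 rises modestly, but the best FP64 rate available on the chip falls by roughly 40% once the tensor-core path is gone, and AMD's falls by a few percent.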
There is one strong counter-current. AMD’s MI430X, coming this year, is a deliberate HPC part. AMD claims more than 200 TFLOP/s of FP64, and independent estimates back out around 211 from the Alice Recoque exascale contract. That would be the highest of any GPU so far, and the chip still carries FP4/FP8 for AI. It will power Alice Recoque, the next European exascale machine, as well as Discovery in the US and Herder in Germany. So a dedicated FP64 line still exists, for now.
But it is one product line, from one vendor, against a whole market moving the other way. That is what I cannot resolve: whether a first-class FP64 hardware line survives, or shrinks to a small premium niche while everything else is optimized for AI.
Two questions for you:
- Do you share this concern, or do you think I am overstating it?
- If you share it, do you already see a way out?
I would be glad to hear how others in the Fortran and HPC community are thinking about this.
Stefano