Performance Analysis and Optimization of 3D Generative Diffusion Models across GPU Architectures

In a preprint posted to arXiv, researchers analyzed the performance of Med-DDPM, a 3D generative diffusion model for MRI synthesis, across three generations of NVIDIA GPUs. They identified inefficiencies in memory access and Tensor Core utilization. They then applied two optimizations, TF32 Tensor Core activation and a 3D channels-last memory layout, which cut SM cycles and dynamic instructions by up to 100x with no loss in synthesis quality.
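Part of the gain comes from TF32 Tensor Core activation. TF32 is the reduced-precision format that Ampere-class Tensor Cores (such as the A100's) use to accelerate FP32 matrix math. It keeps float32's 8-bit exponent but only 10 explicit mantissa bits.

As a minimal sketch of what that precision trade-off means numerically, the plain-Python helper below emulates rounding a value to TF32. The name `to_tf32` is hypothetical, and the round-to-nearest-even rule is an assumption for illustration. This is not the authors' code and not a model of any specific hardware path.

```python
import math
import struct


def to_tf32(x: float) -> float:
    """Emulate rounding a value to TF32 precision (hypothetical helper).

    TF32 keeps float32's sign bit and 8-bit exponent but only 10 of its
    23 explicit mantissa bits, so the low 13 bits are rounded away here
    using round-to-nearest, ties-to-even. NaN/infinity are not handled.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    dropped = 13                       # 23 float32 mantissa bits - 10 TF32 bits
    keep_lsb = (bits >> dropped) & 1   # lowest mantissa bit that TF32 keeps
    # Adding 0xFFF plus the kept LSB, then truncating, rounds ties to even.
    bits += (1 << (dropped - 1)) - 1 + keep_lsb
    bits &= 0xFFFFFFFF & ~((1 << dropped) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    x = math.pi
    y = to_tf32(x)
    print(f"float64 pi      = {x!r}")
    print(f"TF32-rounded pi = {y!r}")
    print(f"relative error  = {abs(y - x) / x:.2e} (TF32 unit roundoff 2**-11 = {2**-11:.2e})")
```

For example, `to_tf32(math.pi)` gives `3.140625`, a relative error of about 3e-4. That is the scale of error the reported "no loss in synthesis quality" result shows the model tolerates.

In PyTorch, TF32 for matrix multiplications and cuDNN convolutions is governed by `torch.backends.cuda.matmul.allow_tf32` and `torch.backends.cudnn.allow_tf32`. The summary does not say which switches the authors set.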

Published Jun 19, 2026 · 1 min read

arXiv:2606.19365v1 · Announce Type: new

Abstract: Diffusion models have become essential for high-fidelity 3D MRI synthesis, yet their deployment remains constrained by substantial GPU resource demands arising from hundreds of U-Net evaluations per sample and highly heterogeneous kernel behavior. This paper performs a comprehensive performance analysis of the state-of-the-art medical diffusion model, Med-DDPM, across three generations of NVIDIA architectures to study kernel-level runtime breakdowns, instruction-mix characteristics, memory system utilization, warp-level activities, and profiler priority-score estimates. We show that training is overwhelmingly dominated by cuDNN convolution and implicit-GEMM kernels, with inefficiencies arising from memory-access patterns, tensor-layout conversions, and limited Tensor Core utilization. Guided by these insights, we evaluate two architecture-aware optimizations, TF32 Tensor Core activation and a 3D channels-last layout, and demonstrate that they reduce SM cycles by up to 100x, cut dynamic instructions by 100x, raise Tensor Core utilization from 1.45 to 9.98x, and increase IPC by 7% on A100, all without degrading synthesis quality.
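The 3D channels-last layout changes how 5D activations of logical shape (N, C, D, H, W) are laid out in memory. In the default NCDHW layout, the C channel values of one voxel sit far apart. In NDHWC ("channels-last 3D"), they are adjacent.

NVIDIA's Tensor Core convolution kernels generally favor channels-last data. Keeping activations in that layout can therefore avoid the tensor-layout conversions that the abstract flags as an inefficiency.

The plain-Python sketch below computes both layouts' strides so the difference is concrete. All helper names and the example sizes are hypothetical. In PyTorch the corresponding layout is requested with `tensor.to(memory_format=torch.channels_last_3d)` (and likewise on an `nn.Module`), though the abstract does not say which framework calls the authors used.

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major array."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def ncdhw_strides(n, c, d, h, w):
    """Default 5D layout: W varies fastest, then H, D, C, N."""
    return row_major_strides((n, c, d, h, w))


def channels_last_3d_strides(n, c, d, h, w):
    """NDHWC physical layout, strides reported in logical (N, C, D, H, W) order."""
    sn, sd, sh, sw, sc = row_major_strides((n, d, h, w, c))
    return (sn, sc, sd, sh, sw)


def offset(index, strides):
    """Linear memory offset (in elements) of a logical (n, c, d, h, w) index."""
    return sum(i * s for i, s in zip(index, strides))


if __name__ == "__main__":
    shape = (1, 4, 8, 8, 8)   # hypothetical N, C, D, H, W sizes
    voxel = (1, 2, 3)         # one (d, h, w) position
    for name, strides in [("NCDHW", ncdhw_strides(*shape)),
                          ("NDHWC", channels_last_3d_strides(*shape))]:
        addrs = [offset((0, ch, *voxel), strides) for ch in range(shape[1])]
        print(name, strides, "channel offsets at one voxel:", addrs)
```

With these example sizes, the four channel values of one voxel land at offsets 83, 595, 1107 and 1619 in NCDHW, versus 332 through 335 in NDHWC. The channels-last layout turns a strided gather into one contiguous read.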