{"slug": "diffusionbench-towards-holistic-evaluation-of-generative-diffusion-transformers", "title": "DiffusionBench: Towards Holistic Evaluation of Generative Diffusion Transformers", "summary": "Researchers released DiffusionBench, a unified codebase for holistic evaluation of generative diffusion transformers across tasks like ImageNet and text-to-image generation. The benchmark supports training and evaluation of tokenizers and diffusion models with a single interface, aiming to provide more comprehensive assessment beyond traditional ImageNet metrics.", "body_md": "\n\n```\n##############################################################################\n#                                                                            #\n#   ____  _  __  __           _                            .-----------.     #\n#  |  _ \\(_)/ _|/ _|_   _ ___(_) ___  _ __                 |           |     #\n#  | | | | | |_| |_| | | / __| |/ _ \\| '_ \\                | ░▒▓█▓▒░▒▓ |     #\n#  | |_| | |  _|  _| |_| \\__ \\ | (_) | | | |               | ▒▓█████▓▒ |     #\n#  |____/|_|_| |_|  \\__,_|___/_|\\___/|_| |_|               | ▓███████▓ |     #\n#                                                          |     ↓     |     #\n#   ____                  _                                | █████████ |     #\n#  | __ )  ___ _ __   ___| |__                             | ▓███████▓ |     #\n#  |  _ \\ / _ \\ '_ \\ / __| '_ \\                            | ▒▓█████▓▒ |     #\n#  | |_) |  __/ | | | (__| | | |                           |           |     #\n#  |____/ \\___|_| |_|\\___|_| |_|                           '-----------'     #\n#                                                                            #\n#           Because ImageNet evaluation alone is no longer enough!           
#\n#                                                                            #\n##############################################################################\n```\n\n📣 Announcement post: [Call for DiffusionBench: A Holistic Benchmark for Diffusion Transformers]. Help us grow the benchmark with new evaluation axes, new metrics, and faithful reproductions of published methods.\n\nThis repo contains the unified codebase for DiffusionBench. It supports training and evaluation across different generation tasks (ImageNet, T2I, ...) through a single interface. Please see the sections below for the detailed structure. Come join us!\n\n[assets/qualitative.webp](/End2End-Diffusion/diffusion-bench/blob/main/assets/qualitative.webp)\n\nText-to-image samples at 256×256 from models trained for 200K iterations using DiffusionBench.\n\n```\n# install uv project manager (if you don't already have it)\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n\n# install dependencies\nuv sync\n\n# prepare data\nuv run python scripts/prepare.py --data {all,imagenet,t2i,eval}\n\n# download pretrained models\nuv run hf download diffusion-bench/diffusion-bench --local-dir pretrained_models --exclude .gitattributes\n```\n\nReproduction flow: **Stage 1 → Stage 2**. 
Set these environment variables first (used for the output directory and W&B logging):\n\n```\nexport EXPERIMENT_NAME=<run-name>\nexport ENTITY=<wandb-entity>\nexport PROJECT=<wandb-project>\nexport WANDB_KEY=<key>\n```\n\n**Stage 1.** Train the RAE tokenizer:\n\n```\nuv run torchrun --standalone --nproc_per_node=8 \\\n    src/train_stage1.py \\\n    --config [STAGE1_CONFIG_PATH] \\\n    --results-dir results/stage1 --precision bf16 --compile --wandb\n```\n\n**Stage 2.** Train the diffusion model on VAE/RAE/Pixel space:\n\n```\nuv run torchrun --standalone --nproc_per_node=8 \\\n    src/train.py \\\n    --config [STAGE2_CONFIG_PATH] \\\n    --results-dir results/stage2 --precision bf16 --compile --wandb\n```\n\nStage 2 *training* configs run online evaluation during training (the `eval:` block). For standalone evaluation of a released checkpoint, use the **sampling/** configs. Each embeds `stage_2.ckpt` (pointing into `pretrained_models/`) and the eval-time guidance, so the weights load automatically:\n\n```\nexport EXPERIMENT_NAME=<run-name>\n\n# stage 1 reconstruction (rFID/PSNR/SSIM/LPIPS)\nuv run torchrun --nproc_per_node=8 src/offline_eval_stage1.py --config [STAGE1_CONFIG_PATH]\n\n# stage 2 generation (FID/IS, GenEval/DPGBench/...)\nuv run torchrun --nproc_per_node=8 src/offline_eval.py --config [STAGE2_CONFIG_PATH]\n```\n\n```\nconfigs/\n├── stage1/\n└── stage2/\n    ├── training/\n    │   ├── imagenet/\n    │   └── t2i/\n    └── sampling/\n        ├── imagenet/\n        └── t2i/\n```\n\nStage 2 spans VAE (11), RAE (6), REG (4), and Pixel (3) families, identical across ImageNet and T2I. Swap any config between tasks with a single path change. 
The `sampling/` set mirrors `training/` but adds the trained checkpoint and eval-time guidance, so it runs offline eval directly.\n\nFor ImageNet, pick the CFG-off baseline (`[STAGE2_CONFIG_PATH].yaml`) or the per-model best-CFG variant (`[STAGE2_CONFIG_PATH]-cfg<scale>-t0.0-0.9.yaml`).\n\n| Category | Methods |\n|---|---|\n| Latent Space | `Pixel Space`; `RAE` (30+ representation encoders): `DINOv2` `SigLIP2` `WebSSL` `PE` `LangPE` and more; `RAEv2` (30+ representation encoders): `DINOv2` `SigLIP2` `WebSSL` `PE` `LangPE` and more; `VAE` (10+ VAEs): `FLUX.2` `FLUX.1` `SD3.5` `VA-VAE` `E2E-VAE` and more |\n| Output Prediction | `x-prediction` `v-prediction` |\n| Transport | `Rectified-Flow` `MeanFlow` `Improved-MeanFlow` `Pixel-MeanFlow` `Drifting` |\n| Loss | `Flow Matching` `REPA` `iREPA` |\n| Architecture | `LightningDiT` `JiT` `DDT` |\n| Tasks | `ImageNet`: class-conditional generation; `T2I`: text-to-image generation |\n| Evaluation | ImageNet: `FID` `IS`; T2I: `GenEval` `DPGBench` `GenAIBench` `VQAScore` |\n| Training Backend | `DDP` `FSDP [TODO]` |\n\n| | Status | Details |\n|---|---|---|\n| Coding Agents | Yes | Agent-compatible. See `skills/` |\n\n**AutoResearch** We welcome contributions! 
Please refer to [docs/contributors.md](/End2End-Diffusion/diffusion-bench/blob/main/docs/contributors.md) and [docs/contributing.md](/End2End-Diffusion/diffusion-bench/blob/main/docs/contributing.md) for further details.\n\nThe codebase is built upon some amazing projects. We thank the authors for making their work publicly available.", "url": "https://wpnews.pro/news/diffusionbench-towards-holistic-evaluation-of-generative-diffusion-transformers", "canonical_source": "https://github.com/End2End-Diffusion/diffusion-bench", "published_at": "2026-06-24 02:12:31+00:00", "updated_at": "2026-06-24 02:44:11.880782+00:00", "lang": "en", "topics": ["generative-ai", "machine-learning", "computer-vision", "ai-research", "ai-tools"], "entities": ["DiffusionBench", "ImageNet", "W&B", "RAE", "VAE", "Pixel"], "alternates": {"html": "https://wpnews.pro/news/diffusionbench-towards-holistic-evaluation-of-generative-diffusion-transformers", "markdown": "https://wpnews.pro/news/diffusionbench-towards-holistic-evaluation-of-generative-diffusion-transformers.md", "text": "https://wpnews.pro/news/diffusionbench-towards-holistic-evaluation-of-generative-diffusion-transformers.txt", "jsonld": "https://wpnews.pro/news/diffusionbench-towards-holistic-evaluation-of-generative-diffusion-transformers.jsonld"}}