{"slug": "diffusionblocks-block-wise-nn-training-via-diffusion-interpretation", "title": "DiffusionBlocks – Block-Wise NN Training via Diffusion Interpretation", "summary": "Researchers have developed DiffusionBlocks, a framework that partitions transformer neural networks into independently trainable blocks to reduce memory requirements proportionally while maintaining competitive performance. The method, detailed in a paper accepted at the 2026 International Conference on Learning Representations, enables block-wise training by interpreting the process through a diffusion model lens. The official implementation supports image classification with Vision Transformers and is available for public use.", "body_md": "We propose **DiffusionBlocks**, a principled framework that partitions transformers into independently trainable blocks, reducing memory requirements proportionally while maintaining competitive performance across diverse architectures and tasks.\n\nThis is the official implementation of *DiffusionBlocks* for image classification with Vision Transformers (ViT).\n\nPlease install [uv](https://docs.astral.sh/uv/getting-started/installation/), then run:\n\n```\n# Install dependencies\nuv sync\n\n# Make sure to log in to Hugging Face and W&B\nuv run huggingface-cli login\nuv run wandb login\n```\n\nWe conducted our experiments with Python 3.12 and CUDA 12.2 on H100 GPUs.\n\nTo train on CIFAR-100, run one of the following. Model checkpoints are saved in the `logs` folder.\n\n**Baseline (ViT):**\n\n```\nuv run main.py train cifar100 --model_type vit\n```\n\n**DiffusionBlocks:**\n\n```\nuv run main.py train cifar100 --model_type dblock\n```\n\n**NOTE:** In DiffusionBlocks, the total number of epochs is multiplied by the number of blocks so that the total number of iterations is aligned with the baseline, since one training step in DiffusionBlocks trains only one block.\n\n## Details\n\nIn the base setting, we do not rely on techniques such as heavy data augmentation. 
If you want to see the performance with heavy data augmentation (`--add_rand_aug`) and a learning rate scheduler, run the following:\n\n**Baseline (ViT):**\n\n```\nBATCH_SIZE=128\nEPOCHS=1000\nPOSTFIX=\"-rand-augment\"\nWARMUP_STEPS=3900\nMODEL_TYPE=\"vit\"\nsrun uv run main.py train cifar100 \\\n    --model_type $MODEL_TYPE \\\n    --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \\\n    --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \\\n    --scheduler_specific_kwargs '{\"min_lr\": 5e-5}' \\\n    --add_rand_aug\n```\n\n**DiffusionBlocks:**\n\n```\nBATCH_SIZE=128\nEPOCHS=1000\nPOSTFIX=\"-rand-augment\"\nWARMUP_STEPS=$((3900 * 3)) # 3 is the number of blocks\nMODEL_TYPE=\"dblock\"\nsrun uv run main.py train cifar100 \\\n    --model_type $MODEL_TYPE \\\n    --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \\\n    --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \\\n    --scheduler_specific_kwargs '{\"min_lr\": 5e-5}' \\\n    --add_rand_aug\n```\n\nTo evaluate a trained checkpoint, run:\n\n**Baseline (ViT):**\n\n```\nCKPT_PATH=\"logs/path-to-last.ckpt\"\nuv run main.py test cifar100 --model_type vit --ckpt_path $CKPT_PATH\n```\n\n**DiffusionBlocks:**\n\n```\nCKPT_PATH=\"logs/path-to-last.ckpt\"\nuv run main.py test cifar100 --model_type dblock --ckpt_path $CKPT_PATH\n```\n\nThe Vision Transformer implementation in [vit.py](/SakanaAI/DiffusionBlocks/blob/main/vit.py) is based on [HuggingFace Transformers](https://github.com/huggingface/transformers), and the EDM implementation is based on [Stability-AI/generative-models](https://github.com/Stability-AI/generative-models).\n\nWe are grateful for their work.\n\nTo cite our work, please use the following BibTeX:\n\n```\n@inproceedings{shing2026diffusionblocks,\n  title     = {DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation},\n  author    = {Makoto Shing and Masanori Koyama and Takuya Akiba},\n  booktitle = {The Fourteenth International Conference on Learning Representations},\n  year      = {2026},\n  url       = {https://openreview.net/forum?id=pwVSmK71cS}\n}\n```\n\n", "url": "https://wpnews.pro/news/diffusionblocks-block-wise-nn-training-via-diffusion-interpretation", "canonical_source": "https://github.com/SakanaAI/DiffusionBlocks", "published_at": "2026-05-29 21:46:29+00:00", "updated_at": "2026-05-29 22:15:59.456667+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "computer-vision", "artificial-intelligence", "ai-research"], "entities": ["DiffusionBlocks", "Vision Transformers", "ViT", "Huggingface", "WandB", "Python", "CUDA", "H100"], "alternates": {"html": "https://wpnews.pro/news/diffusionblocks-block-wise-nn-training-via-diffusion-interpretation", "markdown": "https://wpnews.pro/news/diffusionblocks-block-wise-nn-training-via-diffusion-interpretation.md", "text": "https://wpnews.pro/news/diffusionblocks-block-wise-nn-training-via-diffusion-interpretation.txt", "jsonld": "https://wpnews.pro/news/diffusionblocks-block-wise-nn-training-via-diffusion-interpretation.jsonld"}}