DiffusionBlocks – Block-Wise NN Training via Diffusion Interpretation



Researchers have developed DiffusionBlocks, a framework that partitions transformer neural networks into independently trainable blocks to reduce memory requirements proportionally while maintaining competitive performance. The method, detailed in a paper accepted at the 2026 International Conference on Learning Representations, enables block-wise training by interpreting the process through a diffusion model lens. The official implementation supports image classification with Vision Transformers and is available for public use.

Published May 29, 2026

We propose DiffusionBlocks, a principled framework that partitions transformers into independently trainable blocks, reducing memory requirements proportionally while maintaining competitive performance across diverse architectures and tasks.

This is the official implementation of DiffusionBlocks for image classification using Vision Transformers (ViT).
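As a rough illustration of what "partitioning into blocks" means for bookkeeping, the sketch below splits a stack of layer indices into contiguous blocks. It is not taken from the repository and does not reproduce the diffusion-based training objective. The function name `partition_layers` and the 12-layer depth are assumptions for illustration; the block count of 3 matches the value used in the commands in this README.

```python
def partition_layers(num_layers: int, num_blocks: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into contiguous, near-equal blocks.

    Hypothetical helper: the name and the even-split policy are assumptions,
    not the repository's API.
    """
    if num_blocks < 1 or num_layers < num_blocks:
        raise ValueError("need 1 <= num_blocks <= num_layers")
    base, extra = divmod(num_layers, num_blocks)
    blocks = []
    start = 0
    for i in range(num_blocks):
        # The first `extra` blocks take one additional layer each.
        size = base + (1 if i < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    blocks = partition_layers(num_layers=12, num_blocks=3)  # 12 is an assumed depth
    for index, block in enumerate(blocks):
        print(f"block {index}: layers {list(block)}")
    # If only one block is trained per step, only that block's layers need
    # gradients during the step.
    largest = max(len(block) for block in blocks)
    print(f"layers with gradients per step: {largest} of 12")
```

With an even split, each step touches about one third of the layers when there are three blocks, which is the intuition behind the proportional memory reduction described above.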

Please install uv. Then, run:

uv sync

uv run huggingface-cli login
uv run wandb login

We conducted our experiments in the following environment: Python 3.12 and CUDA 12.2 on an H100 GPU.

The model checkpoints are saved in the logs folder.

Baseline (ViT):

uv run main.py train cifar100 --model_type vit

DiffusionBlocks:

uv run main.py train cifar100 --model_type dblock

NOTE: For DiffusionBlocks, the total number of epochs is multiplied by the number of blocks. One training step in DiffusionBlocks trains only one block, so this scaling aligns the total number of iterations with the baseline.
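The arithmetic behind this note can be sketched as follows. This is one reading of the note, assuming each step updates exactly one block and that steps are spread evenly across blocks; the function names are hypothetical and not part of the repository. The example values assume CIFAR-100's 50,000 training images and a batch size of 128.

```python
import math


def scaled_epochs(baseline_epochs: int, num_blocks: int) -> int:
    """Epoch count after multiplying by the number of blocks."""
    if baseline_epochs < 0 or num_blocks < 1:
        raise ValueError("baseline_epochs must be >= 0 and num_blocks >= 1")
    return baseline_epochs * num_blocks


def updates_per_block(baseline_epochs: int, steps_per_epoch: int, num_blocks: int) -> int:
    """Average number of updates each block receives under an even split."""
    total_steps = scaled_epochs(baseline_epochs, num_blocks) * steps_per_epoch
    return total_steps // num_blocks


if __name__ == "__main__":
    steps_per_epoch = math.ceil(50_000 / 128)  # assumed dataset size and batch size
    baseline_steps = 1000 * steps_per_epoch
    per_block = updates_per_block(1000, steps_per_epoch, num_blocks=3)
    print(f"baseline steps: {baseline_steps}, updates per block: {per_block}")
```

Under these assumptions, each block receives as many updates as the whole baseline model does, which is what the epoch scaling is meant to achieve.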

Details

In the base setting, we don't rely on techniques such as heavy data augmentation. To see the performance with heavy data augmentation and a learning rate scheduler, run the following:

Baseline (ViT):

BATCH_SIZE=128
EPOCHS=1000
POSTFIX="-rand-augment"
WARMUP_STEPS=3900
MODEL_TYPE="vit"
srun uv run main.py train cifar100 \
    --model_type $MODEL_TYPE \
    --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \
    --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \
    --scheduler_specific_kwargs '{"min_lr": 5e-5}' \
    --add_rand_aug

DiffusionBlocks:

BATCH_SIZE=128
EPOCHS=1000
POSTFIX="-rand-augment"
WARMUP_STEPS=$((3900 * 3)) # 3 indicates the number of blocks
MODEL_TYPE="dblock"
srun uv run main.py train cifar100 \
    --model_type $MODEL_TYPE \
    --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \
    --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \
    --scheduler_specific_kwargs '{"min_lr": 5e-5}' \
    --add_rand_aug
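To make the scheduler flags above concrete, here is a minimal sketch of a learning rate curve with linear warmup followed by cosine decay to a floor. It assumes that `cosine_with_min_lr` behaves this way; it is not the repository's scheduler code. The function `lr_at_step` and the `total_steps` value are hypothetical, while `base_lr`, `min_lr`, and the warmup length come from the commands above.

```python
import math


def lr_at_step(
    step: int,
    total_steps: int,
    warmup_steps: int,
    base_lr: float = 5e-4,
    min_lr: float = 5e-5,
) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to `min_lr`."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup period.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    total_steps = 391_000  # assumed: 1000 epochs x 391 steps per epoch
    for step in (0, 1_950, 3_900, 200_000, total_steps):
        print(step, lr_at_step(step, total_steps, warmup_steps=3_900))
```

For the DiffusionBlocks run, the same function would be called with the warmup length and total step count both multiplied by the number of blocks, mirroring `WARMUP_STEPS=$((3900 * 3))`.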

Baseline (ViT):

CKPT_PATH="logs/path-to-last.ckpt"
uv run main.py test cifar100 --model_type vit --ckpt_path "$CKPT_PATH"

DiffusionBlocks:

CKPT_PATH="logs/path-to-last.ckpt"
uv run main.py test cifar100 --model_type dblock --ckpt_path "$CKPT_PATH"
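The checkpoint path in the commands above is a placeholder. As a convenience, the sketch below looks up the most recently modified checkpoint file. It assumes only that checkpoints live somewhere under the logs folder and end in `.ckpt`; the helper name `latest_checkpoint` is hypothetical, and the actual directory layout inside logs is not specified here.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(log_dir: str = "logs") -> Optional[Path]:
    """Return the most recently modified *.ckpt file under log_dir, if any."""
    candidates = list(Path(log_dir).rglob("*.ckpt"))
    if not candidates:
        return None
    # Newest file by modification time.
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    checkpoint = latest_checkpoint()
    if checkpoint is None:
        print("no checkpoint found under logs/")
    else:
        print(checkpoint)
```

The printed path can then be assigned to `CKPT_PATH` before running the test command.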

The implementation of Vision Transformer in vit.py is based on HuggingFace Transformers, and the implementation of EDM is based on Stability-AI/generative-models.

We are grateful for their work.

To cite our work, please use the following BibTeX:

@inproceedings{shing2026diffusionblocks,
  title     = {DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation},
  author    = {Makoto Shing and Masanori Koyama and Takuya Akiba},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026},
  url       = {https://openreview.net/forum?id=pwVSmK71cS}
}


Read original on github.com → github.com/SakanaAI/DiffusionBlocks
