04:00
2026-06-25
arxiv.org
large-language-models
Improved Large Language Diffusion Models
Researchers introduced iLLaDA, an 8B masked diffusion language model trained from scratch with fully bidirectional attention, scaling pre-training to 12T tokens and fine-tuning on a 25B-token instruct…