04:00
2026-06-29
arxiv.org
large-language-models
ATOD: Annealed Turn-aware On-policy Distillation for Multi-turn Autonomous Agents
Researchers propose ATOD, a hybrid online distillation algorithm that combines on-policy distillation and reinforcement learning to train small language-model agents for multi-turn tasks. ATOD uses an…