05:00
2026-07-01
dev.to
machine-learning
RL-driven data mixing boosts evaluation scores
A reinforcement learning-driven data scheduler, AC-ODM, boosts MMLU performance by 27.5% (relative) and HumanEval pass@1 by 2.23× on a Pythia-1B model, with only a 0.4% per-step wall-clock increase and 2…
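This summary doesn't describe how AC-ODM works internally. To show what "RL-driven data mixing" can mean in general, the sketch below uses a plain EXP3 multi-armed bandit instead. It treats each data domain as an arm and shifts sampling probability toward domains that return a higher scalar reward. This is not the AC-ODM scheduler. The `Exp3Mixer` class, the domain names, and the reward values are all hypothetical.

```python
import math
import random


class Exp3Mixer:
    """Hypothetical EXP3 bandit that chooses which data domain to sample next.

    This illustrates bandit-style online data mixing in general; it is NOT AC-ODM.
    """

    def __init__(self, domains, gamma=0.1, seed=0):
        self.domains = list(domains)
        self.gamma = gamma  # exploration rate: mixes in a uniform distribution
        self.log_weights = [0.0] * len(self.domains)
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over log-weights (shifted by the max for numerical stability),
        # blended with uniform exploration so every domain keeps some probability.
        k = len(self.domains)
        m = max(self.log_weights)
        exp_w = [math.exp(w - m) for w in self.log_weights]
        total = sum(exp_w)
        return [(1 - self.gamma) * w / total + self.gamma / k for w in exp_w]

    def sample(self):
        # Draw one domain index according to the current mixture.
        probs = self.probabilities()
        idx = self.rng.choices(range(len(self.domains)), weights=probs, k=1)[0]
        return idx, probs[idx]

    def update(self, idx, prob, reward):
        # Importance-weighted reward estimate (reward assumed to lie in [0, 1]);
        # only the sampled domain's weight changes.
        k = len(self.domains)
        self.log_weights[idx] += self.gamma * (reward / prob) / k


# Toy simulation with made-up, fixed rewards per domain (hypothetical values).
mixer = Exp3Mixer(["web", "code", "math"], gamma=0.1, seed=0)
hypothetical_reward = {"web": 0.1, "code": 0.9, "math": 0.1}
for _ in range(2000):
    idx, p = mixer.sample()
    mixer.update(idx, p, hypothetical_reward[mixer.domains[idx]])

print([round(p, 3) for p in mixer.probabilities()])
```

In a real training loop, the reward would come from a training signal, such as a loss-based measure, instead of a fixed table. Per the summary, AC-ODM uses a learned RL policy in place of a simple bandit.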