04:00
2026-05-27
arxiv.org
artificial-intelligence
From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator
Researchers have identified a fundamental limitation in training LLM-based dialogue agents, showing that both static context reinforcement learning and prompt-based interactive RL suffer from context …