From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator

wpnews.pro


Source: arxiv.org · Published: May 27, 2026, 04:00 UTC · Topic: artificial intelligence


Researchers have identified a fundamental limitation in training LLM-based dialogue agents. Both static context reinforcement learning (RL), which optimizes on fixed offline logs, and interactive RL with a prompt-based user simulator suffer from context distribution shift: a mismatch between the dialogue histories seen during training and those encountered in real conversations. The authors show that this shift compounds quadratically over turns and severely degrades dialogue quality. To address it, they propose Calibrated Interactive RL, a framework that couples interactive RL with a simulator aligned to human interaction patterns, narrowing the sim-to-real gap. Experiments across multiple dialogue tasks show that the calibrated approach achieves state-of-the-art performance by mitigating both policy-induced and simulator-induced distribution shift.
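Why "quadratically"? The intuition matches the familiar compounding-error argument from imitation learning. The sketch below is a toy model under our own assumptions, not the paper's derivation: each turn adds a fixed amount of context mismatch `eps`, and the quality loss at a turn is proportional to the mismatch accumulated so far. Summing over T turns gives eps * T(T+1)/2, so doubling the conversation length roughly quadruples the total loss. The function names are hypothetical.

```python
def cumulative_shift_loss(eps: float, turns: int) -> float:
    """Toy model (assumption, not the paper's bound): after t turns the
    context has drifted by t * eps, and the loss at that turn equals the drift."""
    total = 0.0
    drift = 0.0
    for _ in range(turns):
        drift += eps   # each turn adds eps of context mismatch
        total += drift  # loss at this turn grows with the accumulated drift
    return total


def closed_form(eps: float, turns: int) -> float:
    # sum_{t=1}^{T} t * eps = eps * T * (T + 1) / 2, i.e. O(eps * T^2)
    return eps * turns * (turns + 1) / 2


if __name__ == "__main__":
    for T in (5, 10, 20):
        print(T, cumulative_shift_loss(0.01, T), closed_form(0.01, T))
```

In this toy model the per-turn error stays constant, yet the total grows with the square of the number of turns, which is why long conversations are hit hardest.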


arXiv:2605.26403v1

Abstract: A long-standing goal of the research community is to develop highly interactive LLM-based dialogue agents. Recent research focuses on optimizing policies based on fixed offline logs (Static Context RL) or using a prompt-based simulator (Interactive RL). In this work, we theoretically show that both paradigms are fundamentally limited by context distribution shift: a mismatch between dialogue histories observed during training and those encountered in real conversations. This shift compounds quadratically over turns and severely degrades dialogue quality. Specifically, we attribute this shift to two distinct sources: (i) policy-induced shift, arising from training on static histories rather than self-generated trajectories; and (ii) simulator-induced shift, stemming from discrepancies between simulated and real human behaviors. To address these challenges, we propose Calibrated Interactive RL, a unified framework that couples interactive RL with simulator alignment. By aligning the simulator with human interaction patterns, our approach reduces the sim-to-real gap and mitigates compounding distribution shifts. Experiments across multiple dialogue tasks confirm our theoretical analysis: (i) Interactive RL significantly outperforms the Static Context baseline by mitigating policy distribution shift; and (ii) calibrating simulators with our alignment method further bridges the sim-to-real gap, yielding state-of-the-art downstream performance.
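The two sources of shift become clearer when you look at where each paradigm's training contexts come from. The sketch below is illustrative only and is not the authors' code: `policy` and `simulator` are hypothetical stand-in callables, and a dialogue is assumed to be a list of (speaker, text) tuples. In Static Context RL every context is a prefix of a logged dialogue, so earlier agent turns are the logged ones rather than the policy's own (policy-induced shift). In Interactive RL the policy's replies feed back into the history, but each user turn comes from a simulator, so any gap between the simulator and real users carries into training (simulator-induced shift).

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]                   # (speaker, text)
Policy = Callable[[List[Turn]], str]     # hypothetical: history -> agent reply
Simulator = Callable[[List[Turn]], str]  # hypothetical: history -> user reply


def static_contexts(log: List[Turn]) -> List[List[Turn]]:
    """Static Context RL: each training context is the logged prefix before
    an agent turn, so the history never reflects the policy's own replies."""
    return [log[:i] for i, (speaker, _) in enumerate(log) if speaker == "agent"]


def interactive_contexts(policy: Policy, simulator: Simulator,
                         opening: str, turns: int) -> List[List[Turn]]:
    """Interactive RL: the policy's replies enter the history, and a
    simulator (not a real human) produces every subsequent user turn."""
    history: List[Turn] = [("user", opening)]
    contexts: List[List[Turn]] = []
    for _ in range(turns):
        contexts.append(list(history))  # snapshot of the context the policy acts on
        history.append(("agent", policy(history)))
        history.append(("user", simulator(history)))
    return contexts


if __name__ == "__main__":
    log = [("user", "hi"), ("agent", "hello"),
           ("user", "book a table"), ("agent", "for how many?")]
    print(static_contexts(log))
    for ctx in interactive_contexts(lambda h: "reply to: " + h[-1][1],
                                    lambda h: "sounds good", "hi", turns=2):
        print(ctx)
```

Calibrating the simulator, in this picture, means making `simulator` behave more like the humans who produced the logs; how the paper performs that alignment is not described in the abstract and is not modeled here.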

source & further reading


Read original on arxiv.org → arxiv.org/abs/2605.26403
