Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning

wpnews.pro


Source: arxiv.org · Published: 2026-06-29T04:00Z · Topic: large-language-models


Researchers propose a three-stage training paradigm to internalize future-aware planning in LLM agents, enabling them to simulate outcomes before acting. The approach, evaluated on search and math reasoning tasks, outperforms baselines by bridging a format-capability gap in world model learning.

1 min read · published Jun 29, 2026

arXiv:2606.27483v1

Abstract: Large language model (LLM) agents have demonstrated strong capability in sequential decision-making, yet they remain fundamentally reactive in long-horizon tasks. Unlike humans, who employ "what-if" reasoning to evaluate potential plans before commitment, standard agents lack an internal world model to simulate future outcomes. Therefore, we propose to internalize future-aware planning by training a single autoregressive model to verbalize both a prospective state rollout and a plan-conditioned success estimate, a textual analogue of the Q-value. Crucially, we identify a format-capability gap: simply fine-tuning agents on look-ahead traces during post-training leads to superficial mimicry of foresight without genuine predictive grounding. To bridge this gap, we introduce a three-stage training paradigm: (i) World Model Agentic Mid-Training (WM-AMT) to inject latent predictive capabilities into the policy; (ii) Format-Eliciting SFT (FE-SFT) to structure this injected capability; and (iii) Foresight-Conditioned Reinforcement Learning (FC-RL) to refine the calibration and utility of the generated simulations. Evaluated on search and mathematical reasoning tasks, our approach consistently outperforms other training baselines. Our results demonstrate that effective internal world modeling in LLM agents requires a capability-first training pipeline to achieve grounded and calibrated foresight.
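To make the idea of a verbalized rollout plus a plan-conditioned success estimate more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how foresight is formatted or how calibration is measured, so the `<plan>`/`<rollout>`/`<success>` tags, the `Foresight` class, and every function name below are hypothetical, and the Brier score is swapped in as one common calibration measure rather than taken from the paper.

```python
import re
from dataclasses import dataclass

# Hypothetical tag format: the abstract does not specify how the model
# verbalizes its foresight, so these tags are an assumption.
FORESIGHT_RE = re.compile(
    r"<plan>(?P<plan>.*?)</plan>\s*"
    r"<rollout>(?P<rollout>.*?)</rollout>\s*"
    r"<success>(?P<success>[01](?:\.\d+)?)</success>",
    re.DOTALL,
)


@dataclass
class Foresight:
    plan: str        # candidate plan proposed by the agent
    rollout: str     # verbalized prediction of the resulting states
    success: float   # plan-conditioned success estimate in [0, 1]


def parse_foresight(text: str) -> list[Foresight]:
    """Extract every (plan, rollout, success) triple from generated text."""
    results = []
    for match in FORESIGHT_RE.finditer(text):
        estimate = float(match.group("success"))
        estimate = min(max(estimate, 0.0), 1.0)  # clamp to a valid probability
        results.append(
            Foresight(
                plan=match.group("plan").strip(),
                rollout=match.group("rollout").strip(),
                success=estimate,
            )
        )
    return results


def select_plan(candidates: list[Foresight]) -> Foresight:
    """Pick the plan with the highest estimate, as one would with a Q-value."""
    if not candidates:
        raise ValueError("no foresight candidates to choose from")
    return max(candidates, key=lambda f: f.success)


def brier_score(estimates: list[float], outcomes: list[bool]) -> float:
    """Mean squared gap between estimates and observed outcomes (lower is better)."""
    if not estimates or len(estimates) != len(outcomes):
        raise ValueError("estimates and outcomes must be non-empty and aligned")
    total = sum((p - float(o)) ** 2 for p, o in zip(estimates, outcomes))
    return total / len(estimates)


if __name__ == "__main__":
    generated = (
        "<plan>search for the paper title</plan>"
        "<rollout>results list the arXiv page</rollout>"
        "<success>0.8</success>\n"
        "<plan>guess the answer directly</plan>"
        "<rollout>answer is likely unsupported</rollout>"
        "<success>0.3</success>"
    )
    candidates = parse_foresight(generated)
    print(select_plan(candidates).plan)
    print(brier_score([c.success for c in candidates], [True, False]))
```

Under these assumptions, a well-calibrated model would drive the Brier score down as its stated estimates track actual task success, which is the kind of property the FC-RL stage is described as refining.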

source & further reading

arxiv.org — original article
