Position: RL Researchers Need to Distinguish Between Solving Simulators and Using Simulators as a Proxy

wpnews.pro




Reinforcement learning researchers need to clearly distinguish between using simulators as a proxy for real-world deployment and solving simulators as an end in itself, according to a new paper. The authors argue that conflating these two goals leads to misleading conclusions and inappropriate algorithm choices, and call for the community to adopt clearer empirical practices.

Published Jun 30, 2026

arXiv:2606.28433v1

Abstract: One goal in reinforcement learning (RL) research is to understand general-purpose sequential decision-making, using benchmark simulators as a proxy for learning in deployment settings. When running experiments, however, the goal of achieving high performance in the simulator can mutate into focusing exclusively on solving the simulator. To achieve high scores, researchers may adopt solutions exclusively meant for solving simulators, rather than learning while the agent is deployed outside a simulator. Solving simulators is also worthy of investigation, but it is a fundamentally different RL research question.

In this paper, we argue that RL researchers need to distinguish between two use cases of simulators: solving simulators and using simulators as a proxy for learning in deployment. We first discuss how these two use cases are importantly different, in terms of constraints on how the agent can use the simulator, which algorithms are appropriate, and which evaluation metrics are appropriate. We then highlight several issues and misleading conclusions that can occur by not making the distinction between these two settings clear, supported with examples and simple experiments. This work is a call to the community to begin clearly distinguishing how they are using simulators in their work, hopefully sparking further discussion on which empirical practices work best in each setting.
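The point about evaluation metrics can be made concrete with a small sketch. The code below is not from the paper: the toy two-armed bandit, the epsilon-greedy agent, and the helper names `run_bandit`, `online_return`, and `final_policy_value` are all hypothetical, and mapping each metric to a use case is an assumption made here for illustration. It computes two numbers from the same run: the average reward earned while learning, which arguably matters when the simulator stands in for learning in deployment, and the value of the final greedy policy, which arguably matters when the goal is to solve the simulator.

```python
import random

def run_bandit(arm_means, steps, epsilon, seed):
    """Epsilon-greedy on a toy Bernoulli bandit (hypothetical stand-in simulator)."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_means))  # explore uniformly
        else:
            arm = max(range(len(arm_means)), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-mean update of the chosen arm's value estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        rewards.append(reward)
    return rewards, estimates

def online_return(rewards):
    """Average reward over every learning step, exploration cost included."""
    return sum(rewards) / len(rewards)

def final_policy_value(arm_means, estimates):
    """Expected reward of the greedy policy once learning has stopped.

    Uses the true arm means, which only the simulator knows.
    """
    best = max(range(len(arm_means)), key=lambda a: estimates[a])
    return arm_means[best]

arm_means = [0.2, 0.8]  # assumed reward probabilities, for illustration only
rewards, estimates = run_bandit(arm_means, steps=2000, epsilon=0.3, seed=0)
print("online average reward:", online_return(rewards))
print("final greedy policy value:", final_policy_value(arm_means, estimates))
```

A heavily exploring agent can score well on the second metric while paying for it on the first, so reporting only one of the two numbers can hide which use case an experiment actually addresses.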

source & further reading

arxiv.org — original article
