
Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier

Researchers have introduced PROPEL, a framework that trains task generators for reinforcement learning by using a lightweight activation probe to predict solver pass rates instead of running costly solver rollouts for every candidate task. In tests on math, code, and software-engineering tasks, PROPEL roughly doubled the share of generated tasks at the targeted solve rate: from 10.1% to 20.0% on coding tasks for a Qwen2.5-3B-Instruct solver, and from 9.8% to 19.6% on software-engineering tasks for Qwen3.5-27B. The work targets a growing bottleneck: as AI agents improve, the supply of tasks that are valid, solvable, and still hard enough to train on runs short.

Published Jun 18, 2026 · 1 min read

arXiv:2606.18284v1. Abstract: The limiting resource for training agents via reinforcement learning (RL) is increasingly frontier task supply: valid, solvable tasks just difficult enough to train the current model. As reasoning and agentic models improve, fixed task distributions saturate, while naive synthetic generation yields tasks that are trivial, impossible, or ill-posed. Training a task generator with RL to optimize validity and learnability can address this bottleneck, but direct optimization requires repeated solver rollouts per candidate. For software-engineering (SWE) tasks, a single rollout can take tens of minutes; solver-in-the-loop generator training is intractable. We introduce PROPEL, a solver-amortized framework for training task generators at the targeted solve rate. PROPEL trains a lightweight activation probe on a one-time labeled corpus of generated tasks and solver outcomes. The probe predicts target-solver pass rate from a frozen generator reference model and serves as a proxy for solve rate during generator optimization, reducing generator evaluation to a single forward pass. Across math, code, and software-engineering at multiple model scales, PROPEL shifts generation toward the targeted solve rate: for coding, tasks generated at the learnable frontier increase from $10.1\% \rightarrow 20.0\%$ for a Qwen2.5-3B-Instruct solver and from $5.3\% \rightarrow 12.6\%$ for a Qwen2.5-7B-Instruct solver. For SWE, PROPEL increases the share of generations at the targeted solve rate from $9.8\% \rightarrow 19.6\%$ for Qwen3.5-27B on repositories not seen during training of probe and generator.
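
To make the probe idea concrete, here is a minimal sketch of what a pass-rate probe and a frontier-seeking reward could look like. It is not the authors' implementation: the abstract does not specify the probe architecture, the loss, or the reward shape. The sketch assumes a linear probe with a sigmoid output trained with cross-entropy on soft pass-rate labels, a target solve rate of 0.5, and a triangular reward. The names `make_synthetic_corpus`, `train_probe`, `predict_pass_rate`, and `frontier_reward` are hypothetical, and the "activations" are random vectors standing in for features from a frozen reference model.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_corpus(n=200, dim=4, seed=0):
    """Synthetic stand-in for the one-time labeled corpus (not real data).

    Each 'activation' is a random vector; each label is a pass rate in (0, 1)
    produced by a hidden linear rule, so a linear probe can recover it.
    """
    rng = random.Random(seed)
    true_w = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
    activations = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    pass_rates = [
        sigmoid(sum(w * x for w, x in zip(true_w, a))) for a in activations
    ]
    return activations, pass_rates


def train_probe(activations, pass_rates, lr=1.0, epochs=300):
    """Fit (w, b) so that sigmoid(w . x + b) approximates the pass rate.

    Assumption: full-batch gradient descent on cross-entropy with soft labels.
    """
    dim = len(activations[0])
    n = len(activations)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(activations, pass_rates):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_pass_rate(probe, activation):
    """Cheap proxy for solver pass rate: one dot product, no solver rollouts."""
    w, b = probe
    return sigmoid(sum(wi * xi for wi, xi in zip(w, activation)) + b)


def frontier_reward(predicted, target=0.5, width=0.25):
    """Hypothetical reward: 1 at the target solve rate, 0 beyond +/- width."""
    return max(0.0, 1.0 - abs(predicted - target) / width)


if __name__ == "__main__":
    acts, rates = make_synthetic_corpus()
    probe = train_probe(acts, rates)
    for a in acts[:3]:
        p = predict_pass_rate(probe, a)
        print(f"predicted pass rate {p:.2f}, reward {frontier_reward(p):.2f}")
```

In a setup like this, the generator's RL reward would come from `frontier_reward(predict_pass_rate(...))` rather than from running the solver, which is the amortization the abstract describes. A validity term, which the abstract also mentions, is omitted here.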
