{"slug": "a-35-billion-parameter-agent-that-punches-like-a-trillion-parameter-model", "title": "A 35-billion-parameter agent that punches like a trillion-parameter model", "summary": "Shanghai AI Lab has developed Agents-A1, a 35-billion-parameter model that matches trillion-parameter models on multi-step agent tasks. Instead of scaling parameter count, the researchers scaled the training horizon—the length and variety of action sequences—enabling the small model to sustain plans across long sequences of tool use. The work suggests that better, longer training data can be a viable path to capable agents, challenging the focus on ever-larger models.", "body_md": "A 35-billion-parameter model called Agents-A1 matches trillion-parameter models on multi-step agent tasks, according to a new paper from Shanghai AI Lab. The key insight: instead of scaling parameter count, the researchers scaled the \"horizon\", the length and variety of action sequences the model trains on, producing a small model that sustains plans across long sequences of tool use as well as far larger models do. The work is on [arXiv](https://arxiv.org/abs/2606.30616), and its title captures the thesis: scaling the horizon, not the parameters.\n\nAt 35 billion total parameters, Agents-A1 is small by frontier standards. The agent tasks in question are long, multi-step jobs where the AI must use tools, take actions, observe results, and keep working toward a goal across many turns. A giant model has vast raw knowledge, but agent work depends less on knowing more facts and more on sustaining a plan across a long sequence of actions without losing the thread. 
That is why the researchers scaled the horizon rather than the parameter count: they increased the length and variety of the action sequences the model learns from.\n\nConcretely, they built an infrastructure that connects external knowledge, actions, observations, and checks on whether each action worked, and used it to generate training examples that average around forty-five thousand words per task. The model learns from full, extended episodes of real problem-solving, not short snippets. Training on long trajectories teaches the specific skill agents need: carrying context and a goal across dozens of steps. It is the difference between studying finished essays and watching someone work through an entire project from start to finish.\n\nThe training structure leans on distillation, an idea covered in our lesson on [distillation](https://groundtruth.day/news//learn/distillation.html). Rather than making one model good at everything at once, the team first trained separate specialist teacher models, each expert in one domain, then distilled all of them into a single student model, routing the student to learn from whichever teacher was most relevant for a given kind of task. This lets one modestly sized model absorb the strengths of several specialists. Agents-A1 is also built as a mixture-of-experts model, so only part of it activates at any moment, keeping running costs down; our lesson on [mixture of experts](https://groundtruth.day/news//learn/mixture-of-experts.html) explains why that design is everywhere now.\n\nThe reported results are strong across a spread of demanding agent and science benchmarks: the paper claims leading or highly competitive numbers on tasks involving tool use, web browsing, and scientific reasoning, holding its own against trillion-parameter systems on the long-horizon work it was built for. 
If those results hold up under independent testing, the implication is meaningful: the path to capable agents runs partly through better, longer training data rather than only through ever-larger and more expensive models. That is good news for anyone who cannot afford to train a trillion-parameter system.\n\nThe honest caveat is the standard one for a self-reported paper: these are the authors' own benchmark numbers, and benchmark performance and real-world reliability are not the same thing, a point our lesson on [how AI gets benchmarked](https://groundtruth.day/news//learn/how-ai-is-benchmarked.html) makes at length. Matching a giant model on a curated test set is impressive but does not guarantee matching it on the messy, open-ended tasks people actually throw at agents, where, as a wave of new benchmarks this week showed, even the best frontier models still struggle badly. There is also a selection effect: it is easier to reach parity on the exact kinds of tasks you designed your training data around. Still, the core argument is a healthy corrective to size-worship. 
Bigger is one way to get better, but it is not the only way — and for the specific challenge of agents that have to think across a long stretch of work, teaching a smaller model on longer examples may be the smarter bet.\n\n*Originally published on Ground Truth, where every claim is checked against the primary source.*", "url": "https://wpnews.pro/news/a-35-billion-parameter-agent-that-punches-like-a-trillion-parameter-model", "canonical_source": "https://dev.to/breachprotocol/a-35-billion-parameter-agent-that-punches-like-a-trillion-parameter-model-ib7", "published_at": "2026-07-01 16:05:23+00:00", "updated_at": "2026-07-01 16:19:12.230856+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-research", "machine-learning"], "entities": ["Shanghai AI Lab", "Agents-A1", "arXiv"], "alternates": {"html": "https://wpnews.pro/news/a-35-billion-parameter-agent-that-punches-like-a-trillion-parameter-model", "markdown": "https://wpnews.pro/news/a-35-billion-parameter-agent-that-punches-like-a-trillion-parameter-model.md", "text": "https://wpnews.pro/news/a-35-billion-parameter-agent-that-punches-like-a-trillion-parameter-model.txt", "jsonld": "https://wpnews.pro/news/a-35-billion-parameter-agent-that-punches-like-a-trillion-parameter-model.jsonld"}}