
Can Generalist Agents Automate Data Curation?

Researchers introduced Curation-Bench, a benchmark that tests whether generalist coding agents can automate the labor-intensive work of curating AI training data. In a vision-language instruction-tuning setting, out-of-the-box agents reached strong published data-selection baselines within ten iterations, but trajectory analysis showed that they mainly tuned local policy variants rather than exploring new policy families. A scaffolded agent, required to cite, instantiate, and adapt a prior method at each iteration, autonomously composed a data-selection policy that outperformed those baselines at one-tenth their data budget. The authors conclude that reliable data research requires scaffolded method adaptation, not open-ended prompting alone.

1 min read · published Jun 4, 2026

arXiv:2606.04261v1. Abstract: Curating training data is among the most consequential yet labor-intensive parts of modern AI development: practitioners iteratively propose, implement, evaluate, and revise data policies against noisy benchmark feedback. We ask whether generalist coding agents can automate this data-curation loop. We introduce Curation-Bench, an agent-centric benchmark that fixes the model, training recipe, and evaluation suite while giving agents command-line access to inspect data, implement policies, submit them to a fixed training/evaluation pipeline, and revise. In a vision-language instruction-tuning instantiation, out-of-the-box agents reach strong published data-selection baselines within ten iterations. However, trajectory analysis reveals a persistent execution-research gap: agents mainly tune local policy variants rather than explore new policy families, even when given strategy guides and paper references. Scaffolds requiring each iteration to cite, instantiate, and adapt a prior method shift agents toward method-guided exploration. The scaffolded agent autonomously composes -- without human design input -- a data-selection policy that outperforms strong published baselines at one-tenth their data budget. Overall, current agents can run the curation loop, but reliable data research requires scaffolded method adaptation, not open-ended prompting alone. Code and benchmark are open-sourced.
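
To make the loop described in the abstract concrete, here is a minimal sketch of a propose/submit/score/revise cycle with a "cite a prior method" scaffold. It is an illustration under stated assumptions, not Curation-Bench's actual interface: every name (`Attempt`, `top_k_by`, `fixed_pipeline`, `run_curation_loop`), the per-sample `quality` and `length` features, and the toy scoring rule are hypothetical stand-ins. The real benchmark trains and evaluates a model rather than averaging a feature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# All names here are hypothetical illustrations, not Curation-Bench's API.
Sample = Dict[str, float]                              # toy per-example features
Policy = Callable[[List[Sample], int], List[Sample]]   # (pool, budget) -> subset


@dataclass
class Attempt:
    cited_method: str  # prior method this iteration claims to adapt
    policy: Policy     # that method instantiated as a data-selection policy


def top_k_by(key: str) -> Policy:
    """Build a policy that keeps the `budget` samples with the highest `key`."""
    def policy(pool: List[Sample], budget: int) -> List[Sample]:
        return sorted(pool, key=lambda s: s[key], reverse=True)[:budget]
    return policy


def fixed_pipeline(selected: List[Sample]) -> float:
    """Stand-in for the fixed training/evaluation pipeline (assumed toy score:
    the mean of a hypothetical `quality` feature over the selected samples)."""
    if not selected:
        return 0.0
    return sum(s["quality"] for s in selected) / len(selected)


def run_curation_loop(
    pool: List[Sample],
    budget: int,
    attempts: List[Attempt],
    max_iters: int = 10,
) -> Tuple[Optional[Attempt], float, List[Tuple[str, float]]]:
    """Score each attempt with the same pipeline and keep the best one."""
    best: Optional[Attempt] = None
    best_score = float("-inf")
    history: List[Tuple[str, float]] = []
    for attempt in attempts[:max_iters]:
        # Scaffold: refuse iterations that do not cite a prior method.
        if not attempt.cited_method.strip():
            raise ValueError("each iteration must cite a prior method")
        selected = attempt.policy(pool, budget)
        if len(selected) > budget:
            raise ValueError("policy exceeded the data budget")
        score = fixed_pipeline(selected)
        history.append((attempt.cited_method, score))
        if score > best_score:
            best, best_score = attempt, score
    return best, best_score, history
```

A caller would build a pool of samples, list a few `Attempt` objects (for example `Attempt("quality-score filtering", top_k_by("quality"))`), and pass them to `run_curation_loop` with a budget. Holding `fixed_pipeline` constant across attempts mirrors the benchmark's design of fixing the model, training recipe, and evaluation suite, so score differences come only from the data policy.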
