FML-Bench: A Controlled Study of AI Research Agent Strategies
Researchers introduced FML-Bench, a benchmark of 18 machine learning research tasks across 10 domains, to separate the effect of agent strategy from that of execution infrastructure on AI research agent performance. Testing six a…