04:00
2026-06-12
arxiv.org
ai-agents
Benchmarking AI Agents for Addressing Scientific Challenges Across Scales
Researchers introduced SciAgentArena, a benchmark of approximately 200 tasks with stepwise verification to evaluate AI agents in real-world scientific research scenarios. Testing revealed that current…