04:00
2026-06-15
arxiv.org
ai-agents
WorkBench Revisited: Workplace Agents Two Years On
A new arXiv paper revisits the WorkBench benchmark for workplace agents, finding that the best agent in June 2026, Claude Opus 4.8, completes 89% of tasks with only 2.5% unintended harmful actions, upโฆ