04:00
2026-05-27
arxiv.org
ai-agents
JobBench: Aligning Agent Work With Human Will
Researchers introduced JobBench, a new benchmark that evaluates AI agents on 130 tasks across 35 occupations based on workflows experts prioritize for delegation. The benchmark assesses models using f…