04:00
2026-06-12
arxiv.org
artificial-intelligence
LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling
Researchers introduced LoHoSearch, a new benchmark of 544 human-verified questions across 11 domains designed to test long-horizon search agents beyond the human difficulty ceiling. The benchmark, bui…