LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling

Researchers introduced LoHoSearch, a benchmark of 544 human-verified questions across 11 domains, designed to push long-horizon search agents past the difficulty ceiling of human-authored benchmarks. Built by an automated pipeline over a knowledge graph of more than 7 million Wikipedia entities, it holds even the strongest model to 34.74% accuracy, whereas the strongest models exceed 90% on saturated benchmarks such as BrowseComp. LoHoSearch provides a more demanding standard for evaluating search agents' long-horizon reasoning and context management capabilities.

Published Jun 12, 2026 · 1 min read

arXiv:2606.12837v1. Announce type: new.

Abstract: Search agent benchmarks exemplified by BrowseComp have rapidly saturated over the past year, with the strongest models surpassing 90% accuracy. Since these benchmarks are predominantly human-authored, annotators lack a global perspective on entity statistics and cannot systematically maximize search space size and structural complexity. This creates a difficulty ceiling that is hard to break. To address this, we introduce LoHoSearch (Long-Horizon Search Agents), a challenging benchmark comprising 544 human-verified questions across 11 domains. LoHoSearch is constructed via an automated pipeline built upon a knowledge graph covering over 7 million Wikipedia entities, which selects relations with large search spaces and assembles them into structurally complex questions with KG-verified unique answers. Our evaluation demonstrates that even the strongest model achieves only 34.74% accuracy, and existing context management strategies (best +6.8%) yield far smaller gains than on prior benchmarks. LoHoSearch provides a more demanding standard for evaluating long-horizon reasoning and context management in search agents.
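
The abstract does not spell out how the pipeline works internally, so the following is only a toy illustration of the two ideas it names: measuring how large a relation's search space is, and checking that a combination of constraints has a unique answer in the knowledge graph. The triples, relation names, and function names are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples; purely illustrative data,
# not drawn from the LoHoSearch knowledge graph.
TRIPLES = [
    ("A", "born_in", "X"),
    ("B", "born_in", "X"),
    ("C", "born_in", "Y"),
    ("A", "field", "physics"),
    ("B", "field", "chemistry"),
    ("C", "field", "physics"),
]


def build_index(triples):
    """Map each (relation, object) pair to the set of subjects that satisfy it."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(relation, obj)].add(subject)
    return index


def search_space_size(index, relation, obj):
    """Number of entities a single constraint leaves in play."""
    return len(index.get((relation, obj), set()))


def candidates(index, constraints):
    """Entities that satisfy every (relation, object) constraint."""
    sets = [index.get(c, set()) for c in constraints]
    return set.intersection(*sets) if sets else set()


def has_unique_answer(index, constraints):
    """True when the constraints pin down exactly one entity in the graph."""
    return len(candidates(index, constraints)) == 1
```

In this toy graph, the single constraint ("born_in", "X") leaves two candidates, so it is ambiguous on its own; adding ("field", "physics") narrows the result to one entity. A generator in the spirit of the abstract would presumably favor constraints with many candidates each while requiring that their combination stay unique, but that reading is an assumption.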
