Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

wpnews.pro

A new arXiv paper introduces Ko-WideSearch, a Korean breadth-search benchmark for web agents that evaluates exhaustive set enumeration across 228 tables spanning 190 entities. Tests on twenty web agents showed a consistent failure to recover full rows despite high item-level scores (for example, Item-F1 92.8 against Row-F1 53.7), with accuracy declining as tables grew wider and keys became composite.

Published Jun 29, 2026 · 1 min read

arXiv:2606.27595v1. Abstract: Web-agent benchmarks overwhelmingly measure depth (pinning one obscure answer behind a chain of constraints), while breadth, exhaustively enumerating a closed set and filling each item's attributes, is barely evaluated, especially outside English. Breadth is also hard to build: certifying that a gold set is complete and every cell correct is far costlier than checking a single answer. I introduce Ko-WideSearch, a Korean breadth-search benchmark built by an automated synthesize-and-verify pipeline. Each task names a set-parent entity (a TV season, a dynasty, a league, an administrative region, an election) and asks for its full membership plus a per-item attribute table, graded by Item-, Column-, and Row-F1. It spans 228 tables over 190 entities and sixteen categories across three difficulty tiers, set by two structural knobs I dial independently (table width and a 2-D composite key), so cross-product membership climbs from 0% to 100% across the tiers. A single normalization-aware comparator is shared between gold construction and grading, so stable date and count columns are not over-dropped on formatting alone. Across twenty web agents, the failure is consistent: agents recover the set but not the rows (e.g., Item-F1 92.8 against Row-F1 53.7), accuracy falls steadily as the knobs harden, and neither more search nor more spend closes the gap. Broken down by cell, the hard part is finding the right value, not formatting it: open-ended free-text cells fail most, while cells with a standard answer such as a date or a name usually come out right.
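The gap between Item-F1 and Row-F1 described in the abstract can be illustrated with a small sketch. The abstract does not give the exact metric definitions or the comparator's rules, so everything here is an assumption: `normalize`, `f1`, and `score` are hypothetical names, Item-F1 is taken as F1 over matched item keys, a row counts as correct only when every gold column matches after normalization, and Column-F1 is left out. The table values are made-up placeholders, not benchmark data.

```python
import re
import unicodedata


def normalize(cell: str) -> str:
    """Hypothetical normalizer, not the paper's comparator."""
    text = unicodedata.normalize("NFKC", str(cell)).strip().lower()
    # Assumption: "1982.3.27", "1982-03-27", and "1982년 3월 27일" are the same date.
    match = re.fullmatch(r"(\d{4})\D+(\d{1,2})\D+(\d{1,2})\D*", text)
    if match:
        year, month, day = (int(g) for g in match.groups())
        return f"{year:04d}-{month:02d}-{day:02d}"
    # Assumption: "18,500" and "18500" are the same count.
    if re.fullmatch(r"\d{1,3}(,\d{3})+", text):
        return text.replace(",", "")
    return re.sub(r"\s+", " ", text)


def f1(n_correct: int, n_pred: int, n_gold: int) -> float:
    """Standard F1 from a correct count and the predicted and gold totals."""
    if n_correct == 0 or n_pred == 0 or n_gold == 0:
        return 0.0
    precision = n_correct / n_pred
    recall = n_correct / n_gold
    return 2 * precision * recall / (precision + recall)


def score(gold: dict, pred: dict) -> dict:
    """gold and pred map an item key to a {column: value} row."""
    gold_n = {normalize(k): v for k, v in gold.items()}
    pred_n = {normalize(k): v for k, v in pred.items()}
    shared = gold_n.keys() & pred_n.keys()
    # A row is correct only if every gold column matches after normalization.
    rows_ok = sum(
        1
        for key in shared
        if all(
            normalize(pred_n[key].get(col, "")) == normalize(val)
            for col, val in gold_n[key].items()
        )
    )
    return {
        "item_f1": f1(len(shared), len(pred_n), len(gold_n)),
        "row_f1": f1(rows_ok, len(pred_n), len(gold_n)),
    }


# Placeholder table: every item is found, but one cell holds the wrong value.
gold = {
    "Team A": {"founded": "1982-03-27", "seats": "25000"},
    "Team B": {"founded": "1990-01-15", "seats": "18,500"},
    "Team C": {"founded": "2013-05-02", "seats": "12000"},
}
pred = {
    "team a": {"founded": "1982.3.27", "seats": "25,000"},
    "Team B": {"founded": "1990년 1월 15일", "seats": "18500"},
    "Team C": {"founded": "2013-05-02", "seats": "9000"},
}
result = score(gold, pred)
print(result)
```

In this toy table the agent finds all three items, so Item-F1 is 1.0, but one wrong cell sinks a whole row and Row-F1 drops to about 0.67. The formatting differences in dates and counts do not cost anything, which is the behaviour the abstract attributes to a normalization-aware comparator.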

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.27595
