04:00
2026-06-26
arxiv.org
large-language-models
Know2Guess: A Contamination-Aware Multi-Zone Benchmark for Knowledge-Boundary Evaluation in Large Language Models
Researchers introduced Know2Guess, a contamination-aware multi-zone benchmark with 1,200 items across five domains to evaluate large language models' ability to distinguish answerable knowledge from a…