04:00
2026-06-04
arxiv.org
computer-vision
FindIt: A Format-Informed Visual Detection Benchmark for Generalist Multimodal LLMs
Researchers have introduced FindIt, the first comprehensive benchmark designed to evaluate the promptable localization abilities of generalist multimodal large language models (MLLMs) across object deβ¦