Ground Then Rank: Revisiting Knowledge-Based VQA with Training-Free Entity Identification

Source: arxiv.org · Published: 2026-06-24T04:00Z · Topic: large-language-models

Researchers propose a training-free identify-before-answer (IBA) framework for Knowledge-Based VQA that decouples entity identification from evidence re-ranking, outperforming fine-tuned multi-modal re-ranking baselines on Encyclopedic-VQA and InfoSeek while reducing training and inference complexity.

arXiv:2606.23881v1. Abstract: Knowledge-Based Visual Question Answering (KB-VQA) requires grounding visual queries in external knowledge beyond the directly observable content of images. While recent multi-modal large language models (MLLMs) show strong perceptual abilities, they struggle on KB-VQA tasks that require grounding at both the fine-grained entity level and the evidence level. Most existing multi-modal retrieval-augmented generation (MM-RAG) methods tightly couple entity discrimination and section-level evidence ranking into a single re-ranking stage, leading to high cost and limited generalization. In this work, we revisit existing MM-RAG solutions from a workflow perspective and argue that both entity-level and fact-level grounding are key bottlenecks. We observe that although MLLMs often fail under open-ended entity naming, they can better identify the correct entity when selecting from a small set of candidate names. Based on this insight, we propose a simple and training-free identify-before-answer (IBA) framework that decouples entity identification from section-level re-ranking. Our approach prompts an MLLM to select high-confidence entities using only candidate names, followed by an off-the-shelf textual re-ranker for evidence selection. Experiments on Encyclopedic-VQA and InfoSeek show that our method consistently outperforms fine-tuned multi-modal re-ranking baselines while reducing training and inference complexity. Additional analyses reveal that the improvements arise not only from better entity identification, but also from selecting more informative evidence once the correct entity is fixed. Our implementation is made public to ease reproducibility.
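The two-stage workflow the abstract describes (pick the entity from a short list of candidate names, then rank that entity's sections with a text-only scorer) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: every name here (`build_selection_prompt`, `parse_selection`, `overlap_score`, `identify_then_rank`) is hypothetical, the `mllm` callable is a stand-in for whatever multi-modal model is used, and the word-overlap scorer is a toy substitute for an off-the-shelf textual re-ranker.

```python
import re
from typing import Callable, Dict, List, Tuple


def build_selection_prompt(question: str, candidate_names: List[str]) -> str:
    """Closed-set prompt: the model chooses among names instead of naming freely."""
    options = "\n".join(
        f"{i}. {name}" for i, name in enumerate(candidate_names, start=1)
    )
    return (
        f"Question: {question}\n"
        "Which of these entities does the image show? "
        "Reply with the option numbers you are confident about, comma-separated.\n"
        f"{options}"
    )


def parse_selection(reply: str, candidate_names: List[str]) -> List[str]:
    """Map a reply such as '1, 3' back to names, ignoring out-of-range numbers."""
    picked: List[str] = []
    for token in reply.replace(",", " ").split():
        if token.isdigit() and 1 <= int(token) <= len(candidate_names):
            name = candidate_names[int(token) - 1]
            if name not in picked:
                picked.append(name)
    return picked


def overlap_score(question: str, section: str) -> float:
    """Toy text-only scorer (assumption): fraction of question words in the section."""
    q = set(re.findall(r"\w+", question.lower()))
    s = set(re.findall(r"\w+", section.lower()))
    return len(q & s) / len(q) if q else 0.0


def identify_then_rank(
    question: str,
    image: object,
    candidates: Dict[str, List[str]],  # entity name -> its text sections
    mllm: Callable[[object, str], str],  # hypothetical model interface
    score: Callable[[str, str], float] = overlap_score,
    top_k: int = 1,
) -> List[Tuple[str, str]]:
    # Stage 1: entity identification using candidate names only.
    names = list(candidates)
    reply = mllm(image, build_selection_prompt(question, names))
    selected = parse_selection(reply, names)
    # Stage 2: rank only the sections that belong to the selected entities.
    scored = [
        (score(question, section), name, section)
        for name in selected
        for section in candidates[name]
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, section) for _, name, section in scored[:top_k]]


# Made-up knowledge base and a fake model that always picks option 1.
kb = {
    "Eiffel Tower": [
        "The tower was completed in 1889.",
        "The tower is 330 metres tall.",
    ],
    "Tokyo Tower": ["The tower was completed in 1958."],
}
fake_mllm = lambda image, prompt: "1"
result = identify_then_rank("How tall is the tower?", None, kb, fake_mllm)
```

In this sketch the image reaches only the entity-selection stage, and the second stage sees nothing but text, which mirrors the decoupling the abstract argues for. In practice `score` would presumably be replaced by a real text re-ranker and `mllm` by an actual model call.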

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.23881
