
Listening makes Vision Clear for VLMs

Researchers propose the Prompt-Vision Token Activation Map (PV-TAM), which evaluates vision-language consistency in large vision-language models using prompt-side semantics and filters out systematic bias from modality boundary markers. It outperforms answer-side baselines on both attention-based and IoU-style localization metrics.

Published Jun 24, 2026 · 1 min read

arXiv:2606.23763v1 Announce Type: new

Abstract: Recent work typically assesses vision-language consistency using the attention distributions of answer-side tokens. However, we observe that the highest-attention regions are not always consistent with the intended semantic token. This probably stems from decoding drift, where language priors from previously generated answer tokens accumulate and become mismatched with visual attention. Besides the priors from previous answer tokens, we find that structural tokens, e.g., modality boundary markers, may encompass the entire context and place high attention on areas unrelated to the target. To avoid these distortions and provide a consistency evaluation for large VLMs, we adopt prompt-side semantics and propose the Prompt-Vision Token Activation Map (PV-TAM). PV-TAM further incorporates a filter to remove the systematic bias induced by modality boundary markers. Unlike traditional methods that evaluate overlap solely through masks while ignoring activation intensity, our metrics leverage the peak distribution of attention to measure the alignment between prompts and visual regions. In experiments, PV-TAM consistently improves both attention-based and IoU-style localization metrics over answer-side baselines on various datasets.
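The abstract does not give PV-TAM's formulas, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a single attention matrix already averaged over heads and layers, models the boundary-marker filter as subtracting the markers' mean attention over vision tokens, and uses a pointing-game-style peak check as a stand-in for the paper's peak-based metrics. All function and parameter names (`prompt_vision_map`, `peak_hit`, `mask_iou`, `boundary_idx`) are hypothetical.

```python
import numpy as np

def prompt_vision_map(attn, prompt_idx, vision_idx, boundary_idx, grid_hw):
    """Build a prompt-side activation map over the vision-token grid.

    attn: (seq, seq) attention matrix, rows are query tokens (assumed pre-averaged).
    prompt_idx: index of the prompt token whose grounding is evaluated.
    vision_idx: indices of the vision tokens, in row-major grid order.
    boundary_idx: indices of modality boundary markers.
    """
    vision_idx = np.asarray(vision_idx)
    # Attention from the prompt token to each vision token.
    target = attn[prompt_idx, vision_idx]
    # Assumption: boundary markers' mean attention approximates the shared bias.
    bias = attn[np.asarray(boundary_idx)][:, vision_idx].mean(axis=0)
    # Remove the bias and drop negative residue.
    filtered = np.clip(target - bias, 0.0, None)
    total = filtered.sum()
    if total > 0:
        filtered = filtered / total
    return filtered.reshape(grid_hw)

def peak_hit(act_map, mask):
    """Return True if the map's strongest activation lies inside the target mask."""
    r, c = np.unravel_index(np.argmax(act_map), act_map.shape)
    return bool(mask[r, c])

def mask_iou(act_map, mask, thresh=0.5):
    """IoU between the mask and the map thresholded at a fraction of its peak."""
    pred = act_map >= thresh * act_map.max()
    inter = np.logical_and(pred, mask).sum()
    union = np.logical_or(pred, mask).sum()
    return float(inter / union) if union else 0.0

if __name__ == "__main__":
    # Toy sequence: 0 = BOS, 1-4 = vision tokens (2x2 grid), 5 = boundary, 6 = prompt.
    attn = np.zeros((7, 7))
    attn[5, 1:5] = [0.4, 0.1, 0.1, 0.1]   # boundary marker favors an unrelated patch
    attn[6, 1:5] = [0.4, 0.1, 0.1, 0.3]   # prompt token inherits the same bias
    mask = np.array([[False, False], [False, True]])
    pv = prompt_vision_map(attn, 6, [1, 2, 3, 4], [5], (2, 2))
    print(peak_hit(attn[6, 1:5].reshape(2, 2), mask), peak_hit(pv, mask))
```

In this toy case the unfiltered map peaks on the patch the boundary marker also favors, while the filtered map peaks inside the target mask. That is the kind of distortion the abstract attributes to structural tokens, though the real filter and metrics may differ substantially.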
