Vision-driven Preference Synthesis for Mitigating Hallucinations in VLMs

wpnews.pro


Source: arxiv.org · Published: 2026-06-30T04:00Z · Topic: computer-vision


Researchers propose ViPSy, a framework for constructing preference data that reduces hallucinations in Vision-Language Models (VLMs) by drawing visual cues from semantically aligned image variants. Compared with the previous state-of-the-art method, the resulting model reduces hallucination rates on AMBER and Object HalBench by 35.7% and 24.5%, respectively, while also improving on general visual grounding benchmarks.


Abstract (arXiv:2606.28401v1): Vision-Language Models (VLMs) have shown strong performance in visual understanding, yet they still suffer from hallucinations, generating content that is not grounded in the image. Preference alignment is a promising approach to improving visual faithfulness, but its success depends heavily on how preference pairs are constructed. Existing methods exhibit two key limitations: (a) intervention-based methods often introduce significant deviation from the policy distribution, and (b) sampling-based methods often underuse visual information during construction. In this paper, we propose ViPSy (Vision-driven Preference Synthesis), a framework for constructing preference data that are both policy-aligned and visually grounded. The framework consists of two stages. In the first stage, ViPSy derives a visual cue from recurring object-level content across semantically aligned image variants, so that preference construction can rely on visual information rather than language priors. In the second stage, ViPSy conditions the policy's own rollouts on this cue, so the resulting candidates are guided by visually grounded content while staying close to the policy's response distribution. Experiments show that the resulting VLM, preference-aligned with ViPSy-constructed preference pairs, achieves a new state of the art in hallucination mitigation. Compared with the previous state-of-the-art method, it reduces hallucination rates on AMBER and Object HalBench by 35.7% and 24.5%, respectively. The resulting model also improves on general visual grounding benchmarks, e.g., MMStar, MMVP, and CV-Bench, and yields gains in semantic segmentation and ImageNet linear probing, underscoring the effectiveness of the framework in enhancing the model's visual capabilities.
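The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how object-level content is extracted, how the cue is injected into the prompt, or how chosen and rejected responses are selected. Every name below (`derive_visual_cue`, `cue_conditioned_prompt`, `grounding_score`, `build_pair`), the 0.5 recurrence threshold, and the word-overlap scoring rule are hypothetical stand-ins, and the candidate responses are hard-coded strings where a real pipeline would sample rollouts from the policy VLM.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def derive_visual_cue(variant_objects, min_fraction=0.5):
    """Stage 1 (assumed form): keep object labels that recur in at least
    min_fraction of the image variants. The threshold is a placeholder."""
    counts = Counter(label for objs in variant_objects for label in set(objs))
    threshold = min_fraction * len(variant_objects)
    return sorted(label for label, n in counts.items() if n >= threshold)


def cue_conditioned_prompt(question, cue):
    """Stage 2 (assumed form): attach the cue to the prompt that the policy
    would be sampled with. The wording of the prompt is hypothetical."""
    return f"{question}\nObjects visible in the image: {', '.join(cue)}."


def grounding_score(response, cue, vocabulary):
    """Placeholder scoring rule: +1 per cue object mentioned,
    -1 per known object mentioned that is not in the cue."""
    words = set(response.lower().replace(".", " ").replace(",", " ").split())
    mentioned = words & set(vocabulary)
    return len(mentioned & set(cue)) - len(mentioned - set(cue))


def build_pair(question, candidates, cue, vocabulary):
    """Pick the best- and worst-scoring candidates as chosen and rejected."""
    ranked = sorted(candidates, key=lambda r: grounding_score(r, cue, vocabulary))
    return PreferencePair(prompt=question, chosen=ranked[-1], rejected=ranked[0])


# Hypothetical object labels detected in three variants of one image.
variants = [["dog", "frisbee", "tree"], ["dog", "frisbee"], ["dog", "frisbee", "bench"]]
vocabulary = ["dog", "frisbee", "tree", "bench", "cat"]
question = "What is in the image?"

cue = derive_visual_cue(variants)
prompt = cue_conditioned_prompt(question, cue)

# Stand-ins for rollouts that would be sampled from the policy with `prompt`.
candidates = ["A dog catches a frisbee.", "A cat sits on a bench."]
pair = build_pair(question, candidates, cue, vocabulary)
print(cue, pair.chosen, pair.rejected, sep=" | ")
```

In this toy setup, the objects that appear in most variants form the cue, and the response that mentions only cue objects ends up as the chosen side of the pair. The resulting `prompt` / `chosen` / `rejected` triple has the general shape that preference-optimization trainers typically consume, though the abstract does not name the alignment objective used.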

source & further reading

arxiv.org — original article

Read original on arxiv.org → arxiv.org/abs/2606.28401
