MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation

Researchers propose MODE-RAG, a multi-agent system using Variational Free Energy and attention states to dynamically gate interventions in multimodal retrieval-augmented generation, reducing hallucinations and logical fabrications. The system routes high-risk queries to five agents integrating Monte Carlo Tree Search and logit perturbations, and introduces the ModeVent evaluation subset from MultiVent. Experiments show improved robustness in M-RAG systems.

arXiv:2606.17449v1. Abstract: While Multimodal Retrieval-Augmented Generation (M-RAG) enhances Large Vision-Language Models, it remains highly susceptible to cross-modal hallucinations, causal fabrications, and sycophancy. Furthermore, existing mitigation pipelines often face an intervention paradox: static rules tend to disrupt accurate generations unnecessarily, whereas leaving multimodal reasoning completely unguided allows existing mismatches to cascade into severe logical fabrications. To quantify and mitigate these hallucinations, we propose MODE-RAG, a multi-agent system driven by Variational Free Energy (VFE) and internal attention states to dynamically gate interventions. High-risk queries are routed to five stage-specific agents, which integrate Monte Carlo Tree Search (MCTS) for rigorous causal derivation and logit perturbations to penalize sycophancy. Dedicated Correction and Overseer agents ensure formatting stability and perform post-hoc factual verification. To objectively evaluate our approach, we introduce ModeVent, a challenging subset derived from the MultiVent dataset. Extensive experiments indicate that our system effectively reduces hallucination rates and logical fabrication, significantly improving the robustness of M-RAG systems.
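
The gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how VFE or the attention signal is computed, so everything below is an assumption. The sketch uses the textbook decomposition of variational free energy (an accuracy term plus a KL complexity term), takes the mean token negative log-likelihood as a stand-in for the accuracy term, and uses the Shannon entropy of one attention distribution as the attention signal. The function names, the thresholds, and the "direct" / "agents" route labels are all hypothetical.

```python
import math


def variational_free_energy(token_logprobs, q, p):
    """VFE proxy (assumption): mean negative log-likelihood + KL(q || p).

    token_logprobs: log-probabilities of the generated tokens.
    q, p: approximate posterior and prior over the same discrete states.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)
    return nll + kl


def attention_entropy(weights):
    """Shannon entropy (in nats) of one normalized attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def route(vfe, attn_entropy, vfe_threshold=1.5, entropy_threshold=1.0):
    """Send a query to the agent pipeline only when either signal is high.

    Both thresholds are placeholder values, not figures from the paper.
    """
    if vfe > vfe_threshold or attn_entropy > entropy_threshold:
        return "agents"   # high risk: intervene
    return "direct"       # low risk: leave the generation untouched


if __name__ == "__main__":
    confident = variational_free_energy(
        [math.log(0.9)] * 4, q=[0.5, 0.5], p=[0.5, 0.5]
    )
    focused = attention_entropy([0.97, 0.01, 0.01, 0.01])
    print(route(confident, focused))
```

A threshold gate of this kind targets the intervention paradox described in the abstract: confident, well-grounded generations pass through unchanged, and only queries that score as high-risk incur the cost of the agent pipeline.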