GRIP: Feedback-Guided Prompt Retrieval for Large Multimodal Models

Researchers introduced GRIP, a feedback-guided retrieval framework for multimodal in-context learning. Instead of choosing examples by visual similarity, it selects them based on their measured effect on a model's predictions. Across classification, captioning, and visual question answering, GRIP improved consistently over similarity-based retrieval on Qwen2.5-VL-7B, and its largest gains came in classification on Idefics2-8B. Retrievers trained with feedback from one open model can be transferred to other models without retraining, including closed-source systems such as GPT-4o and Gemini. This makes multimodal in-context learning cheaper to deploy at scale.

Published Jun 12, 2026 · 1 min read

arXiv:2606.12744v1

Abstract: In-Context Learning (ICL) has become a powerful mechanism for adapting Large Language Models (LLMs) to new tasks without fine-tuning. Extending this concept to Large Multimodal Models (LMMs), Multimodal In-Context Learning (M-ICL) relies on retrieving relevant examples, such as images, captions, or question-answer pairs, to guide predictions across tasks like classification, captioning, and visual question answering (VQA). Most existing approaches select in-context examples based on feature-space similarity, assuming that semantically similar samples provide the most useful context. However, our systematic analysis reveals that this assumption does not always hold: visually similar examples are not necessarily those that most effectively enhance in-context learning performance. To address this, we propose the Guided Retrieval of In-context Prompts (GRIP), a learnable vision-only retrieval framework that leverages feedback from LMMs to identify examples that truly improve model predictions. GRIP learns to distinguish beneficial from detrimental in-context examples through contrastive training, refining retrieval beyond pure similarity. Across three multimodal tasks, namely classification, captioning, and VQA, GRIP improves consistently over similarity-based retrieval on Qwen2.5-VL-7B, with its strongest gains in classification on Idefics2-8B. Moreover, we demonstrate that retrievers trained with feedback from one open LMM can be transferred to other models without retraining, including closed-source GPT-4o and Gemini, enabling scalable and cost-efficient deployment of M-ICL. Code will be published upon acceptance.
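The abstract names GRIP's ingredients (LMM feedback, beneficial vs. detrimental examples, contrastive training) but not its exact objective, and the code is not yet public. The sketch below is therefore a hypothetical illustration, not the authors' method. It contrasts a cosine-similarity top-k baseline with a feedback rule that labels a candidate beneficial if including it raises an LMM score above the zero-shot score. It then computes an InfoNCE-style loss, a common contrastive objective assumed here, that is small when the query embedding sits closer to a beneficial example than to detrimental ones. The function names `similarity_topk`, `feedback_labels`, and `info_nce`, the temperature `tau`, the scoring callback, and all toy vectors and scores are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_topk(query: Vector, pool: List[Vector], k: int) -> List[int]:
    """Similarity baseline: indices of the k pool items closest to the query."""
    ranked = sorted(range(len(pool)), key=lambda i: cosine(query, pool[i]), reverse=True)
    return ranked[:k]


def feedback_labels(
    candidates: List[int],
    score_with: Callable[[int], float],
    zero_shot_score: float,
) -> Tuple[List[int], List[int]]:
    """Hypothetical feedback rule: a candidate is beneficial if using it as the
    in-context example raises the LMM's score on the query above zero-shot."""
    scores = {c: score_with(c) for c in candidates}  # one (assumed) LMM call per candidate
    beneficial = [c for c in candidates if scores[c] > zero_shot_score]
    detrimental = [c for c in candidates if scores[c] <= zero_shot_score]
    return beneficial, detrimental


def info_nce(query: Vector, positive: Vector, negatives: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE-style loss: -log softmax over cosine logits, positive at index 0.
    Small when the query is closer to the beneficial example than to the negatives."""
    logits = [cosine(query, positive) / tau] + [cosine(query, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Toy usage with made-up 2-D "image embeddings" and made-up LMM scores.
query = [1.0, 0.0]
pool = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
toy_scores = {0: 0.4, 1: 0.8, 2: 0.6}  # hypothetical LMM score with each example

print(similarity_topk(query, pool, 2))                  # → [0, 2]
print(feedback_labels([0, 1, 2], toy_scores.get, 0.5))  # → ([1, 2], [0])

# Large loss: the beneficial example (index 1) is far from the query embedding.
print(info_nce(query, pool[1], [pool[0]]))
# Hypothetical query embedding after training, now close to index 1: small loss.
trained_query = [0.2, 1.0]
print(info_nce(trained_query, pool[1], [pool[0]]))
```

In this toy data the most similar candidate (index 0) is labeled detrimental, which mirrors the abstract's observation that visually similar examples are not always the most helpful. In a training setup along these lines, the embeddings would come from a learnable vision encoder. Minimizing the loss over many (query, beneficial, detrimental) triples would reshape the embedding space so that plain top-k retrieval starts favoring beneficial examples. Because such a retriever only looks at images, it could in principle be paired with a different LMM at inference time, which is the kind of transfer the abstract reports.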
