
Exploring Multi-Modal Large Language Models and Two-Stage Fine-Tuning for Fashion Image Retrieval

Researchers propose a new framework integrating LLaVA and two-stage fine-tuning to improve composed image retrieval in fashion, addressing data scarcity and negative sampling limitations. The method enhances contrastive learning and compositional reasoning, showing improved fine-grained retrieval performance.

Published Jun 19, 2026

arXiv:2606.19684v1

Abstract: Composed image retrieval retrieves a target image using a composed query of a reference image and a text description of the desired modification. In the fashion domain, this task requires understanding subtle attribute variations such as color, pattern, and texture. However, existing approaches are limited by scarce annotated data and simplistic negative sampling. We propose a novel framework that integrates a multi-modal large language model (LLaVA) to generate attribute-aware triplets and introduces a two-stage fine-tuning strategy to enhance contrastive learning. We leverage pretrained vision-language models, such as CLIP ViT-B/32, to generate sentence-level prompts and concatenate them with the relative caption, and to scale the number of negatives using static representations. Experimental results demonstrate enhanced compositional reasoning and improved fine-grained retrieval behavior, underscoring the feasibility and potential of the proposed framework for fashion retrieval.
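The abstract does not spell out the loss it uses, so the following is only a minimal sketch of one common way to "scale the number of negatives" in contrastive learning: an InfoNCE-style loss in which each composed query is scored against its in-batch targets plus a set of extra, precomputed embeddings. The function name `info_nce_with_static_negatives`, the temperature value, and the reading of "static representations" as fixed, precomputed embeddings are all assumptions made for illustration, not details confirmed by the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_with_static_negatives(queries, targets, static_negatives,
                                   temperature=0.07):
    """Mean InfoNCE loss over a batch (hypothetical helper, not the paper's code).

    queries[i]        composed-query embedding (reference image + text)
    targets[i]        embedding of the matching target image
    static_negatives  extra precomputed embeddings, used as negatives
                      for every query (assumed meaning of "static")
    """
    queries = [l2_normalize(q) for q in queries]
    # Candidate pool: in-batch targets first, then the extra negatives.
    candidates = ([l2_normalize(t) for t in targets]
                  + [l2_normalize(n) for n in static_negatives])

    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching target (index i) as the positive.
        total += log_denom - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    t = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_with_static_negatives(q, t, [], temperature=0.5))
    print(info_nce_with_static_negatives(q, t, [[1.0, 1.0]], temperature=0.5))
```

In this sketch, adding entries to `static_negatives` enlarges the softmax denominator without enlarging the batch, which is why such a pool makes the retrieval objective harder at little extra encoding cost. A real training setup would use a tensor library and gradient-enabled encoders rather than plain Python lists.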
