
Pareto LoRA: Mitigating Modality Imbalance in Unified Multimodal Models via Pareto-Optimal Gradient Integration

Researchers have introduced Pareto LoRA, a method that mitigates modality imbalance in unified multimodal models by treating multimodal instruction tuning as a bi-objective optimization problem. The approach balances text and image gradients by modulating their direction and strength. In experiments on the CoMM benchmark with Emu2, it achieves up to 44.9% gains in perceptual image quality over vanilla LoRA while keeping text performance comparable. The target problem is the drop in image generation quality that occurs when language gradients dominate optimization during parameter-efficient fine-tuning.

Published Jun 17, 2026

arXiv:2606.17296v1 (announce type: new)

Abstract: Unified multimodal models (UMMs) have recently emerged as a promising paradigm for integrating multimodal understanding and generation within a single autoregressive transformer. However, during multimodal instruction tuning, these models often exhibit pronounced modality imbalance: language gradients dominate optimization, thus leading to lower image generation quality, especially under parameter-efficient fine-tuning such as LoRA. In this work, we systematically analyze modality imbalance in LoRA-based fine-tuning of UMMs for interleaved text-image generation. We show that vision modality performance degrades substantially more than text modality performance when compared to unimodal counterparts, and that modality-specific gradients can differ by orders of magnitude across various tasks and layers. Motivated by this observation, we reformulate the multimodal instruction tuning as a bi-objective optimization problem and propose Pareto LoRA, a Pareto-optimal gradient integration strategy that balances the text and image objectives by modulating the gradient direction and strength. Experiments on the CoMM benchmark with Emu2 demonstrate that Pareto LoRA consistently improves multimodal generation balance, achieving up to 44.9% gains in perceptual image quality over vanilla LoRA while maintaining comparable text performance.
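The abstract does not spell out the integration rule, so the following is only an illustrative sketch of one well-known way to combine two objective gradients into a Pareto-stationary direction: the closed-form two-objective min-norm weighting used in multiple-gradient descent. It is not the paper's algorithm. The function names (`min_norm_weight`, `combine`) and the toy gradient values are hypothetical, chosen to mimic a text gradient that is two orders of magnitude larger than the image gradient.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_weight(g_text, g_image):
    """Return alpha in [0, 1] minimizing ||alpha*g_text + (1-alpha)*g_image||.

    Closed form for two objectives:
        alpha = ((g_image - g_text) . g_image) / ||g_text - g_image||^2,
    clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(g_text, g_image)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every weight gives the same vector
    neg_diff = [-d for d in diff]  # g_image - g_text
    alpha = dot(neg_diff, g_image) / denom
    return min(1.0, max(0.0, alpha))


def combine(g_text, g_image):
    """Return (alpha, direction) for the min-norm convex combination."""
    alpha = min_norm_weight(g_text, g_image)
    direction = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_text, g_image)]
    return alpha, direction


if __name__ == "__main__":
    # Hypothetical gradients: the text gradient is 100x larger in norm.
    g_text = [100.0, 0.0]
    g_image = [0.0, 1.0]

    naive = [a + b for a, b in zip(g_text, g_image)]  # dominated by text
    alpha, d = combine(g_text, g_image)

    print("naive sum:", naive)
    print("alpha (weight on text):", alpha)
    print("min-norm direction:", d)
    # The combined direction has a non-negative inner product with both
    # gradients, so a small step along -d does not increase either loss
    # to first order.
    print("d . g_text:", dot(d, g_text), " d . g_image:", dot(d, g_image))
```

In this toy case the naive sum points almost entirely along the text gradient, while the min-norm weighting shrinks the text contribution so that the update makes comparable first-order progress on both objectives. When the two gradients directly oppose each other, the combined direction collapses to zero, which is the Pareto-stationary condition for two objectives.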
