BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning


Source: machinelearning.apple.com · Published: May 11, 2026

Researchers introduced BalCapRL, a balanced reinforcement learning framework for image captioning with multimodal large language models that jointly optimizes utility-aware correctness, reference coverage, and linguistic quality. The method applies GDPO-style reward-decoupled normalization and length-conditional reward masking to address the trade-offs induced by existing captioning-RL approaches. Across LLaVA-1.5-7B and Qwen2.5-VL 3B and 7B base models, BalCapRL consistently improved caption quality, with peak gains of +13.6 DCScore, +9.0 CaptionQA, and +29.0 CapArena.


Authors: Shaokai Ye, Vasileios Saveris, Yihao Qian, Jiaming Hu, Elmira Amirloo, Peter Grasch


Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasingly turned to reinforcement learning (RL). However, existing captioning-RL methods and evaluation metrics often emphasize a narrow notion of caption quality, inducing trade-offs across core dimensions of captioning. For example, utility-oriented objectives can encourage noisy, hallucinated, or overlong captions that improve downstream question answering while harming fluency, whereas arena-style objectives can favor fluent but generic descriptions with limited usefulness. To address these trade-offs, we propose a more balanced RL framework that jointly optimizes utility-aware correctness, reference coverage, and linguistic quality. To optimize the resulting continuous multi-objective reward effectively, we apply GDPO-style reward-decoupled normalization to continuous-valued captioning rewards and show that it improves performance over vanilla GRPO. Additionally, we introduce length-conditional reward masking, yielding a more suitable length penalty for captioning. Across LLaVA-1.5-7B and Qwen2.5-VL 3B and 7B base models, our method consistently improves caption quality, with peak gains of +13.6 DCScore, +9.0 CaptionQA, and +29.0 CapArena.
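To make the contrast with vanilla GRPO concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "reward-decoupled normalization" means standardizing each reward component within a group of sampled captions before combining them, whereas GRPO-style normalization standardizes the summed reward. The abstract does not specify the masking rule, so `mask_by_length`, its `max_len` threshold, and the choice of which components to mask are purely hypothetical. All function names and reward values are illustrative.

```python
from statistics import mean, pstdev

EPS = 1e-6  # avoids division by zero when a reward is constant within a group


def zscore(values):
    """Standardize a list of scalars to zero mean and roughly unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + EPS) for v in values]


def summed_then_normalized(rewards):
    """GRPO-style advantage: add the reward components per caption,
    then normalize the totals within the group."""
    return zscore([sum(r) for r in rewards])


def decoupled_then_summed(rewards):
    """Reward-decoupled advantage (assumed reading): normalize each reward
    component within the group on its own, then add them per caption."""
    columns = [zscore(list(col)) for col in zip(*rewards)]
    return [sum(parts) for parts in zip(*columns)]


def mask_by_length(rewards, lengths, max_len, masked):
    """HYPOTHETICAL length-conditional mask: zero the components listed in
    `masked` for captions longer than `max_len` tokens."""
    return [
        tuple(0.0 if (n > max_len and k in masked) else v for k, v in enumerate(r))
        for r, n in zip(rewards, lengths)
    ]


# One group of four sampled captions: (correctness, coverage, quality).
# Coverage sits on a wider scale than correctness; all values are made up.
group = [
    (1.0, 4.0, 0.5),
    (0.0, 6.0, 0.5),
    (1.0, 4.0, 0.5),
    (0.0, 6.0, 0.5),
]

print(summed_then_normalized(group))
print(decoupled_then_summed(group))
print(mask_by_length(group, [120, 480, 90, 150], max_len=256, masked={1}))
```

In this toy group, summing first lets the wider-scale coverage reward decide the ranking, so the second and fourth captions receive positive advantages. Normalizing each component first puts correctness and coverage on an equal footing, and the four advantages come out close to zero.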



Read original on machinelearning.apple.com → machinelearning.apple.com/research/balcaprl-mllm…
