{"slug": "eca-efficient-continual-alignment-for-open-ended-image-to-text-generation", "title": "ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation", "summary": "Researchers introduced Efficient Continual Alignment (ECA), a new exemplar-free incremental learning approach for open-ended image-to-text generation that adapts alignment modules within pre-trained vision-language models to handle shifting visual data categories over time. ECA employs three mechanisms—Mixture of Query, Fisher Dynamic Expansion, and Dictionary Replay—to acquire new task-specific features while minimizing interference with established knowledge without accessing raw data from previous tasks. The method significantly reduces catastrophic forgetting and improves incremental learning performance on four newly constructed benchmarks designed to reflect real-world environmental shifts.", "body_md": "arXiv:2606.12633v1 Announce Type: new\nAbstract: Incremental Learning (IL) for Open-ended Image-to-Text Generation (OpenITG) enables models to continuously generate accurate, contextually relevant text for new images while preserving previously acquired knowledge. Unlike prior studies, this paper addresses a more practical scenario in which the predominant category of visual data shifts over time as environments evolve. In this context, we introduce a new notion of continual alignment, which incrementally adapts the alignment module within pre-trained VLMs to preserve high-quality cross-modal representations. Based on this idea, we propose Efficient Continual Alignment (ECA), a novel exemplar-free IL approach for OpenITG. The key challenge is enabling the model to acquire new, task-specific features while minimizing interference with the established alignment without accessing raw data from previous tasks. 
To address this, ECA employs three core mechanisms: a Mixture of Query (MoQ) module that adapts task-specific query tokens, a Fisher Dynamic Expansion (FeDEx) module that dynamically expands the model structure using a Fisher Information Matrix (FIM)-based metric, and an embedding dictionary with Dictionary Replay (DR) to retain past knowledge. To evaluate ECA's performance, we construct four new IL OpenITG benchmarks that better reflect real-world scenarios. Experimental results demonstrate that ECA significantly mitigates catastrophic forgetting and improves IL performance compared to baseline methods. Code and benchmarks are available at https://github.com/Snowball0823/ECA.", "url": "https://wpnews.pro/news/eca-efficient-continual-alignment-for-open-ended-image-to-text-generation", "canonical_source": "https://arxiv.org/abs/2606.12633", "published_at": "2026-06-12 04:00:00+00:00", "updated_at": "2026-06-12 04:49:27.701709+00:00", "lang": "en", "topics": ["machine-learning", "computer-vision", "natural-language-processing", "generative-ai", "artificial-intelligence"], "entities": ["ECA", "MoQ", "FeDEx", "Fisher Information Matrix", "OpenITG", "IL", "VLM", "arXiv"], "alternates": {"html": "https://wpnews.pro/news/eca-efficient-continual-alignment-for-open-ended-image-to-text-generation", "markdown": "https://wpnews.pro/news/eca-efficient-continual-alignment-for-open-ended-image-to-text-generation.md", "text": "https://wpnews.pro/news/eca-efficient-continual-alignment-for-open-ended-image-to-text-generation.txt", "jsonld": "https://wpnews.pro/news/eca-efficient-continual-alignment-for-open-ended-image-to-text-generation.jsonld"}}