LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

Source: arxiv.org · Published: 2026-05-27T04:00Z · Topic: generative-ai

Researchers introduced LongAV-Compass, a benchmark for evaluating minute-long audio-visual generation across text, image, and video conditioning modalities. The benchmark contains 284 test cases and assesses over 20 fine-grained dimensions, including identity consistency, narrative coherence, and audio-visual alignment over extended time horizons. Experiments on 11 models revealed limitations in current systems' ability to sustain coherent and semantically aligned generation at minute scale.

arXiv:2605.26244v1

Abstract: Audio-visual generation is rapidly advancing from short clips to minute-long content, while existing evaluation protocols remain largely confined to short-form settings. Existing benchmarks primarily focus on 5-10 second text-conditioned generation and rarely support unified evaluation across text, image, and video conditioning modalities. Moreover, they provide limited insight into how identity consistency, narrative coherence, and audio-visual alignment degrade over extended temporal horizons. To bridge this gap, we introduce LongAV-Compass, a systematic benchmark for minute-long audio-visual generation. LongAV-Compass contains 284 curated test cases spanning text-to-audio-video (T2AV), image-to-audio-video (I2AV), and video-to-audio-video (V2AV), organized by application scenario and generation complexity. The benchmark combines taxonomy-guided benchmark construction with a unified evaluation framework that integrates MLLM-assisted assessment with complementary perceptual and multimodal metrics, including DINO-v2, ArcFace, CLIP, and ImageBind. The framework evaluates more than 20 fine-grained dimensions covering within-segment quality, cross-segment consistency, global narrative coherence, semantic alignment, and audio-visual synchronization. Through experiments on 11 representative models together with human-alignment validation, LongAV-Compass provides a diagnostic testbed for analyzing the limitations of current systems in sustaining coherent, semantically aligned, and temporally consistent minute-scale audio-visual generation across diverse input modalities.
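The abstract names cross-segment consistency as an evaluated dimension but does not state how it is computed. As a rough illustration only, the sketch below shows one common way such a score could be formed: split a long clip into segments, embed each segment (in practice with a model such as DINO-v2 or ArcFace), and compare embeddings with cosine similarity. The function names `adjacent_consistency` and `drift_from_first` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm vector has no defined cosine similarity")
    return dot / (norm_a * norm_b)


def adjacent_consistency(segments: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between consecutive segment embeddings.

    Assumption: one embedding per segment, already computed elsewhere.
    """
    if len(segments) < 2:
        raise ValueError("need at least two segments")
    sims = [cosine(segments[i], segments[i + 1]) for i in range(len(segments) - 1)]
    return sum(sims) / len(sims)


def drift_from_first(segments: Sequence[Sequence[float]]) -> List[float]:
    """Similarity of every segment to the first one, to expose gradual drift."""
    if not segments:
        raise ValueError("need at least one segment")
    return [cosine(segments[0], s) for s in segments]


# Toy 2-D "embeddings" standing in for real per-segment features.
toy_segments = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(adjacent_consistency(toy_segments))
print(drift_from_first(toy_segments))
```

Reporting both numbers is a design choice of this sketch: the adjacent score reacts to abrupt changes between neighboring segments, while the drift curve shows slow degradation over a minute-long horizon that adjacent comparisons alone can miss.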

source & further reading

Read original on arxiv.org → arxiv.org/abs/2605.26244
