VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models

Researchers have introduced VSAS-Bench, a new benchmark for evaluating visual streaming assistant models that process continuous video frames in real time. The framework provides over 18,000 temporally dense annotations and standardized protocols to measure accuracy, latency, proactiveness, and consistency. Large-scale evaluations revealed that conventional vision-language models adapted to streaming settings without additional training outperformed specialized streaming models, with Qwen3-VL-4B surpassing the leading streaming VLM by 3% under the asynchronous protocol.

Paper · Published May 22, 2026

Authors: Pavan Kumar Anasosalu Vasu*, Cem Koc*, Fartash Faghri*, Chun-Liang Li, Bo Feng, Zhengfeng Lai, Meng Cao, Oncel Tuzel, Hadi Pouransari*

Streaming vision-language models (VLMs) continuously generate responses given an instruction prompt and an online stream of input frames. This is a core mechanism for real-time visual assistants. Existing VLM evaluation frameworks predominantly assess models in offline settings. In contrast, the performance of a streaming VLM depends on additional metrics beyond pure video understanding, including proactiveness, which reflects the timeliness of the model’s responses, and consistency, which captures the robustness of its responses over time. To address this limitation, we propose VSAS-Bench, a new framework and benchmark for Visual Streaming Assistants. In contrast to prior benchmarks that primarily employ single-turn question answering on video inputs, VSAS-Bench features over 18,000 temporally dense annotations across diverse input domains and task types. We introduce standardized synchronous and asynchronous evaluation protocols, along with metrics that isolate and measure distinct capabilities of streaming VLMs. Using this framework, we conduct large-scale evaluations of recent video and streaming VLMs, analyzing the accuracy–latency trade-off under key design factors such as memory buffer length, memory access policy, and input resolution, yielding several practical insights. Finally, we show empirically that conventional VLMs can be adapted to streaming settings without additional training, and demonstrate that these adapted models outperform recent streaming VLMs. For example, Qwen3-VL-4B surpasses Dispider, the best streaming VLM on our benchmark, by 3% under the asynchronous protocol.
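The abstract names synchronous and asynchronous protocols and a memory buffer length but does not define them here, so the following is only a toy sketch under stated assumptions, not the benchmark's implementation. It assumes that "synchronous" means the stream pauses while the model computes (every frame gets an answer), that "asynchronous" means the stream keeps playing in real time (frames that arrive while the model is busy get no answer of their own), and that memory is a fixed-length sliding window over recent frames. The function `run_stream`, the `Response` record, and the fixed `latency_s` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Response:
    frame_index: int   # newest frame visible when inference started
    context: tuple     # frame indices held in the memory buffer at that moment
    emitted_at: float  # stream time (seconds) at which the answer is available


def run_stream(num_frames, fps, latency_s, buffer_len, asynchronous):
    """Simulate a hypothetical fixed-latency model over a frame stream.

    Assumed semantics (not taken from the paper):
    - synchronous: the stream clock pauses during inference, so every
      frame is answered and latency costs no stream time;
    - asynchronous: the stream never pauses, so frames that arrive while
      the model is busy only enter the buffer and are not answered.
    """
    buffer = deque(maxlen=buffer_len)  # sliding-window memory of recent frames
    responses = []
    busy_until = 0.0  # stream time at which the model becomes free again

    for i in range(num_frames):
        arrival = i / fps
        buffer.append(i)  # every frame enters memory, answered or not

        if asynchronous:
            if arrival < busy_until:
                continue  # model still busy: no response for this frame
            busy_until = arrival + latency_s
            emitted_at = busy_until
        else:
            emitted_at = arrival  # stream clock was paused during inference

        responses.append(Response(i, tuple(buffer), emitted_at))
    return responses


if __name__ == "__main__":
    for mode in (False, True):
        out = run_stream(num_frames=10, fps=2, latency_s=1.2,
                         buffer_len=4, asynchronous=mode)
        label = "async" if mode else "sync"
        print(label, [r.frame_index for r in out])
```

Under these assumptions, a slower model answers fewer frames and answers them later in the asynchronous run, which is one way an accuracy–latency trade-off could surface in a real-time setting; a longer `buffer_len` gives each response more context at the cost of more input to process.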

Read original on machinelearning.apple.com → machinelearning.apple.com/research/vsas-bench-st…
