Not Truly Multilingual: Script Consistency as a Missing Dimension in VLM Evaluation


Source: arxiv.org · Published: 2026-06-17


Researchers introduced PuMVR, a benchmark of 1,000 image-text pairs across Punjabi's three scripts, and found that state-of-the-art vision-language models show a systematic Script Gap, with accuracy differences up to 16% and script consistency rates as low as 24.8%. The findings reveal that current multilingual VLMs are not truly multi-script, highlighting the need for script-agnostic evaluation to ensure equitable AI access.


arXiv:2606.17188v1

Abstract: Current multilingual evaluations for Vision-Language Models (VLMs) assume a one-to-one mapping between language and orthography, overlooking billions of users of multi-script languages. We introduce PuMVR (Punjabi Multimodal Visual Reasoning), a benchmark of 1,000 strictly parallel image-text instances across Punjabi's three active scripts: Gurmukhi, Shahmukhi, and Roman. Evaluating 10 state-of-the-art VLMs, we expose a substantial and systematic Script Gap. Models frequently solve visual tasks in one script while failing identical tasks in another, with accuracy deltas reaching 16%. Crucially, visual input boosts absolute performance uniformly yet does not close the orthographic gap. Furthermore, cross-script in-context transfer is highly brittle, exposing script-locked knowledge representation. Supported by McNemar tests across all script pairs, our findings demonstrate that current "multilingual" VLMs are not truly multi-script. We propose the Script Consistency Rate (SCR), which falls as low as 24.8% on our benchmark, as a mandatory metric for script-agnostic evaluation to ensure equitable AI access. Data and code are available at: https://github.com/prabhjotschugh/Not-Truly-Multilingual-PuMVR.
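The abstract names three quantities (the accuracy gap between scripts, the Script Consistency Rate, and McNemar tests on script pairs) without giving formulas. The sketch below shows one plausible way to compute them from per-item correctness records. It is not the authors' code: the SCR definition used here (the share of items a model answers correctly in all three scripts) is an assumption and the paper's definition may differ, the data in `TOY_RESULTS` is invented for illustration, and all function names are hypothetical.

```python
from math import comb

# Hypothetical per-item correctness (1 = correct, 0 = wrong) for one model.
# Index i refers to the same image-question pair in every script.
TOY_RESULTS = {
    "gurmukhi":  [1, 1, 1, 0, 1, 1, 0, 1],
    "shahmukhi": [1, 0, 1, 0, 0, 1, 0, 0],
    "roman":     [1, 1, 0, 0, 1, 0, 0, 1],
}


def accuracy(correct):
    """Fraction of items answered correctly."""
    return sum(correct) / len(correct)


def accuracy_gap(results):
    """Difference between the best and worst per-script accuracy."""
    scores = [accuracy(c) for c in results.values()]
    return max(scores) - min(scores)


def script_consistency_rate(results):
    """ASSUMED definition: share of items correct in every script."""
    per_item = zip(*results.values())  # one tuple of outcomes per item
    outcomes = [all(item) for item in per_item]
    return sum(outcomes) / len(outcomes)


def mcnemar_exact_p(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired binary outcomes.

    Only discordant pairs matter: b = A right and B wrong, c = A wrong and
    B right. Under the null, b follows Binomial(b + c, 0.5).
    """
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    print("accuracy gap:", accuracy_gap(TOY_RESULTS))
    print("SCR (assumed definition):", script_consistency_rate(TOY_RESULTS))
    print("McNemar p, Gurmukhi vs Shahmukhi:",
          mcnemar_exact_p(TOY_RESULTS["gurmukhi"], TOY_RESULTS["shahmukhi"]))
```

On the toy data, per-script accuracy ranges from 37.5% to 75%, yet only one item in eight is solved in all three scripts. This is the kind of pattern the abstract describes: aggregate accuracy can look acceptable while consistency across scripts stays low. With so few items the McNemar p-value is not meaningful; the function is included only to show the mechanics of the paired test.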

source & further reading


Read original on arxiv.org → arxiv.org/abs/2606.17188



