VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

Researchers have introduced VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark of 1,168 bilingual multiple-choice questions designed to test whether multimodal AI models can improve their reasoning by constructing and interpreting graphs. The study found that across diverse models, direct analytical solving consistently outperformed tool-enabled visual solving, even on problems where plotting is a natural strategy. The findings highlight a critical gap in AI's ability to externalize problems through visualization tools, a common practice in real-world engineering and scientific workflows.

Published Jun 4, 2026 · 1 min read

arXiv:2606.04244v1

Abstract: Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making. To study this discrepancy, we introduce VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark for graph-assisted mathematics. VAMPS contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam algebra and calculus problems and expanded with human-reviewed LLM-generated synthetic variants, all selected so that plotting provides a natural solution strategy by revealing intersections, extrema, asymptotes, etc. Designed for both benchmarking and diagnosis, VAMPS goes beyond prior multimodal benchmarks that primarily evaluate reasoning over fixed visual inputs by testing whether a model can benefit from constructing a useful graph and grounding its answer in the resulting visualization. Overall, we found that across a diverse set of models, direct analytical solving surprisingly outperforms tool-enabled visual solving, even on problems where plotting is a natural strategy.
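To make "plotting provides a natural solution strategy by revealing intersections" concrete, here is a minimal Python sketch of what a plot exposes numerically: sample two curves on a grid and look for sign changes in their difference. This is an illustration only, not the benchmark's tooling or evaluation code. The function names (`sample_grid`, `approximate_intersections`), the grid size, and the example curves are all assumptions introduced here.

```python
from typing import Callable, List


def sample_grid(lo: float, hi: float, n: int) -> List[float]:
    """Return n evenly spaced x-values on [lo, hi], like the x-axis of a plot."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def approximate_intersections(
    f: Callable[[float], float],
    g: Callable[[float], float],
    lo: float,
    hi: float,
    n: int = 601,  # hypothetical resolution; a coarse grid can miss close crossings
) -> List[float]:
    """Estimate where f and g cross on [lo, hi] from sampled values only.

    A crossing is reported wherever f - g changes sign between two
    neighbouring samples (or is exactly zero at a sample); its position
    is refined by linear interpolation between the two samples.
    """
    xs = sample_grid(lo, hi, n)
    diffs = [f(x) - g(x) for x in xs]
    crossings: List[float] = []
    for i in range(n - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0.0:
            crossings.append(xs[i])
        elif d0 * d1 < 0:
            t = d0 / (d0 - d1)  # fraction of the step at which the line hits zero
            crossings.append(xs[i] + t * (xs[i + 1] - xs[i]))
    if diffs[-1] == 0.0:
        crossings.append(xs[-1])
    return crossings


if __name__ == "__main__":
    # Hypothetical exam-style question: how many times does y = x^3 - 3x meet y = 1?
    points = approximate_intersections(lambda x: x**3 - 3 * x, lambda x: 1.0, -3.0, 3.0)
    print(len(points), [round(p, 3) for p in points])
```

The sketch shows only the "read the answer off the graph" half of the workflow. The models in the study also have to decide what to plot and over which range, and then ground their answer in the rendered image, which is where the abstract reports the performance gap.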

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.04244
