VISUALSKILL: Multimodal Skills for Computer-Use Agents

wpnews.pro


Source: arxiv.org · Published: 2026-06-18 04:00 UTC · Topic: artificial intelligence


Researchers introduced VISUALSKILL, a multimodal skill format for computer-use agents (CUAs) that pairs text with visual figures to improve performance on long-horizon tasks. On two CUA benchmarks, CUA-World and OSExpert-Eval, an agent equipped with VISUALSKILL reached an average score of 0.456. That is 8.3 points above a matched text-only skill (0.373) and 15.3 points above a no-skill baseline (0.303). The results indicate that retaining visual artifacts in the skill helps agents identify UI elements and verify workflow state.
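The reported lifts are absolute differences between average scores, expressed in points (score × 100). The minimal Python check below uses the three averages given in the abstract (0.303 without a skill, 0.373 with the text-only skill, 0.456 with VISUALSKILL). The helper name `lift_in_points` is purely illustrative.

```python
# Average scores over CUA-World and OSExpert-Eval, as reported in the abstract.
SCORES = {
    "no_skill": 0.303,
    "text_only_skill": 0.373,
    "visualskill": 0.456,
}


def lift_in_points(new: float, baseline: float) -> float:
    """Absolute lift in points (score difference x 100), rounded to one decimal."""
    return round((new - baseline) * 100, 1)


lift_vs_no_skill = lift_in_points(SCORES["visualskill"], SCORES["no_skill"])
lift_vs_text_only = lift_in_points(SCORES["visualskill"], SCORES["text_only_skill"])
lift_text_only_vs_none = lift_in_points(SCORES["text_only_skill"], SCORES["no_skill"])

print(lift_vs_no_skill, lift_vs_text_only)  # 15.3 8.3
```

The same arithmetic shows the text-only skill accounts for 7.0 of the 15.3-point lift over the baseline. Keeping the figures accounts for the remaining 8.3.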


Abstract (arXiv:2606.18448v1): Computer-use agents (CUAs) approach human-level performance on standardised benchmarks but still struggle on long-horizon tasks and unseen software. Existing skill libraries address this with reusable skills, but represent the skill artifact as text only, despite the visual nature of GUI interaction. We propose VISUALSKILL: a hierarchical multimodal skill, tailored to each target application and organised as a central index over per-topic files, which the agent consumes through a load_topic MCP tool that fetches the relevant topic's text and figures on demand. We construct each skill with a two-stage pipeline that combines authored documentation with live-application UI exploration. On two CUA benchmarks, CUA-World and OSExpert-Eval, a Claude Code CLI agent backed by Claude Opus 4.6 reaches an average score of 0.456 with VISUALSKILL, a +15.3 point absolute lift over the no-skill baseline (0.303). A matched text-only skill is generated from the same source content and differs from VISUALSKILL only in modality. VISUALSKILL yields a further +8.3 point absolute gain over it (0.456 vs. 0.373). This provides direct evidence that retaining visual figures in the skill artifact, rather than verbalizing them away, helps the agent both identify UI elements and verify workflow state after each action. Our code is available at https://github.com/XMHZZ2018/VisualSkills.
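The abstract describes each skill as a central index over per-topic files. The agent reads it through a load_topic tool that returns one topic's text and figures on demand. The Python sketch below illustrates only that access pattern, under stated assumptions. The `Topic` and `VisualSkill` classes, the in-memory layout, the example application, the figure paths, and the return shape of `load_topic` are all hypothetical. The MCP server wiring used by the real tool is omitted entirely. This is not the authors' implementation; the linked repository holds that.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One hypothetical per-topic file: prose plus the figures it references."""

    summary: str  # one-line description listed in the central index
    text: str  # full topic text, returned only when the topic is loaded
    figures: list[str] = field(default_factory=list)  # hypothetical figure paths


@dataclass
class VisualSkill:
    """Hypothetical skill for one target application: a central index over topics."""

    application: str
    topics: dict[str, Topic]

    def index(self) -> str:
        """Render the lightweight central index the agent sees up front."""
        lines = [f"# {self.application} skill index"]
        for name, topic in sorted(self.topics.items()):
            lines.append(f"- {name}: {topic.summary}")
        return "\n".join(lines)

    def load_topic(self, name: str) -> dict:
        """Return one topic's text and figure references (stand-in for the MCP tool)."""
        if name not in self.topics:
            raise KeyError(f"unknown topic {name!r}; available: {sorted(self.topics)}")
        topic = self.topics[name]
        # Copy the figure list so callers cannot mutate the stored skill.
        return {"topic": name, "text": topic.text, "figures": list(topic.figures)}


# Hypothetical example skill for an imaginary application.
skill = VisualSkill(
    application="ExampleApp",
    topics={
        "export": Topic(
            summary="Exporting a document to PDF",
            text="Open File > Export, choose PDF, then confirm the dialog.",
            figures=["figures/export_dialog.png"],
        ),
        "layers": Topic(
            summary="Reordering layers",
            text="Drag the layer in the Layers panel; the canvas updates immediately.",
            figures=["figures/layers_panel.png", "figures/layers_after_drag.png"],
        ),
    },
)

print(skill.index())
result = skill.load_topic("export")
print(result["figures"])  # ['figures/export_dialog.png']
```

One plausible reading of the "on demand" design is that the agent keeps only the short index in context. It pulls in a topic's text and figures only when the current task touches that part of the application.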

source & further reading


Original article: https://arxiv.org/abs/2606.18448
