PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions

Source: arxiv.org · Published: 2026-06-16T04:00Z · Topic: ai-agents

Researchers have introduced PhoneHarness, a mixed-action benchmark and execution harness for phone-use agents that combines GUI, CLI, and host-side tool actions. On the annotated evaluation split, PhoneHarness reached a 75.0% pass rate, 12.9 percentage points above the strongest non-PhoneHarness settings. The authors conclude that reliable phone automation depends on action-surface routing and verifiable execution, not only on visual GUI control.

1 min read · Published Jun 16, 2026

arXiv:2606.14832v1. Abstract: Phone agents are increasingly expected to complete real mobile workflows rather than merely predict the next screen action. However, much of the current mobile-agent literature still evaluates agents primarily as GUI controllers that observe a screen, emit taps and swipes, and are scored by target app state. Real phone-use tasks are broader: they require deciding when to use app GUIs, device-side commands, or structured tools, while leaving evidence that the intended side effect actually occurred. We introduce PhoneHarness, a mixed-action benchmark and execution harness for studying phone-use agents on verifiable mobile workflows. PhoneHarness runs a device-side agent loop over GUI, CLI, and host-side tool actions, combining deterministic action routing with bounded GUI delegation and auditable execution traces. Its benchmark, PhoneHarness Bench, evaluates whether agents complete tasks with observable side effects, not only whether they produce plausible final answers. On the annotated evaluation split, PhoneHarness reaches a 75.0% pass rate, outperforming the strongest non-PhoneHarness settings by 12.9 percentage points. PhoneHarness and PhoneHarness Bench therefore play distinct but mutually dependent roles: the harness makes mixed phone workflows executable, while the benchmark measures whether agents can use that harness reliably and safely. Our findings suggest that reliable phone automation depends on action-surface routing and verifiable execution, not only visual GUI control.
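The abstract names three mechanisms without showing them: deterministic action routing, bounded GUI delegation, and auditable execution traces. The Python sketch below is one way to picture how they might fit together; it is not PhoneHarness code. Every name in it (`Action`, `TraceEntry`, `Harness`, `max_gui_steps`, the handler strings) is hypothetical, and the handlers are stubs that do not touch a real device.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    surface: str  # hypothetical surface label: "gui", "cli", or "tool"
    payload: str  # free-form description of what to do


@dataclass
class TraceEntry:
    step: int
    surface: str
    payload: str
    result: str


@dataclass
class Harness:
    # Maps each surface name to a handler; stubs here, not real device calls.
    handlers: Dict[str, Callable[[str], str]]
    max_gui_steps: int = 3  # assumed budget that bounds GUI delegation
    trace: List[TraceEntry] = field(default_factory=list)
    gui_steps: int = 0

    def run(self, action: Action) -> str:
        # Deterministic routing: the surface label alone picks the handler.
        if action.surface not in self.handlers:
            result = "rejected: unknown surface"
        elif action.surface == "gui" and self.gui_steps >= self.max_gui_steps:
            result = "rejected: gui budget exhausted"
        else:
            if action.surface == "gui":
                self.gui_steps += 1
            result = self.handlers[action.surface](action.payload)
        # Every attempt, including rejections, is recorded for later audit.
        self.trace.append(
            TraceEntry(len(self.trace) + 1, action.surface, action.payload, result)
        )
        return result


def demo() -> Harness:
    handlers = {
        "gui": lambda p: f"gui ok: {p}",
        "cli": lambda p: f"cli ok: {p}",
        "tool": lambda p: f"tool ok: {p}",
    }
    harness = Harness(handlers=handlers, max_gui_steps=1)
    plan = [
        Action("cli", "list packages"),
        Action("gui", "tap Send"),
        Action("gui", "tap Confirm"),  # exceeds the GUI budget of 1
        Action("tool", "calendar.create"),
    ]
    for action in plan:
        harness.run(action)
    return harness


if __name__ == "__main__":
    for entry in demo().trace:
        print(entry)
```

In this toy run the second GUI action is rejected because the assumed budget is one step, and the rejection still appears in the trace. That mirrors the abstract's emphasis on leaving evidence of what actually happened, though the real harness may enforce its bounds and record its traces quite differently.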

source & further reading

arxiv.org — original article

Read original on arxiv.org → arxiv.org/abs/2606.14832
