Project Log #9: My AI Agent Works on My Phone. But What About Yours?


Source: dev.to · Published: 2026-06-20T21:22Z · Topic: computer-vision


A developer building an AI phone agent found that template matching breaks across devices with different screen sizes, resolutions, and Android versions. In a test on a friend's phone, OCR held at ~90% accuracy, but icon-match confidence fell to roughly 60% (62% and 58%), and the agent failed to send a message. The developer is now exploring UI hierarchy inspection via ADB's uiautomator dump as a device-agnostic alternative.


Day 9. Template matching works. But screen sizes, resolutions, and Android versions might break everything.

Eight days ago, the agent was an idea. Now it can read text, handle interruptions, and find icons on a screen.

But there's a question I've been avoiding: does it work on any phone other than mine?

The Cross-Device Problem

Every screenshot I've taken, every icon I've cropped, every coordinate I've mapped—it's all on one device. My phone. Same screen size. Same resolution. Same Android version. Same DPI.

Template matching relies on reference images that look exactly like the target on screen. Change the screen density, change the icon size, change the font scaling, and the match confidence drops. Suddenly "send_button.png" doesn't match anymore, and the agent can't press send. This isn't a bug in my code. It's a fundamental challenge in computer vision: reference-based matching breaks when the visual context changes.
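To make the density effect concrete, here is a small worked example using Android's documented dp-to-px rule, px = dp × (dpi / 160). The two DPI values and the 24 dp icon size are assumptions for illustration; the log does not state either phone's actual density.

```python
# Android's documented conversion: px = dp * (dpi / 160).
BASELINE_DPI = 160


def dp_to_px(dp: float, dpi: int) -> int:
    """Convert density-independent pixels to physical pixels."""
    return round(dp * dpi / BASELINE_DPI)


def template_scale(source_dpi: int, target_dpi: int) -> float:
    """Factor by which a template cropped at source_dpi would need resizing."""
    return target_dpi / source_dpi


# Hypothetical values: neither phone's DPI is given in this log.
my_phone_dpi = 420
friend_phone_dpi = 480
icon_dp = 24  # assumed icon size in dp

print(dp_to_px(icon_dp, my_phone_dpi))      # 63
print(dp_to_px(icon_dp, friend_phone_dpi))  # 72
print(round(template_scale(my_phone_dpi, friend_phone_dpi), 3))  # 1.143
```

Under these assumed numbers, a 63 px reference crop would be compared against a 72 px icon, which is the kind of size mismatch that can pull a pixel-level match score down. A device's actual density can be read with `adb shell wm density`.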

Today's Experiment

I tested the same agent on a friend's phone—different manufacturer, different Android version, slightly larger screen. The results were humbling.

Task	My Phone	Friend's Phone
OCR (text recognition)	✅ 95% accuracy	✅ ~90% accuracy
Find "Mom" in contacts	✅ Found	✅ Found
Template match: send button	✅ 94% confidence	❌ 62% confidence
Template match: back button	✅ 91% confidence	❌ 58% confidence

OCR held up reasonably well because text is text: fonts might change slightly, but the characters are the same. The icons (the send button, the back arrow) were rendered at a different size and with a slightly different pixel arrangement on my friend's device.

The agent failed to send the message.

Why This Matters

An AI agent that only works on one phone isn't an agent. It's a script. If I want this to be useful to anyone else—or even to myself if I change phones—it needs to be device-agnostic.

Possible Solutions I'm Exploring

Solution	Pros	Cons
Multi-resolution icon library	Simple. Just crop icons at different DPIs.	Tedious. How many variants are enough?
AI-based icon detection	Could recognize icons by shape, not pixels.	Requires training data. Heavy for a phone.
UI hierarchy inspection	Instead of "seeing" the screen, read the app's UI tree directly via ADB.	Requires root or accessibility service. Not universal.
Relative positioning	Once OCR finds text, calculate icon positions relative to known landmarks.	Fragile. Different layouts on different devices.
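The relative-positioning row can be sketched as follows. This is only an illustration, not code from the repo: `Box`, `learn_offset`, and `predict_icon` are hypothetical names, the coordinates are made up, and the sketch assumes the layout scales uniformly with screen size, which is exactly the assumption the "Fragile" column warns about.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of an OCR-detected landmark, in pixels."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center(self) -> tuple:
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def learn_offset(landmark: Box, icon_center: tuple, screen: tuple) -> tuple:
    """Store the landmark-to-icon offset as a fraction of screen width/height."""
    lx, ly = landmark.center
    return ((icon_center[0] - lx) / screen[0], (icon_center[1] - ly) / screen[1])


def predict_icon(landmark: Box, offset: tuple, screen: tuple) -> tuple:
    """Re-apply the stored fractional offset on a different screen."""
    lx, ly = landmark.center
    return (round(lx + offset[0] * screen[0]), round(ly + offset[1] * screen[1]))


# Hypothetical reference device: 1080x2400, text field found by OCR,
# send icon known to sit to its right.
offset = learn_offset(Box(100, 2200, 800, 2300), (1000, 2250), (1080, 2400))

# Hypothetical target device: 1440x3200, same landmark found by OCR.
print(predict_icon(Box(133, 2933, 1067, 3067), offset, (1440, 3200)))  # (1333, 3000)
```

If the target device reflows the layout instead of scaling it, the predicted point lands in the wrong place, so a sketch like this would need a verification step before tapping.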

None of these are perfect. All of them are more work. But that's the reality of building something that's supposed to work in the wild, not just in a demo.

What I'm Trying First

The UI hierarchy approach. ADB has a command called uiautomator dump that returns an XML tree of every visible element on screen: text, buttons, icons, everything. Each element has bounds, a class name, and a content description.
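A minimal sketch of reading such a tree with Python's standard library. The XML below is a trimmed, hypothetical sample in the shape uiautomator produces (`node` elements with `text`, `class`, `content-desc`, and `bounds="[left,top][right,bottom]"` attributes); the labels and coordinates are invented, and `find_center` is an illustrative helper name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample of a UI dump; real dumps carry more attributes.
SAMPLE_DUMP = """
<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" text="" content-desc="" bounds="[0,0][1080,2400]">
    <node class="android.widget.TextView" text="Mom" content-desc="" bounds="[48,180][300,240]" />
    <node class="android.widget.ImageButton" text="" content-desc="Send" bounds="[948,2210][1044,2306]" />
  </node>
</hierarchy>
""".strip()

BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def parse_bounds(bounds: str) -> tuple:
    """Turn '[left,top][right,bottom]' into four integers."""
    match = BOUNDS_RE.fullmatch(bounds)
    if match is None:
        raise ValueError(f"unexpected bounds format: {bounds!r}")
    return tuple(int(group) for group in match.groups())


def find_center(xml_text: str, label: str):
    """Return the center of the first node whose text or content-desc equals label."""
    root = ET.fromstring(xml_text)
    for node in root.iter("node"):
        if label in (node.get("text"), node.get("content-desc")):
            left, top, right, bottom = parse_bounds(node.get("bounds", ""))
            return ((left + right) // 2, (top + bottom) // 2)
    return None  # label not present in this dump


print(find_center(SAMPLE_DUMP, "Send"))  # (996, 2258)
print(find_center(SAMPLE_DUMP, "Mom"))   # (174, 210)
```

Because the lookup keys on a label rather than on pixels, the same call would return device-specific coordinates on each phone, provided the app actually sets a text or content description on the element.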

If I can parse that XML tree instead of taking screenshots, the agent doesn't need to "see" the screen at all. It just reads the structure. No OCR. No template matching. No resolution issues. This is a fundamental architectural shift. But it might be the right one.
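Roughly, the screenshot-free loop could look like the sketch below: dump the tree to a file on the device, read it back, then tap a coordinate. The adb subcommands used (`shell uiautomator dump`, `exec-out cat`, `shell input tap`, and `-s` for a device serial) are standard; the remote path and the helper names are assumptions, and this is not the code in the repo.

```python
import subprocess

# Assumed location for the dump file on the device.
REMOTE_DUMP = "/sdcard/window_dump.xml"


def adb(*args, serial=None):
    """Build an adb argument list, optionally targeting one device by serial."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]
    return cmd + list(args)


def dump_ui_tree(serial=None) -> str:
    """Write the current UI hierarchy to a file on the device and read it back."""
    subprocess.run(
        adb("shell", "uiautomator", "dump", REMOTE_DUMP, serial=serial),
        check=True,
        capture_output=True,
    )
    result = subprocess.run(
        adb("exec-out", "cat", REMOTE_DUMP, serial=serial),
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8")


def tap(x: int, y: int, serial=None) -> None:
    """Send a tap at pixel coordinates (x, y) on the device."""
    subprocess.run(
        adb("shell", "input", "tap", str(x), str(y), serial=serial),
        check=True,
    )


# Nothing runs against a device at import time; this only shows a built command.
print(adb("shell", "input", "tap", "996", "2258", serial="emulator-5554"))
```

Calling `dump_ui_tree()` and `tap()` requires a connected device with USB debugging enabled. `check=True` makes a failed adb call raise instead of silently returning an empty tree.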

What's Next (Day 10)

Testing uiautomator dump as a replacement for screenshot-based detection.

The Repo

👉 github.com/Dexter2344/phone-agent

All code from Day 8 is live. The Day 9 experiments are in a new branch called ui-tree-experiment. I'll merge to main once I have results.

This is Day 9. The hard problems don't stop coming. But neither do I.


Read original on dev.to → dev.to/okeke_chukwudubem_5f3bf49/project-log-9-m…
