{"slug": "project-log-9-my-ai-agent-works-on-my-phone-but-what-about-yours", "title": "Project Log #9: My AI Agent Works on My Phone. But What About Yours?", "summary": "A developer building an AI phone agent found that template matching fails across different devices due to varying screen sizes, resolutions, and Android versions. In tests, OCR maintained ~90% accuracy on a friend's phone, but icon matching confidence dropped below 60%, causing the agent to fail. The developer is now exploring UI hierarchy inspection via ADB's uiautomator dump as a device-agnostic alternative.", "body_md": "Day 9. Template matching works. But screen sizes, resolutions, and Android versions might break everything.\n\nEight days ago, the agent was an idea. Now it can read text, handle interruptions, and find icons on a screen.\n\nBut there's a question I've been avoiding: does it work on any phone other than mine?\n\n**The Cross-Device Problem**\n\nEvery screenshot I've taken, every icon I've cropped, every coordinate I've mapped—it's all on one device. My phone. Same screen size. Same resolution. Same Android version. Same DPI.\n\nTemplate matching relies on reference images that look exactly like the target on screen. Change the screen density, change the icon size, change the font scaling, and the match confidence drops. Suddenly \"send_button.png\" doesn't match anymore, and the agent can't press send.\n\nThis isn't a bug in my code. It's a fundamental challenge in computer vision: reference-based matching breaks when the visual context changes.\n\n**Today's Experiment**\n\nI tested the same agent on a friend's phone—different manufacturer, different Android version, slightly larger screen. 
The results were humbling.\n\n| Task | My Phone | Friend's Phone |\n|---|---|---|\n| OCR (text recognition) | ✅ 95% accuracy | ✅ ~90% accuracy |\n| Find \"Mom\" in contacts | ✅ Found | ✅ Found |\n| Template match: send button | ✅ 94% confidence | ❌ 62% confidence |\n| Template match: back button | ✅ 91% confidence | ❌ 58% confidence |\n\nOCR held up reasonably well because text is text. Fonts might change slightly, but the characters are the same. But the icons (the send button, the back arrow) were rendered at a different size and with a slightly different pixel arrangement on my friend's device.\n\nThe agent failed to send the message.\n\n**Why This Matters**\n\nAn AI agent that only works on one phone isn't an agent. It's a script. If I want this to be useful to anyone else, or even to myself if I change phones, it needs to be device-agnostic.\n\n**Possible Solutions I'm Exploring**\n\n| Solution | Pros | Cons |\n|---|---|---|\n| Multi-resolution icon library | Simple. Just crop icons at different DPIs. | Tedious. How many variants are enough? |\n| AI-based icon detection | Could recognize icons by shape, not pixels. | Requires training data. Heavy for a phone. |\n| UI hierarchy inspection | Instead of \"seeing\" the screen, read the app's UI tree directly via ADB. | Requires root or accessibility service. Not universal. |\n| Relative positioning | Once OCR finds text, calculate icon positions relative to known landmarks. | Fragile. Different layouts on different devices. |\n\nNone of these are perfect. All of them are more work. But that's the reality of building something that's supposed to work in the wild, not just in a demo.\n\n**What I'm Trying First**\n\nThe UI hierarchy approach. ADB has a command called `uiautomator dump` that dumps an XML tree of every visible element on screen: text, buttons, icons, everything. 
Each element has bounds, a class name, and a content description.\n\nIf I can parse that XML tree instead of taking screenshots, the agent doesn't need to \"see\" the screen at all. It just reads the structure. No OCR. No template matching. No resolution issues.\n\nThis is a fundamental architectural shift. But it might be the right one.\n\n**What's Next (Day 10)**\n\n- `uiautomator dump` as a replacement for screenshot-based detection\n\n**The Repo**\n\n👉 **github.com/Dexter2344/phone-agent**\n\nAll code from Day 8 is live. The Day 9 experiments are in a new branch called `ui-tree-experiment`. I'll merge to main once I have results.\n\nThis is Day 9. The hard problems don't stop coming. But neither do I.", "url": "https://wpnews.pro/news/project-log-9-my-ai-agent-works-on-my-phone-but-what-about-yours", "canonical_source": "https://dev.to/okeke_chukwudubem_5f3bf49/project-log-9-my-ai-agent-works-on-my-phone-but-what-about-yours-2mng", "published_at": "2026-06-20 21:22:01+00:00", "updated_at": "2026-06-20 21:39:24.984417+00:00", "lang": "en", "topics": ["computer-vision", "ai-agents", "developer-tools"], "entities": ["Dexter2344", "ADB", "uiautomator"], "alternates": {"html": "https://wpnews.pro/news/project-log-9-my-ai-agent-works-on-my-phone-but-what-about-yours", "markdown": "https://wpnews.pro/news/project-log-9-my-ai-agent-works-on-my-phone-but-what-about-yours.md", "text": "https://wpnews.pro/news/project-log-9-my-ai-agent-works-on-my-phone-but-what-about-yours.txt", "jsonld": "https://wpnews.pro/news/project-log-9-my-ai-agent-works-on-my-phone-but-what-about-yours.jsonld"}}