{"slug": "project-log-10-i-m-ditching-screenshots-here-s-why", "title": "Project Log #10: I'm Ditching Screenshots. Here's Why.", "summary": "A developer is replacing a screenshot-based vision system with UI hierarchy inspection for an Android automation agent. The old approach using OCR and template matching failed across different devices due to varying screen densities and rendering. The new method uses ADB's uiautomator dump to read the UI tree directly, providing 100% accuracy and faster performance.", "body_md": "Day 10. OCR and template matching hit their limits. UI hierarchy inspection might be the real answer.\n\nNine days ago, I was proud of my screenshot-based vision system. ML Kit for text. Template matching for icons. A clever fallback chain that worked most of the time.\n\nToday, I'm ripping most of it out.\n\n**The Breaking Point**\n\nLast week, I tested the agent on a friend's phone. Template matching failed. The same icons I cropped on my device didn't match on his: different screen density, different rendering, different pixel arrangement.\n\nI explored building a multi-resolution icon library. Crop every icon at 5 different DPIs? That's tedious. I explored AI-based icon detection. Train a model to recognize buttons by shape? That's heavy for a phone CPU.\n\nThen I remembered something. Android already knows what's on the screen. It has to, because it's rendering the UI. And there's a way to read that information directly.\n\n**Enter UI Hierarchy Inspection**\n\nThrough ADB you can run a command called `uiautomator dump`. It spits out an XML file containing every visible UI element on the screen: buttons, text fields, icons, images, everything. Each element has a set of attributes, including its class name (e.g. `android.widget.Button`, `android.widget.ImageView`).\n\nThis is not a screenshot. 
This is the app's internal blueprint.\n\n**Why This Changes Everything**\n\n| Screenshot-Based (Old Way) | UI Tree (New Way) |\n|---|---|\n| Run OCR on a screenshot (1.5–2 s) | Run one ADB command (0.5–1 s) |\n| If text not found, try template matching (2–4 s) | Not needed. Icons have content descriptions. |\n| Accuracy depends on screen resolution and DPI | Accuracy is 100% for elements in the tree: the OS tells you exactly where things are |\n| Breaks on different devices | Works across all devices. Same XML structure. |\n| Can't detect icons without reference images | Icons are in the tree with coordinates |\n\n**The First Experiment**\n\nI ran `adb shell uiautomator dump` on my phone, then pulled the XML file. I searched for \"send.\" Here's a snippet of what I found:\n\n```xml\n<node\n  class=\"android.widget.ImageButton\"\n  content-desc=\"Send message\"\n  bounds=\"[924,1656][1020,1752]\"\n  clickable=\"true\"\n  package=\"com.whatsapp\" />\n```\n\n", "url": "https://wpnews.pro/news/project-log-10-i-m-ditching-screenshots-here-s-why", "canonical_source": "https://dev.to/okeke_chukwudubem_5f3bf49/project-log-10-im-ditching-screenshots-heres-why-3o7a", "published_at": "2026-06-25 18:53:09+00:00", "updated_at": "2026-06-25 19:13:11.839809+00:00", "lang": "en", "topics": ["developer-tools", "computer-vision", "artificial-intelligence"], "entities": ["Android", "ADB", "ML Kit", "WhatsApp"], "alternates": {"html": "https://wpnews.pro/news/project-log-10-i-m-ditching-screenshots-here-s-why", "markdown": "https://wpnews.pro/news/project-log-10-i-m-ditching-screenshots-here-s-why.md", "text": "https://wpnews.pro/news/project-log-10-i-m-ditching-screenshots-here-s-why.txt", "jsonld": "https://wpnews.pro/news/project-log-10-i-m-ditching-screenshots-here-s-why.jsonld"}}