Source: dev.to · Topic: developer-tools

Project Log #10: I'm Ditching Screenshots. Here's Why.

A developer is replacing a screenshot-based vision system with UI hierarchy inspection for an Android automation agent. The old approach using OCR and template matching failed across different devices due to varying screen densities and rendering. The new method uses ADB's uiautomator dump to read the UI tree directly, providing 100% accuracy and faster performance.

2 min read · Published Jun 25, 2026

Day 10. OCR and template matching hit their limits. UI hierarchy inspection might be the real answer.

Nine days ago, I was proud of my screenshot-based vision system. ML Kit for text. Template matching for icons. A clever fallback chain that worked most of the time.

Today, I'm ripping most of it out.

The Breaking Point

Last week, I tested the agent on a friend's phone. Template matching failed. The same icons I cropped on my device didn't match on his—different screen density, different rendering, different pixel arrangement.

I explored building a multi-resolution icon library. Crop every icon at 5 different DPIs? That's tedious. I explored AI-based icon detection. Train a model to recognize buttons by shape? That's heavy for a phone CPU.

Then I remembered something. Android already knows what's on the screen. It has to—it's rendering the UI. And there's a way to read that information directly.

Enter UI Hierarchy Inspection

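ADB has a command called uiautomator dump. It spits out an XML file describing every visible UI element on the screen: buttons, text fields, icons, images, everything. Each element comes with its class (such as android.widget.Button or android.widget.ImageView), its visible text, its content description (the accessibility label), its bounds (exact on-screen coordinates), and flags such as whether it is clickable.

This is not a screenshot. This is the app's internal blueprint.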
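A minimal sketch of that workflow, written in Python and run from a host machine that drives the phone over adb. It assumes adb is on the PATH and one device (or an explicit serial) is attached. The on-device path /sdcard/window_dump.xml is an assumption (it is uiautomator's usual default), reading the file back with adb exec-out cat stands in for a separate pull step, and fetch_ui_dump and parse_nodes are hypothetical helper names, not part of any library.

```python
import subprocess
import xml.etree.ElementTree as ET

# Assumed on-device path for the dump (uiautomator's usual default location).
DEVICE_PATH = "/sdcard/window_dump.xml"


def fetch_ui_dump(serial=None):
    """Hypothetical helper: dump the current UI tree and return the raw XML bytes.

    Requires adb on PATH and a connected device.
    """
    adb = ["adb"] + (["-s", serial] if serial else [])
    # Ask uiautomator to write the hierarchy to a file on the device.
    subprocess.run(adb + ["shell", "uiautomator", "dump", DEVICE_PATH],
                   check=True, capture_output=True)
    # Stream the file back over adb instead of pulling it to a local path.
    result = subprocess.run(adb + ["exec-out", "cat", DEVICE_PATH],
                            check=True, capture_output=True)
    return result.stdout


def parse_nodes(xml_data):
    """Hypothetical helper: flatten every <node> in a dump into a plain dict."""
    root = ET.fromstring(xml_data)
    nodes = []
    for node in root.iter("node"):  # document order, every nesting level
        nodes.append({
            "class": node.get("class", ""),
            "text": node.get("text", ""),
            "content_desc": node.get("content-desc", ""),
            "bounds": node.get("bounds", ""),
            "clickable": node.get("clickable") == "true",
            "package": node.get("package", ""),
        })
    return nodes
```

On a device you would call parse_nodes(fetch_ui_dump()). Flattening the nested tree into a list makes later filtering (by text, label, or clickability) a simple loop.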

Why This Changes Everything

| Screenshot-based (old way) | UI tree (new way) |
|---|---|
| Run OCR on a screenshot (1.5–2 s) | Run one ADB command (0.5–1 s) |
| If text isn't found, try template matching (2–4 s) | Not needed: icons have content descriptions |
| Accuracy depends on screen resolution and DPI | Accuracy is 100%: the OS tells you exactly where things are |
| Breaks on different devices | Works across all devices: same XML structure |
| Can't detect icons without reference images | Icons are in the tree, with coordinates |
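"The OS tells you exactly where things are" comes down to each node's bounds attribute, a string of the form [left,top][right,bottom] in screen pixels. A minimal sketch of turning that into a tap target, assuming the agent taps the center of the element with adb's input tap command; parse_bounds, center_of, and tap_command are hypothetical names.

```python
import re

# bounds look like "[left,top][right,bottom]" in screen pixels.
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def parse_bounds(bounds):
    """Hypothetical helper: return (left, top, right, bottom) as ints."""
    match = BOUNDS_RE.fullmatch(bounds.strip())
    if match is None:
        raise ValueError(f"unrecognized bounds string: {bounds!r}")
    left, top, right, bottom = (int(v) for v in match.groups())
    return left, top, right, bottom


def center_of(bounds):
    """Hypothetical helper: integer center point of an element's bounds."""
    left, top, right, bottom = parse_bounds(bounds)
    return (left + right) // 2, (top + bottom) // 2


def tap_command(x, y, serial=None):
    """Hypothetical helper: build (not run) an adb command that taps at (x, y)."""
    adb = ["adb"] + (["-s", serial] if serial else [])
    return adb + ["shell", "input", "tap", str(x), str(y)]


print(center_of("[924,1656][1020,1752]"))  # (972, 1704)
print(tap_command(*center_of("[924,1656][1020,1752]")))
# ['adb', 'shell', 'input', 'tap', '972', '1704']
```

Passing the returned list to subprocess.run(..., check=True) performs the tap. Because bounds come back in the device's own pixel coordinates, the same space input tap uses, nothing needs rescaling per screen density, which is exactly where template matching broke down.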

The First Experiment

I ran adb shell uiautomator dump on my phone, then pulled the XML file. I searched for "send." Here's a snippet of what I found:

```xml
<node
  class="android.widget.ImageButton"
  content-desc="Send message"
  bounds="[924,1656][1020,1752]"
  clickable="true"
  package="com.whatsapp" />
```
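The search itself is simple once the dump is parsed: match the query against each node's text and content-desc, case-insensitively, and optionally keep only clickable nodes. A minimal sketch using Python's standard-library XML parser; find_nodes and the sample dump are hypothetical, with the sample modeled on the WhatsApp node above.

```python
import xml.etree.ElementTree as ET


def find_nodes(xml_data, query, clickable_only=False):
    """Hypothetical helper: attribute dicts of nodes whose text or content-desc contains query."""
    needle = query.casefold()
    matches = []
    for node in ET.fromstring(xml_data).iter("node"):
        labels = (node.get("text") or "", node.get("content-desc") or "")
        # Case-insensitive substring match against either label.
        if not any(needle in label.casefold() for label in labels):
            continue
        if clickable_only and node.get("clickable") != "true":
            continue
        matches.append(dict(node.attrib))
    return matches


# Hypothetical dump fragment modeled on the WhatsApp node above.
SAMPLE = (
    '<hierarchy rotation="0">'
    '<node class="android.widget.TextView" text="Sending tips" content-desc="" '
    'bounds="[0,100][1080,200]" clickable="false" package="com.whatsapp" />'
    '<node class="android.widget.EditText" text="Type a message" content-desc="" '
    'bounds="[40,1656][900,1752]" clickable="true" package="com.whatsapp" />'
    '<node class="android.widget.ImageButton" text="" content-desc="Send message" '
    'bounds="[924,1656][1020,1752]" clickable="true" package="com.whatsapp" />'
    '</hierarchy>'
)

for hit in find_nodes(SAMPLE, "send", clickable_only=True):
    print(hit["class"], hit["bounds"])  # android.widget.ImageButton [924,1656][1020,1752]
```

A plain substring match also catches labels like "Sending tips", so the clickable filter (or a stricter match) is what narrows the result down to the actual button.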