{"slug": "how-to-debug-llm-driven-android-automation-runs", "title": "How to Debug LLM-Driven Android Automation Runs", "summary": "A developer building LLM-driven Android automation tools has outlined a structured debugging approach that saves detailed run traces instead of just final screenshots. The method captures UI dumps, model decisions, tool calls, and exit codes at every step, enabling engineers to identify whether failures stem from the model, the app, the automation tool, or timing issues. The approach uses numbered files and compact action tables to make each run inspectable and replayable without requiring pixel-perfect video.", "body_md": "LLM-driven Android automation fails in strange ways.\n\nThe model may tap the wrong label. The screen may change between observation and action. A keyboard may cover the button. A permission dialog may appear. The app may still be loading. The UI dump may expose two identical \"Continue\" buttons.\n\nIf all you saved is the final screenshot, debugging is painful.\n\nYou need a run trace.\n\nFor every Android agent step, save the UI dump, the model's decision, the tool call, and the exit code.\n\nThe minimum useful trace looks like this:\n\n```\nobserve: tap Button \"Continue\" #continue 540,860\nmodel:   tap \"Continue\"\naction:  hs tap \"Continue\" --visible --unique\nresult:  ok\nwait:    hs wait \"Dashboard\" --timeout 15s\nresult:  TIMEOUT\n```\n\nThat is much easier to debug than \"the agent failed.\"\n\nAndroid agent failures usually fall into a few buckets.\n\n| Failure | What it means |\n|---|---|\n| `NOT_FOUND` | The target label or selector was not visible |\n| `AMBIGUOUS` | More than one visible node matched |\n| `TIMEOUT` | The expected next state never appeared |\n| `SECURE_WINDOW` | Android blocked screenshots for the current window |\n| Wrong action | The model chose a bad label or command |\n| Stale observation | The UI changed after the model saw it |\n\nGood tooling should preserve which bucket happened.\n\nIf everything becomes \"click failed\", the agent cannot 
recover intelligently.\n\nThe UI dump is the agent's view of the world.\n\nSave it before each model decision:\n\n```\nhs ui > run/0007-ui.txt\n```\n\nFor LLM agents, a compact action table is usually better than full XML:\n\n```\nfill  EditText  \"Email\"     #email     540,540\nfill  EditText  \"Password\"  #password  540,640  [password]\ntap   Button    \"Continue\"  #continue  540,860\n```\n\nWhen a model picks the wrong action, this file tells you whether the model had a reasonable choice.\n\nScreenshots are valuable, but you do not need a full native PNG on every step.\n\nFor most agent debugging, a downscaled JPEG is enough:\n\n```\nhs see --size 768 run/0007-screen.jpg\n```\n\nUse screenshots for visual states, failures, and custom-rendered screens.\n\nUse the text UI as the default. Use screenshots as evidence.\n\nDo not only save the final command.\n\nSave what the model actually emitted:\n\n```\n{\n  \"step\": 7,\n  \"model_action\": \"tap \\\"Continue\\\"\",\n  \"tool_call\": [\"hs\", \"tap\", \"Continue\", \"--visible\", \"--unique\"],\n  \"reason\": \"The login form is filled and Continue is visible.\"\n}\n```\n\nThis matters because the bug may be in the translation from model action to tool call.\n\nKeep the model layer and tool layer separate.\n\nExit codes and error codes are better than stderr scraping.\n\nHandsets has common exit codes:\n\n```\n0  ok\n2  NOT_FOUND\n3  TIMEOUT\n4  AMBIGUOUS\n```\n\nIn JSON mode, preserve the structured error:\n\n```\nhs --json tap \"Continue\" --visible --unique\n```\n\nThen your agent can decide:\n\n- `NOT_FOUND`: dump UI again or scroll\n- `AMBIGUOUS`: ask for a narrower selector\n- `TIMEOUT`: capture screenshot and logs\n- `SECURE_WINDOW`: continue without screenshot\n\nAndroid logs are noisy. A small tail near the failure is usually enough:\n\n```\nhs logs --tail 200 > run/0007-logcat.txt\n```\n\nPair logs with the UI dump and screenshot from the same step. 
Otherwise you end up with artifacts that are technically present but hard to correlate.\n\nUse numbered files:\n\n```\nrun/\n  0001-ui.txt\n  0001-action.json\n  0001-result.json\n  0002-ui.txt\n  0002-screen.jpg\n  0002-action.json\n  0002-result.json\n  0002-logcat.txt\n```\n\nThis is not fancy. That is the point.\n\nBefore building a dashboard, make the run inspectable with plain files.\n\nOnce you have traces, replay becomes possible.\n\nThe useful replay is not pixel-perfect video. It is a timeline:\n\n```\nStep 1: observed Sign in\nStep 2: tapped Sign in\nStep 3: filled Email\nStep 4: filled Password\nStep 5: tapped Continue\nStep 6: timed out waiting for Dashboard\n```\n\nFor teams, this timeline becomes the product. It lets an engineer see whether the model, the tool, or the app caused the failure.\n\nBecause failures can come from the model, the app, the Android UI state, the automation tool, or timing. A final screenshot does not tell you which layer failed.\n\nNot always. Save compact UI dumps for every step. Add screenshots for visual states, failures, and custom-rendered screens.\n\nThe pre-action UI dump. 
It shows what the model saw when it chose the action.\n\nStructured traces let you build targeted recovery: scroll on `NOT_FOUND`, narrow selectors on `AMBIGUOUS`, capture logs on `TIMEOUT`, and avoid retrying blindly.\n\nOriginally published at [https://handsets.dev/blog/debug-llm-android-automation-runs/](https://handsets.dev/blog/debug-llm-android-automation-runs/).", "url": "https://wpnews.pro/news/how-to-debug-llm-driven-android-automation-runs", "canonical_source": "https://dev.to/elliotgao2/how-to-debug-llm-driven-android-automation-runs-3eej", "published_at": "2026-05-26 12:21:43+00:00", "updated_at": "2026-05-26 12:33:38.047507+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "artificial-intelligence", "ai-tools", "mlops"], "entities": ["Android", "LLM"], "alternates": {"html": "https://wpnews.pro/news/how-to-debug-llm-driven-android-automation-runs", "markdown": "https://wpnews.pro/news/how-to-debug-llm-driven-android-automation-runs.md", "text": "https://wpnews.pro/news/how-to-debug-llm-driven-android-automation-runs.txt", "jsonld": "https://wpnews.pro/news/how-to-debug-llm-driven-android-automation-runs.jsonld"}}