{"slug": "i-found-the-r-openclaw-thread-with-27-upvotes-where-someone-gave-an-agent-a-real", "title": "I found the r/openclaw thread with 27 upvotes where someone gave an agent a real iPhone and now I can’t stop thinking about it", "summary": "A developer on Reddit's r/openclaw forum shared a method for giving an AI agent full control over a real iPhone using an \"Appium type layer,\" bypassing simulators for persistent mobile identity. The approach, described as \"pretty hacky,\" allows agents to maintain state across days with a real phone number and app sessions, targeting mobile-only workflows that lack API access. The thread, which received 27 upvotes, highlights a growing interest in using real devices rather than browser automation for agent-based task execution.", "body_md": "A few days ago I found this r/openclaw post: [“I gave my agent my actual iphone..”](https://reddit.com/r/openclaw/comments/1towv70/i_gave_my_agent_my_actual_iphone/)\n\nIt had 27 upvotes and 16 comments.\n\nThat low number is exactly why I clicked.\n\nThe most interesting agent ideas usually show up before they get polished into a launch video. One builder does something slightly cursed, a few other builders pile into the comments, and suddenly you can see where the category is heading.\n\nThis thread felt like that.\n\nThe poster said they weren’t using a simulator. Not a browser pretending to be a phone. A real iPhone. They said the agent could access it “entirely,” and later explained it was an “Appium type layer” that was “pretty hacky.”\n\nThat one detail matters more than the headline.\n\nBecause if you’re building agents, the next battleground probably isn’t just browser automation. It’s persistent mobile identity: a real phone number, a real app session, a real logged-in device that keeps state across days.\n\nAnd once you think about it that way, this stops sounding like a gimmick.\n\nWhen the poster said “Appium type layer,” the whole thing became more believable.\n\nIf you’ve done mobile automation before, this is the obvious primitive.\n\n```\ncapabilities = {\n  \"platformName\": \"iOS\",\n  \"appium:automationName\": \"XCUITest\",\n  \"appium:deviceName\": \"iPhone\",\n  \"appium:platformVersion\": \"16.0\"\n}\n\n# Real devices often also need:\n# \"appium:udid\": \"<device-udid>\"\n```\n\nThat is not some mysterious AI-native stack. It is just mobile automation plus an agent loop on top.\n\nWhich is also why it is fragile.\n\nIf your agent is driving a real iPhone UI, you are always one step away from:\n\nSo yes, this is hacky.\n\nBut browser agents were hacky too. Early RPA was hacky. Selenium was hacky. “Hacky” is often just the first version of something people will absolutely want once the tooling gets better.\n\nThis is where the thread got smarter than the title.\n\nThe poster said they were testing:\n\nThat is a solid list.\n\nNot because “AI on a phone” is novel. That framing is too shallow.\n\nThe real use case is giving an agent a durable mobile identity.\n\nThat means:\n\nThat matters because a lot of ugly automation work still lives inside mobile-only apps.\n\nNot everything has an API.\n\nAnd even when an API exists, it often does not expose the exact workflow the human app does.\n\nIf you are building this, the right move is not “automate the whole UI for everything.”\n\nIt is a layered stack:\n\nThat is the practical version.\n\nShortcuts are especially interesting here because they can become the bridge between clean integrations and messy UI control.\n\nIf the app exposes a Shortcut action, let the agent call that.\n\nIf it does not, let the agent drive the screen.\n\nThat hybrid model is much better than pretending every task deserves full visual control.\n\nThe obvious objection is: why automate a phone UI when you could just build a proper integration?\n\nFair point.\n\nIf the app already has a stable API, use the API.\n\nIf it exposes a Shortcut action, use that.\n\nDriving the entire iPhone UI to do something that could have been one HTTP request is slower, more brittle, and kind of ridiculous.\n\nBut the believers in that thread are also right about something more important:\n\nA shocking amount of real work still hides behind mobile-only interfaces.\n\nThat includes:\n\nThat is why this resonated with OpenClaw users.\n\nPeople building agents do not just want a bot that answers questions in Slack. They want systems that operate.\n\nAnd operation means touching ugly surfaces.\n\nBrowsers are one ugly surface.\n\nA real iPhone is another.\n\nOne small comment in the thread stuck with me: the poster said they were using “flash 3.5” and that it worked well enough.\n\nThat is the tell.\n\nThey are already separating the control layer from the model layer.\n\nThat is exactly what you want.\n\nBecause once an agent is driving a phone, cost can spike fast.\n\nA single retry might mean:\n\nDo that over and over in a long-lived session and per-token billing starts looking terrible.\n\nThis is where a lot of agent demos quietly fall apart in production. The task itself is not expensive. The repeated thinking around the task is.\n\nIf you are paying per token, every loop hurts.\n\nIf you are running agents in n8n, Make, Zapier, OpenClaw, or custom workflows, this gets even uglier because the agent is rarely doing one clean request. It is bouncing through retries, checks, tool calls, and approval steps.\n\nThat is why model routing matters.\n\nUse a cheaper model for routine perception and planning.\n\nEscalate to stronger models only when the task is ambiguous, high-stakes, or approval-gated.\n\nAnd if your workload is continuous, flat-rate compute becomes much more attractive than metered token billing.\n\nThat is the practical reason I think products like [Standard Compute](https://standardcompute.com) are relevant here. If you are building long-running agents, especially agents that loop through mobile UI states, unlimited compute at a predictable monthly price is a much saner fit than watching token spend every time the agent stares at a loading spinner.\n\nThis is the kind of split I would use:\n\n| Task type | Model strategy |\n|---|---|\n| Basic screen understanding | Cheap fast model |\n| Repeated UI retries | Cheap fast model |\n| Sensitive actions like send/book/buy | Strong model plus approval |\n| Ambiguous flows or broken state recovery | Strong model |\n| Long-running automation at scale | Flat-rate compute if possible |\n\nPseudo-code version:\n\n```\ntype Action = \"tap\" | \"scroll\" | \"type\" | \"send\" | \"book\" | \"buy\";\n\nfunction pickModel(task: {\n  action: Action;\n  ambiguous: boolean;\n  sensitive: boolean;\n  retryCount: number;\n}) {\n  if (task.sensitive) return \"strong-model\";\n  if (task.ambiguous) return \"strong-model\";\n  if (task.retryCount > 3) return \"strong-model\";\n  return \"fast-cheap-model\";\n}\n```\n\nThe point is simple: do not spend premium-model money on every tap.\n\nThe scary part is not whether an agent can tap buttons.\n\nThe scary part is that a real iPhone can do real things.\n\nA browser agent submitting the wrong form is annoying.\n\nAn iPhone agent sending the wrong iMessage, confirming the wrong booking, or touching the wrong payment flow is a different class of mistake.\n\nSo if you are serious about this, I think the minimum viable guardrails look like this:\n\nSomething like:\n\n``` js\nconst session = await phones.createSession({\n  device: \"iphone\",\n  region: \"us\",\n  approvals: \"sensitive-actions\",\n  allowedApps: [\"Messages\", \"Shortcuts\", \"BookingApp\"]\n});\n\nawait agents.runTask({\n  sessionId: session.id,\n  goal: \"Draft an iMessage confirming the new appointment time\",\n  approvalBefore: [\"send\", \"book\", \"buy\"],\n  handoffOnLowConfidence: true\n});\n```\n\nThat is the grown-up version of the idea.\n\nNot “my bot has my phone now.”\n\nMore like “the agent can operate inside a controlled mobile session with auditability.”\n\nThe market here is splitting into three lanes.\n\n| Option | What you actually get |\n|---|---|\n| DIY Appium on real iPhones | Maximum flexibility, maximum operational pain |\n| BrowserStack App Automate | Massive real-device QA infrastructure, testing-first workflow |\n| Browseblue-style agent layer | Persistent phone identity, approvals, logs, and agent-oriented sessions |\n\nThese are not the same thing.\n\nBrowserStack is great if your main job is app testing.\n\nA Browseblue-style system is more interesting if your job is giving an agent a durable mobile identity.\n\nDIY is still valid if you need total control and are willing to own the mess.\n\nIf I were prototyping this next week, I would not start with autonomous texting or high-risk flows.\n\nI would start here:\n\nGood examples:\n\nDo not wait until later to bolt on safety.\n\nMake “send,” “book,” and “buy” approval-gated from day one.\n\nYour agent loop should treat model calls as one component, not the whole architecture.\n\nAt minimum:\n\n```\nsession_id=iphone-123\ntimestamp=2026-05-27T10:15:00Z\naction=tap\ntarget=\"Messages compose button\"\nmodel=\"fast-cheap-model\"\napproval_required=false\nscreenshot=\"s3://.../step-14.png\"\n```\n\nThis part matters more than people expect.\n\nIf the workflow loops a lot, token-based pricing will show you exactly how expensive “just one more retry” becomes.\n\nThat is why teams running heavy automations often end up wanting a flat monthly bill instead of metered spend.\n\nI do not think the headline idea is “agents can use phones now.”\n\nThat is too obvious.\n\nThe real idea is that agents are starting to need persistent identities in the places humans actually work.\n\nNot just API keys.\n\nNot just browser sessions.\n\nReal phone numbers. Real app logins. Real saved state. Real approval history. Real continuity across days.\n\nThat is what makes the thread interesting.\n\nThe stack is hacky. The skeptics are right that UI automation is brittle. Native integrations are cleaner when available.\n\nAll true.\n\nI still think this points at something real.\n\nThe next useful agents will not just answer in Slack or Discord.\n\nThey will:\n\nMessy? Absolutely.\n\nBut so was every important interface layer before it became normal.\n\nAnd if that future shows up the way I think it will, the winners will not just have better agent loops.\n\nThey will have better cost control too.\n\nBecause once your agents move from chat to operation, especially on mobile, predictable compute stops being a nice-to-have and becomes part of the architecture.", "url": "https://wpnews.pro/news/i-found-the-r-openclaw-thread-with-27-upvotes-where-someone-gave-an-agent-a-real", "canonical_source": "https://dev.to/lars_winstand/i-found-the-ropenclaw-thread-with-27-upvotes-where-someone-gave-an-agent-a-real-iphone-and-now-i-20ab", "published_at": "2026-05-27 19:34:24+00:00", "updated_at": "2026-05-27 19:40:57.958993+00:00", "lang": "en", "topics": ["ai-agents", "ai-products", "ai-tools", "artificial-intelligence"], "entities": ["r/openclaw", "Appium", "iPhone", "XCUITest"], "alternates": {"html": "https://wpnews.pro/news/i-found-the-r-openclaw-thread-with-27-upvotes-where-someone-gave-an-agent-a-real", "markdown": "https://wpnews.pro/news/i-found-the-r-openclaw-thread-with-27-upvotes-where-someone-gave-an-agent-a-real.md", "text": "https://wpnews.pro/news/i-found-the-r-openclaw-thread-with-27-upvotes-where-someone-gave-an-agent-a-real.txt", "jsonld": "https://wpnews.pro/news/i-found-the-r-openclaw-thread-with-27-upvotes-where-someone-gave-an-agent-a-real.jsonld"}}