{"slug": "tapflow-v0-3-x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental", "title": "tapflow v0.3.x: Deeplinks, Keyboard Shortcuts, Screenshot API, and an Experimental MCP Server", "summary": "Tapflow v0.3.x introduces deeplink execution directly from the QA session toolbar, allowing testers to trigger URLs on active devices without developer involvement. The release also adds keyboard shortcuts for common actions, a screenshot API for CI pipelines, and proper scope enforcement for personal access tokens. An experimental MCP server is included, along with per-frame hop timestamps for debugging streaming latency.", "body_md": "The request that came up most in real usage: testers frequently need to trigger deeplinks to verify specific app states such as product detail pages, notification payloads, and OAuth redirects. The old workflow always involved a mobile developer, who either triggered the deeplink on their own machine or built a debug menu inside the app specifically for this purpose.\n\nIn v0.3.0 you can now fire a deeplink directly from the QA session toolbar. Click the link icon (or press `⌘K`), enter the URL, and it executes on the active device.\n\nUnder the hood it's a new `open-url` WebSocket message type that routes browser → relay → agent:\n\n```\nBrowser ──open-url──► Relay ──open-url──► Mac Agent\n                                              │\n                           iOS: xcrun simctl openurl booted <url>\n                           Android: adb shell am start -a android.intent.action.VIEW -d <url>\nBrowser ◄──open-url:done/error── Relay ◄──────┘\n```\n\nThe `DeviceAgent` interface got a new `openUrl(url)` method, so both iOS and Android agents implement it symmetrically. The relay routes it and returns either `open-url:done` or `open-url:error` with the failure reason. The dashboard shows a toast either way.\n\nQA sessions are repetitive. Reaching for the toolbar icons on every screenshot or rotation adds up. 
v0.3.0 adds keyboard shortcuts for all the common actions:\n\n| Shortcut | Action |\n|---|---|\n| `⌘K` | Open deeplink dialog |\n| `⌘S` | Take screenshot |\n| `⌘⇧Y` | Start / stop recording |\n| `⌘⇧O` | Rotate simulator |\n| `⌘⇧U` | iOS: press Home |\n| `⌘⇧K` | iOS: toggle software keyboard |\n\nTooltips now show the shortcut hint inline, so they're discoverable without reading docs. One implementation detail worth noting: key detection uses `e.code` instead of `e.key`. This matters for IME input: Korean, Japanese, and Chinese users composing text would otherwise trigger shortcuts mid-composition.\n\nThis one unlocks a new class of CI usage.\n\n`GET /api/v1/sessions/:sessionId/screenshot` returns a PNG or JPEG of the current simulator screen. You can call it with a PAT from any CI step: before asserting a visual state, during an automated flow, or after a build install.\n\nThe tricky part was the request/response pattern. The relay communicates with agents over WebSocket (long-lived, multiplexed), but HTTP is request/response. Screenshots are taken on the Mac, not the relay, so the HTTP handler has to wait on a WebSocket round trip.\n\nWe introduced a requestId-based pending map: the relay generates a unique ID, sends a `take-screenshot` message to the agent over WebSocket, registers a promise keyed by requestId, and resolves it when `screenshot:result` comes back. The HTTP handler awaits that promise and sends the binary payload:\n\n```\nGET /api/v1/sessions/:id/screenshot\n    │\n    ▼\nRelay: generate requestId, push to pending map\n    │\n    ├──screenshot-request──► Mac Agent\n    │                            │ simctl io screenshot (iOS)\n    │                            │ ADB screencap (Android)\n    ◄──screenshot:result─────────┘\n    │\n    ▼\nHTTP 200 (binary image)\n```\n\niOS supports both PNG and JPEG via `--type`. 
Android returns PNG regardless: ADB doesn't offer format selection at this layer.\n\nPersonal Access Tokens existed before v0.3.0, but the scope field wasn't actually enforced on API routes. A `developer`-scoped token could call any endpoint.\n\nv0.3.0 adds proper scope checks to all builds endpoints. PAT scopes are now enforced at the middleware layer: a token issued for `builds` access can upload and manage builds, but can't touch team settings or session data. This makes it safe to issue narrow tokens for CI pipelines without giving them broader access than they need.\n\nFor anyone debugging streaming latency: v0.3.x adds per-frame hop timestamps via a binary header (`TFFE`, the tapflow frame envelope). Each frame now carries the capture time, relay-received time, and client-received time in an 8-byte prefix before the JPEG/H.264 payload.\n\nThe dashboard can surface a live performance overlay showing frame latency broken down by segment (agent → relay, relay → browser). Useful when diagnosing whether a slowdown is in the network leg or the browser decode path.\n\nThe last item in v0.3.x is different in nature. It shipped as `@tapflowio/mcp-server` at `0.3.1-experimental.1`; the version suffix says what we mean.\n\nThe MCP server wraps tapflow's WebSocket and REST APIs as 12 MCP tools:\n\n```\nlist_devices, connect_device, boot_device, screenshot,\ntap, swipe, type_text, press_key, press_button,\ninstall_app, launch_app, disconnect_device\n```\n\nThis lets any MCP-compatible LLM client control a running simulator the same way a human would through the browser, but programmatically, from a model. Connect it to Claude Desktop or a coding agent, and the model can tap through flows, take screenshots to verify state, and install builds.\n\nWhy experimental? The core works, but the tool layer needs more hardening. 
Device state management, timing edge cases, and error recovery paths aren't reliable enough yet: the same input doesn't always produce predictable behavior. We're still working toward the point where you can trust it to do the right thing consistently.\n\nIf you want to try it:\n\n```\nnpm install -g @tapflowio/mcp-server\n```\n\nConfigure it as an MCP server in your client, point it at your tapflow relay with a PAT, and the simulator tools show up in the model's tool list.\n\nThe MCP server is step one. The direction we're aiming at is using it as the foundation for LLM-driven test automation in CI/CD pipelines, where a model installs a fresh build, walks through critical flows, takes screenshots at each step, and reports pass/fail without a human in the loop.\n\nThat's a bigger topic. We'll write it up separately once the MCP layer is stable enough to build on.\n\n```\nnpm install -g tapflow\ntapflow start\n# http://localhost:4000\n```\n\n", "url": "https://wpnews.pro/news/tapflow-v0-3-x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental", "canonical_source": "https://dev.to/joduchan/tapflow-v03x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental-mcp-server-4lg1", "published_at": "2026-05-29 07:38:22+00:00", "updated_at": "2026-05-29 07:41:09.800474+00:00", "lang": "en", "topics": ["ai-tools", "ai-products", "ai-infrastructure", "ai-agents", "mlops"], "entities": ["tapflow", "iOS", "Android", "Mac Agent", "DeviceAgent", "WebSocket", "xcrun", "adb"], "alternates": {"html": "https://wpnews.pro/news/tapflow-v0-3-x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental", "markdown": "https://wpnews.pro/news/tapflow-v0-3-x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental.md", "text": "https://wpnews.pro/news/tapflow-v0-3-x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental.txt", "jsonld": 
"https://wpnews.pro/news/tapflow-v0-3-x-deeplinks-keyboard-shortcuts-screenshot-api-and-an-experimental.jsonld"}}