The one that came up most in real usage: testers frequently need to trigger deeplinks to verify specific app states — product detail pages, notification payloads, OAuth redirects. The old workflow always involved a mobile developer — either having them trigger it on their machine or building a debug menu inside the app specifically for this purpose.
In v0.3.0 you can now fire a deeplink directly from the QA session toolbar. Click the link icon (or press `⌘K`), enter the URL, and it executes on the active device.
Under the hood it's a new `open-url` WebSocket message type that routes browser → relay → agent:
    Browser ──open-url──► Relay ──open-url──► Mac Agent
                                                  │
                        iOS:     xcrun simctl openurl booted <url>
                        Android: adb shell am start -a VIEW -d <url>
    Browser ◄──open-url:done/error── Relay ◄──────┘
The `DeviceAgent` interface got a new `openUrl(url)` method, so both iOS and Android agents implement it symmetrically. The relay routes it and returns either `open-url:done` or `open-url:error` with the failure reason. The dashboard shows a toast either way.
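A rough sketch of what that symmetry could look like in TypeScript. Only `DeviceAgent`, `openUrl(url)`, the two result message names, and the `xcrun` / `adb` commands come from the description above. The `CommandRunner` type, the class names, and the result shape are assumptions made for illustration, not tapflow's actual code.

```typescript
// Result shape is hypothetical; the message names are from the post.
type OpenUrlResult =
  | { type: "open-url:done" }
  | { type: "open-url:error"; reason: string };

// Injected so the sketch does not depend on a particular process API.
type CommandRunner = (cmd: string, args: string[]) => Promise<void>;

interface DeviceAgent {
  openUrl(url: string): Promise<OpenUrlResult>;
}

// Build argv arrays instead of shell strings to avoid host-side interpolation.
function iosOpenUrlArgs(url: string): string[] {
  return ["simctl", "openurl", "booted", url];
}

function androidOpenUrlArgs(url: string): string[] {
  // Uses the full intent action name; the diagram abbreviates it to VIEW.
  return ["shell", "am", "start", "-a", "android.intent.action.VIEW", "-d", url];
}

// Shared helper: run the command and map success/failure to a result message.
async function runOpenUrl(
  run: CommandRunner,
  cmd: string,
  args: string[],
): Promise<OpenUrlResult> {
  try {
    await run(cmd, args);
    return { type: "open-url:done" };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { type: "open-url:error", reason };
  }
}

class IosAgent implements DeviceAgent {
  private readonly run: CommandRunner;
  constructor(run: CommandRunner) {
    this.run = run;
  }
  openUrl(url: string): Promise<OpenUrlResult> {
    return runOpenUrl(this.run, "xcrun", iosOpenUrlArgs(url));
  }
}

class AndroidAgent implements DeviceAgent {
  private readonly run: CommandRunner;
  constructor(run: CommandRunner) {
    this.run = run;
  }
  openUrl(url: string): Promise<OpenUrlResult> {
    return runOpenUrl(this.run, "adb", androidOpenUrlArgs(url));
  }
}
```

One caveat with this shape: `adb shell` still hands its arguments to a shell on the device, so a URL containing `&` would need extra quoting on the Android side.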
QA sessions are repetitive. Reaching for the toolbar icons on every screenshot or rotation adds up. v0.3.0 adds keyboard shortcuts to all the common actions:
| Shortcut | Action |
|---|---|
| `⌘K` | Open deeplink dialog |
| `⌘S` | Take screenshot |
| `⌘⇧Y` | Start / stop recording |
| `⌘⇧O` | Rotate simulator |
| `⌘⇧U` | iOS: press Home |
| `⌘⇧K` | iOS: toggle software keyboard |
Tooltips now show the shortcut hint inline, so they're discoverable without reading docs. One implementation detail worth noting: key detection uses `e.code` instead of `e.key`. This matters for IME input: Korean, Japanese, and Chinese users composing text would otherwise trigger shortcuts mid-composition.
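To make the `e.code` point concrete, here is a minimal matcher sketch. The key combinations mirror the shortcut table; the action names, the `KeyLike` type, and the additional `isComposing` guard are assumptions for illustration rather than the dashboard's real handler.

```typescript
// Minimal subset of KeyboardEvent so the matcher can run without a DOM.
interface KeyLike {
  code: string; // physical key, e.g. "KeyK", independent of layout or IME state
  metaKey: boolean;
  shiftKey: boolean;
  isComposing: boolean;
}

// Hypothetical action identifiers.
type Action =
  | "open-deeplink"
  | "screenshot"
  | "toggle-recording"
  | "rotate"
  | "ios-home"
  | "ios-toggle-keyboard";

interface Shortcut {
  code: string;
  shift: boolean;
  action: Action;
}

const SHORTCUTS: Shortcut[] = [
  { code: "KeyK", shift: false, action: "open-deeplink" },
  { code: "KeyS", shift: false, action: "screenshot" },
  { code: "KeyY", shift: true, action: "toggle-recording" },
  { code: "KeyO", shift: true, action: "rotate" },
  { code: "KeyU", shift: true, action: "ios-home" },
  { code: "KeyK", shift: true, action: "ios-toggle-keyboard" },
];

function matchShortcut(e: KeyLike): Action | null {
  // Extra guard (an assumption, not from the post): skip keys sent during composition.
  if (e.isComposing || !e.metaKey) return null;
  const hit = SHORTCUTS.find((s) => s.code === e.code && s.shift === e.shiftKey);
  return hit ? hit.action : null;
}
```

In a browser, a `keydown` listener could pass the event straight in, since `KeyboardEvent` already has all four fields, and call `preventDefault()` only when the result is non-null.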
This one unlocks a new class of CI usage.
`GET /api/v1/sessions/:sessionId/screenshot` returns a PNG or JPEG of the current simulator screen. You can call it with a PAT from any CI step: before asserting a visual state, during an automated flow, or after a build install.
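As a sketch of what a CI step might build, the helper below assembles the request. The route is the one above; sending the PAT as a `Bearer` token in the `Authorization` header is an assumption (the header scheme isn't specified here), and `screenshotRequest` is a hypothetical helper name.

```typescript
// Assumption: the PAT travels as a Bearer token. Verify against the relay's docs.
function screenshotRequest(relayBaseUrl: string, sessionId: string, pat: string) {
  const base = relayBaseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/api/v1/sessions/${encodeURIComponent(sessionId)}/screenshot`,
    init: {
      method: "GET",
      headers: { Authorization: `Bearer ${pat}` },
    },
  };
}
```

A CI script could then pass `url` and `init` to `fetch` and write the response's `arrayBuffer()` to disk as the image artifact.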
The tricky part was the request/response pattern. The relay communicates with agents over WebSocket (long-lived, multiplexed), but HTTP is request/response. Screenshots are taken on the Mac, not the relay.
We introduced a requestId-based pending map: the relay generates a unique ID, sends a `take-screenshot` message to the agent over WebSocket, registers a promise keyed by requestId, and resolves it when `screenshot:result` comes back. The HTTP handler awaits that promise and sends the binary payload:
    GET /api/v1/sessions/:id/screenshot
            │
            ▼
    Relay: generate requestId, push to pending map
            │
            ├──screenshot-request──► Mac Agent
            │                            │  simctl io screenshot (iOS)
            │                            │  ADB screencap (Android)
            ◄──screenshot:result─────────┘
            │
            ▼
    HTTP 200 (binary image)
iOS supports both PNG and JPEG via `--type`. Android returns PNG regardless; ADB doesn't offer format selection at this layer.
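The requestId-based pending map described above can be sketched roughly as follows. The class and method names, the counter-based ID, and the timeout are assumptions for illustration; only the `take-screenshot` / `screenshot:result` message names and the overall pattern come from the post.

```typescript
interface Pending<T> {
  resolve: (value: T) => void;
  reject: (err: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class PendingRequests<T> {
  private readonly pending = new Map<string, Pending<T>>();
  private nextId = 0;

  // Registers a promise and returns it together with the ID to send to the agent.
  create(timeoutMs: number): { requestId: string; promise: Promise<T> } {
    const requestId = `req-${++this.nextId}`;
    const promise = new Promise<T>((resolve, reject) => {
      // Without a timeout, a dropped agent would leave the HTTP request hanging.
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`request ${requestId} timed out`));
      }, timeoutMs);
      this.pending.set(requestId, { resolve, reject, timer });
    });
    return { requestId, promise };
  }

  // Call when the agent's reply arrives; returns false for unknown or expired IDs.
  settle(requestId: string, value: T): boolean {
    const entry = this.pending.get(requestId);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(requestId);
    entry.resolve(value);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}

// Hypothetical wiring: `send` stands in for the relay's WebSocket write to the agent.
function takeScreenshot(
  requests: PendingRequests<Uint8Array>,
  send: (msg: { type: "take-screenshot"; requestId: string }) => void,
): Promise<Uint8Array> {
  const { requestId, promise } = requests.create(10000);
  send({ type: "take-screenshot", requestId });
  return promise;
}
```

The HTTP handler would await `takeScreenshot(...)`, while the WebSocket message handler calls `settle(requestId, bytes)` whenever a `screenshot:result` arrives. This sketch registers the promise before sending, so a fast reply can never arrive for an ID that isn't in the map yet.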
Personal Access Tokens existed before v0.3.0, but the scope field wasn't actually enforced on API routes. A `developer`-scoped token could call any endpoint.
v0.3.0 adds proper scope checks to all builds endpoints. Scopes are now enforced at the middleware layer: a token issued for `builds` access can upload and manage builds, but can't touch team settings or session data. This makes it safe to issue narrow tokens for CI pipelines without giving them broader access than they need.
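A minimal sketch of a scope check of this kind, written as a pure function that a middleware could call. The `builds` scope name is from the post; the other scope names, the route prefixes for builds and team settings, and the deny-by-default rule are assumptions for illustration.

```typescript
// "sessions" and "team" are hypothetical scope names; only "builds" is from the post.
type Scope = "builds" | "sessions" | "team";

interface Token {
  id: string;
  scopes: Scope[];
}

type AuthResult = { ok: true } | { ok: false; status: number; error: string };

// Hypothetical route table: first matching prefix decides the required scope.
const ROUTE_SCOPES: Array<{ prefix: string; scope: Scope }> = [
  { prefix: "/api/v1/builds", scope: "builds" },
  { prefix: "/api/v1/sessions", scope: "sessions" },
  { prefix: "/api/v1/team", scope: "team" },
];

function requiredScope(path: string): Scope | null {
  const rule = ROUTE_SCOPES.find((r) => path.startsWith(r.prefix));
  return rule ? rule.scope : null;
}

function authorize(token: Token, path: string): AuthResult {
  const needed = requiredScope(path);
  // Deny by default: a route with no mapping is not reachable with a PAT.
  if (needed === null) {
    return { ok: false, status: 403, error: "route not available to PATs" };
  }
  if (!token.scopes.includes(needed)) {
    return { ok: false, status: 403, error: `token lacks scope: ${needed}` };
  }
  return { ok: true };
}
```

Keeping the decision in one function means every route goes through the same check, which avoids the earlier failure mode where the scope field existed but individual handlers never looked at it.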
For anyone debugging streaming latency: v0.3.x adds per-frame hop timestamps via a binary header (`TFFE`, the tapflow frame envelope). Each frame now carries the capture time, relay-received time, and client-received time in an 8-byte prefix before the JPEG/H.264 payload.
The dashboard can surface a live performance overlay showing frame latency broken down by segment (agent → relay, relay → browser). Useful when diagnosing whether a slowdown is in the network leg or the browser decode path.
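The per-segment breakdown is simple arithmetic once the three timestamps are available. A sketch, with hypothetical field names and millisecond units assumed (the actual header layout isn't described here):

```typescript
// Field names and units are assumptions; the post names only the three timestamps.
interface FrameTimestamps {
  capturedAtMs: number;
  relayReceivedAtMs: number;
  clientReceivedAtMs: number;
}

interface HopLatency {
  agentToRelayMs: number;
  relayToBrowserMs: number;
  totalMs: number;
}

function hopLatency(t: FrameTimestamps): HopLatency {
  return {
    agentToRelayMs: t.relayReceivedAtMs - t.capturedAtMs,
    relayToBrowserMs: t.clientReceivedAtMs - t.relayReceivedAtMs,
    totalMs: t.clientReceivedAtMs - t.capturedAtMs,
  };
}
```

Because each timestamp is taken on a different machine, these differences include any clock offset between hosts, so the numbers are most useful as trends over time rather than absolute values.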
The last item in v0.3.x is different in nature. It shipped as `@tapflowio/mcp-server` at `0.3.1-experimental.1`, and the version suffix says what we mean.
The MCP server wraps tapflow's WebSocket and REST APIs as 12 MCP tools:
list_devices, connect_device, boot_device, screenshot,
tap, swipe, type_text, press_key, press_button,
install_app, launch_app, disconnect_device
This lets any MCP-compatible LLM client control a running simulator the same way a human would through the browser — but programmatically, from a model. Connect it to Claude Desktop or a coding agent, and the model can tap through flows, take screenshots to verify state, and install builds.
Why experimental? The core works, but the tool layer needs more hardening. Device state management, timing edge cases, and error recovery paths aren't reliable enough yet — the same input doesn't always produce predictable behavior. We're still working toward the point where you can trust it to do the right thing consistently.
If you want to try it:
npm install -g @tapflowio/mcp-server
Configure it as an MCP server in your client, point it at your tapflow relay with a PAT, and the simulator tools show up in the model's tool list.
The MCP server is step one. The direction we're aiming at is using it as the foundation for LLM-driven test automation in CI/CD pipelines — where a model installs a fresh build, walks through critical flows, takes screenshots at each step, and reports pass/fail without a human in the loop.
That's a bigger topic. We'll write it up separately once the MCP layer is stable enough to build on.
npm install -g tapflow
tapflow start