# Built a physical AI agent device that controls any connected device via USB HID, and sharing the demo here

> Source: <https://discuss.huggingface.co/t/built-a-physical-ai-agent-device-that-controls-any-connected-device-via-usb-hid-and-sharing-the-demo-here/176700#post_3>
> Published: 2026-06-16 09:54:39+00:00

Hey guys

I wanted to share something we’ve been building: **a physical (hardware) AI agent device called Aiden AI** that takes a different approach to the “agent controls a computer” problem.

Most software agents need to be installed on the host device, require API access, or need admin permissions. We kept running into this wall.

Our approach is to connect as a standard USB HID device (the same protocol a keyboard and mouse use), so the host just sees a keyboard and a mouse. The agent watches the screen via HDMI, listens through full-duplex audio, and sends inputs back through USB HID. No install, no API, and no permissions are needed on the host side.
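To make the "host sees a keyboard" part concrete, here is a minimal sketch of how keystrokes can be emitted from a Linux USB gadget. It is not Aiden's actual code. It assumes the gadget exposes a boot-protocol keyboard at `/dev/hidg0` (the usual Linux gadget device node, but an assumption here), and the names `keyReport` and `typeString` are made up for illustration. Each key press is an 8-byte report (modifier byte, reserved byte, up to six key usage IDs), followed by an all-zero report to release the key.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// modLeftShift is the Left Shift bit in byte 0 of a boot keyboard report.
const modLeftShift = 0x02

// keyReport builds an 8-byte boot-protocol keyboard report for one ASCII
// letter, digit, or space. ok is false for anything outside that small subset.
func keyReport(c byte) (report [8]byte, ok bool) {
	switch {
	case c >= 'a' && c <= 'z':
		report[2] = 0x04 + (c - 'a') // usage IDs 0x04..0x1D are a..z
	case c >= 'A' && c <= 'Z':
		report[0] = modLeftShift
		report[2] = 0x04 + (c - 'A')
	case c >= '1' && c <= '9':
		report[2] = 0x1E + (c - '1') // usage IDs 0x1E..0x26 are 1..9
	case c == '0':
		report[2] = 0x27
	case c == ' ':
		report[2] = 0x2C
	default:
		return report, false
	}
	return report, true
}

// typeString writes a press report and a release report for every character.
func typeString(w io.Writer, s string) error {
	var release [8]byte // all zeros means "no keys held"
	for i := 0; i < len(s); i++ {
		rep, ok := keyReport(s[i])
		if !ok {
			return fmt.Errorf("unsupported character %q", s[i])
		}
		if _, err := w.Write(rep[:]); err != nil {
			return err
		}
		if _, err := w.Write(release[:]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Assumed device node; it only exists on a board configured as a HID gadget.
	f, err := os.OpenFile("/dev/hidg0", os.O_WRONLY, 0)
	if err != nil {
		fmt.Println("no HID gadget available:", err)
		return
	}
	defer f.Close()
	if err := typeString(f, "hello"); err != nil {
		fmt.Println("typing failed:", err)
	}
}
```

The release report between characters is what lets repeated letters (the two l's in "hello") register as separate key presses on the host.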

It’s built on a Luckfox Pico Zero (RV1106) with a Go-based LLM agent runtime, Silero VAD for on-device voice detection, a memory system, a skills system, and OTA updates.

Full architecture: [deepwiki.com/AidenAI-IO/aiden-hardware-demo](http://deepwiki.com/AidenAI-IO/aiden-hardware-demo)

HF Space: [huggingface.co/blog/NatalieY/aiden-hardware-ai-agent-device](https://huggingface.co/blog/NatalieY/aiden-hardware-ai-agent-device)

Would be great to hear from anyone working on similar edge AI agent deployment challenges, especially around on-device inference and USB HID control.

Really like this. “Don’t ask the host for permission - just present as a keyboard and mouse” is exactly the right lateral move: the integration wall is where most agent deployments quietly die, and meeting it at the HID layer instead of the API layer sidesteps the whole problem. Nice work.

I’m heads-down on my own thing right now (latent-space work - injecting into and reading out of a model’s hidden state rather than going through text), so I genuinely don’t know what kind of help you’re after, if any. But I’ve been circling this same corner - autonomous agents on cheap edge silicon - for a while, so a few threads in case any are worth chewing on:

- The loop latency, not the HID. HID output is basically free; what sets the felt responsiveness is the observe → reason cycle - HDMI frame → JPEG → LLM round-trip, every turn. What’s your end-to-end latency per action, and is the LLM the bottleneck? I’d be curious whether you frame-diff so an unchanged screen doesn’t pay for a fresh round-trip - seems like the obvious lever for a screen-driven ReAct loop.

- Where does the on-device line land? You’ve already got Silero VAD running locally on the RV1106 NPU, which is great; the reasoning still leans cloud. My whole rabbit hole is running capable models on tiny/fixed-point silicon, so I’m naturally curious how far onto the NPU you think the loop can move - privacy, offline, and cost all pull that way even if the heavy reasoning stays remote.

- USB/IP. You’re already on the Linux gadget stack - tunnelling the HID gadget over usbip would decouple the brain from the physical endpoint (one agent driving a fleet of hosts, or the endpoint living somewhere else entirely). Felt like a natural marriage with what you’re doing.

- Security, in both directions. The universality that makes this powerful is also the classic BadUSB surface - a keyboard that types on its own - so attestation/detection becomes an interesting co-problem. And the reverse: the agent acts on what it reads off the screen, which quietly makes the screen itself a prompt-injection channel. Not a knock, just the rich (and slightly scary) part of the design.
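For the frame-diff question in the first bullet, here is a rough sketch of what such a gate could look like in Go. It is purely illustrative and says nothing about how Aiden actually handles frames. It assumes raw, same-sized frames as byte slices (for example grayscale pixels), and `Gate`, `ShouldSend`, and `changedFraction` are hypothetical names. A frame goes to the model only when enough sampled bytes differ from the last frame that was sent.

```go
package main

import "fmt"

// changedFraction returns the share of sampled bytes that differ by more than
// tol between two raw frames. A stride above 1 subsamples to save CPU time.
func changedFraction(prev, cur []byte, stride int, tol uint8) float64 {
	if len(prev) != len(cur) || len(cur) == 0 || stride < 1 {
		return 1 // mismatched or empty input counts as fully changed
	}
	changed, sampled := 0, 0
	for i := 0; i < len(cur); i += stride {
		d := int(cur[i]) - int(prev[i])
		if d < 0 {
			d = -d
		}
		if d > int(tol) {
			changed++
		}
		sampled++
	}
	return float64(changed) / float64(sampled)
}

// Gate remembers the last frame that was forwarded to the model.
type Gate struct {
	last      []byte
	Threshold float64 // minimum changed fraction that triggers a send
	Stride    int     // sample every Stride-th byte
	Tol       uint8   // per-byte difference ignored as noise
}

// ShouldSend reports whether frame differs enough from the last sent frame.
// When it returns true, frame becomes the new reference.
func (g *Gate) ShouldSend(frame []byte) bool {
	if g.last != nil && changedFraction(g.last, frame, g.Stride, g.Tol) < g.Threshold {
		return false
	}
	g.last = append(g.last[:0], frame...)
	return true
}

func main() {
	g := &Gate{Threshold: 0.01, Stride: 1, Tol: 8}

	idle := make([]byte, 100) // stand-in for an unchanged screen
	busy := make([]byte, 100)
	for i := 0; i < 10; i++ {
		busy[i] = 255 // pretend a dialog appeared in one corner
	}

	fmt.Println(g.ShouldSend(idle)) // first frame is always sent
	fmt.Println(g.ShouldSend(idle)) // identical frame is skipped
	fmt.Println(g.ShouldSend(busy)) // 10% of bytes changed, so it is sent
}
```

Comparing against the last sent frame rather than the last seen frame keeps slow, gradual changes from slipping under the threshold forever. The tolerance matters for HDMI capture, where sensor or compression noise can flip low bits on an otherwise static screen.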

On squeezing silicon: I had a video bookmarked of someone wringing absurd performance out of bare-metal RISC-V that I’ve annoyingly lost, but the nearest kindred spirit I can point at is Hazard3 ([Wren6991/Hazard3: 3-stage RV32IMACZb* processor with debug](https://github.com/Wren6991/Hazard3)) - apt given the RV1106 already has a RISC-V core sitting right next to the A7 and the NPU. If I dig the original up I’ll drop it in here.

Anyway - really like the direction, and I think this field is about to get very rich. I’ll be watching. Best of luck.

Interesting idea. The USB HID approach is smart since it avoids installs, APIs, and permission issues. I’m curious about latency, reliability, and how you handle mistakes when the agent is controlling a real system. The edge AI side of this is definitely a challenging and interesting space.