The Engineering Insight
Most edge AI hardware projects fail to cross from proof-of-concept to reproducible deployment because hardware-specific details leak upward through every layer. Hanzo Huang's RK3576 stack avoids this by using Home Assistant's Wyoming protocol as a hard abstraction boundary: the Assist pipeline sees standard STT, TTS, and wake-word services over TCP; RKNN model , NPU device access, and Rockchip-specific packaging stay sealed inside the Docker containers. A practitioner can replicate this deployment without touching model conversion or board-specific runtimes.
What the Stack Does
The project is a Docker Compose stack turning a Rockchip RK3576 board into a local voice backend for Home Assistant. Four containerized services handle the pipeline: openWakeWord detects the wake phrase (port 10400), Wyoming Whisper handles speech-to-text (port 10300), Wyoming Piper handles text-to-speech (port 10200), and Qwen 2.5 1.5B served via an RKLLM-backed OpenAI-compatible API handles open-ended conversation (port 8001). Prebuilt ARM64 images mean users skip model format conversion entirely.
Latency Measurements
Huang reports per-stage timings for a typical smart-home command: Whisper transcription at 0.626 seconds, Piper synthesis at 0.474 seconds, and RKLLM response at 2.82 seconds. End-to-end pipeline benchmarks are still pending. For context, Home Assistant's official documentation notes Whisper on a Raspberry Pi 4 takes around 8 seconds per command on CPU, so the RK3576 NPU acceleration is meaningful even on these preliminary per-stage numbers.
Hardware Context
The RK3576 integrates a 6 TOPS dual-core NPU supporting INT4/INT8/FP16 inference. Vendor benchmarks place it at roughly 70% of the RK3588's performance at around 30% of its price - a cost-effective tier for always-on home appliances. The hardware used here is the Seeed Studio reComputer RK3576, paired with a reSpeaker XMOS XVF3800 microphone array.
Deployment Path
Clone the GitHub repo (github.com/Hanzo-Huang/rk3576-home-assistant-voice), run docker compose up -d --pull always, then add three Wyoming integrations in HA under Settings -> Devices & services -> Wyoming Protocol. The HACS Local LLM integration connects Qwen 2.5 1.5B as a conversation agent via the OpenAI-compatible endpoint. Home Assistant can optionally co-host on the same board via a Compose profile flag.
What to Watch
As sub-2B instruction-tuned models improve (Qwen 2.5, Phi-3.5-mini, Gemma-3 1B), the quality gap to cloud voice closes. The RK3576's INT4 support can approximately double inference speed for quantized models, which may push the 2.82s LLM latency into acceptable conversational range without a hardware upgrade. The Wyoming abstraction also means swapping in a different Whisper model size or Piper voice requires only an image update, not a Home Assistant reconfiguration.
Key Points #
- 1What: RK3576 NPU runs Whisper, Piper, openWakeWord, and Qwen 2.5 1.5B in Docker via Wyoming.
- 2Why: Enables fully local Home Assistant voice with no cloud, measuring 0.626s STT and 0.474s TTS latency.
- 3So what: Prebuilt ARM64 images and a strict Wyoming abstraction make this a reproducible edge voice stack.
Scoring Rationale #
Well-documented open-source Docker Compose stack combining Whisper, Piper, openWakeWord, and Qwen 2.5 1.5B on the RK3576 NPU for fully local Home Assistant voice; concrete latency data (0.626s STT, 0.474s TTS) and prebuilt ARM64 images make it a reproducible practitioner reference, but scope is a single-hardware maker project.
Practice interview problems based on real data
1,625 SQL & Python problems across 15 industry datasets — the exact type of data you work with.