{"slug": "make-home-assistant-voice-fully-local-with-rk3576", "title": "Make Home Assistant Voice Fully Local with RK3576", "summary": "A developer created a Docker Compose stack that turns a Rockchip RK3576 board into a fully local voice backend for Home Assistant, using openWakeWord, Whisper, Piper, and a local LLM. The setup keeps all voice processing on-device, removing the cloud dependency and its network round-trip latency while maintaining standard Home Assistant integrations.", "body_md": "Cloud voice assistants are convenient until the internet drops, latency creeps in, or you wonder where every command is processed. For a smart home, \"turn on the office light\" should feel immediate and private.\n\nI wanted a Home Assistant voice assistant that could live on my network and do the important work locally. The result is a Docker Compose stack that turns a Rockchip RK3576 board into a local voice backend for Assist.\n\nIt brings together local wake word detection, local speech-to-text, local text-to-speech, and a small local LLM. Home Assistant still sees normal integrations. The RK3576 board handles the accelerated AI pieces behind the scenes.\n\nThe stack includes:\n\n- openWakeWord for wake word detection\n- Whisper for speech-to-text\n- Piper for text-to-speech\n- Qwen 2.5 1.5B through an OpenAI-compatible RKLLM API\n- Wyoming services for Home Assistant integration\n- Prebuilt ARM64 containers for Docker Compose\n- RK3576 acceleration through Rockchip's NPU stack\n\nAfter setup, the flow feels like a normal Home Assistant voice assistant.\n\nYou say the wake word, and openWakeWord detects it. Home Assistant opens the Assist pipeline. Whisper on the RK3576 transcribes the command. Home Assistant either handles the intent itself or sends a conversation request to the local LLM. Piper on the RK3576 speaks the response.\n\nThe important part is what does not happen: Home Assistant does not need to know about RKNN models, NPU runtimes, model packaging, or board-specific audio handling. 
It talks to standard Wyoming services.\n\nThe Docker Compose stack exposes four local services:\n\n- Piper text-to-speech on port `10200`\n- Whisper speech-to-text on port `10300`\n- openWakeWord on port `10400`\n- RKLLM local conversation API on port `8001`\n\nHome Assistant already has a strong local voice ecosystem, so I did not want to replace its architecture. Instead, I wanted to give it a compact local backend that could run the heavier AI workloads on hardware designed for edge inference.\n\nThe RK3576 is interesting because it sits in a useful middle ground. It is small enough to behave like a home appliance, but it includes a Rockchip NPU that can accelerate real models once they are prepared for its runtime.\n\nMy goals were simple:\n\n- Keep the voice pipeline local\n- Use RK3576 acceleration where it matters\n- Make deployment practical with Docker Compose\n- Keep Home Assistant hardware-agnostic\n\nThat last point shaped the whole project. Hardware-specific projects often become difficult to reproduce because every layer knows too much about the board. I wanted the opposite: Home Assistant should see ordinary services, while the containers handle the RK3576 details.\n\n## Architecture\n\nAt a high level, the voice path looks like this:\n\n```text\nVoice input\n  -> openWakeWord\n  -> Home Assistant Assist\n  -> Wyoming Whisper on RK3576\n  -> Home Assistant intent or local LLM\n  -> Wyoming Piper on RK3576\n  -> Spoken response\n```\n\nWyoming is the key boundary. It gives Home Assistant a clean protocol for speech services, and it gives this project a place to hide the messy parts: RKNN execution, model loading, hardware access, and container packaging.\n\nInstead of teaching Home Assistant about the RK3576, I built Wyoming-compatible services around the accelerated models. That keeps the integration boring in the best way. 
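\n\nWyoming's framing is simple enough to sketch. As I understand the protocol (see the `wyoming` Python package), each event is a single JSON header line, optionally followed by `data_length` bytes of JSON data and `payload_length` bytes of binary payload such as audio. The hypothetical Python client below is not code from this project: `encode_event`, `decode_event`, `decode_bytes`, and `describe` are illustrative names, and it omits details such as the protocol `version` field that the real library adds. It sends a `describe` event and returns the service's `info` reply, which is a quick way to confirm that a container on the board is reachable before adding it in Home Assistant.

```python
import io
import json
import socket

NEWLINE = bytes([10])  # each header line ends with a single newline byte (0x0A)


def encode_event(event_type, data=None, payload=b''):
    '''Frame one event: JSON header line, then optional JSON data, then payload.'''
    header = {'type': event_type}
    data_bytes = b''
    if data:
        data_bytes = json.dumps(data).encode('utf-8')
        header['data_length'] = len(data_bytes)
    if payload:
        header['payload_length'] = len(payload)
    return json.dumps(header).encode('utf-8') + NEWLINE + data_bytes + payload


def _read_exact(stream, size):
    '''Read exactly size bytes or fail, so a truncated stream is not misparsed.'''
    chunk = stream.read(size) if size else b''
    if len(chunk) != size:
        raise EOFError('stream ended in the middle of an event')
    return chunk


def decode_event(stream):
    '''Read one event from a binary file-like object; return (type, data, payload).'''
    line = stream.readline()
    if not line:
        raise EOFError('stream closed before an event header arrived')
    header = json.loads(line)
    data = dict(header.get('data') or {})  # some senders put small data inline
    data_length = header.get('data_length') or 0
    if data_length:
        data.update(json.loads(_read_exact(stream, data_length)))
    payload = _read_exact(stream, header.get('payload_length') or 0)
    return header['type'], data, payload


def decode_bytes(raw):
    '''Convenience wrapper for decoding an in-memory byte string.'''
    return decode_event(io.BytesIO(raw))


def describe(host, port, timeout=5.0):
    '''Send describe and return the data of the first info event sent back.'''
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_event('describe'))
        with sock.makefile('rb') as stream:
            while True:
                event_type, data, _ = decode_event(stream)
                if event_type == 'info':
                    return data
```

For example, `describe('192.168.1.50', 10300)` (with your board's address in place of the example IP) should return what the Whisper service advertises; ports `10200` and `10400` work the same way for Piper and openWakeWord. For anything beyond a connectivity check, the `wyoming` package itself is the better choice.\n\n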
You add the services in Home Assistant, select them in an Assist pipeline, and Home Assistant treats them like any other local STT, TTS, and wake word provider.\n\n## Why RK3576\n\nThis project is not trying to build a cloud-scale assistant on a tiny board. The goal is local smart-home control and short local conversations.\n\nThat makes the RK3576 a good fit. Its NPU can accelerate the AI workloads, while the board remains small, quiet, and easy to keep on the network. The default LLM is Qwen 2.5 1.5B, which is intentionally modest. For this use case, reliability and local execution matter more than having the largest possible model.\n\nThe RK3576 also makes the project more interesting from an engineering standpoint. Running models on an NPU is not the same as running generic Python code on an ARM CPU: the model format, runtime, input shapes, and device access all matter. The challenge was to make those details disappear from the user's point of view.\n\nThe full setup lives in the GitHub repository, including Docker Compose files, model packaging notes, and troubleshooting steps. The short version is: prepare the board, start the Compose stack, then add the local services to Home Assistant.\n\nPrebuilt ARM64 images are what make this practical. Users should not need to convert models, build containers on the board, or manually arrange large model files before trying the project.\n\n**1. Prepare the RK3576 board**\n\nInstall a Linux ARM64 image, connect the board to the network, then install Docker Engine and the Docker Compose plugin. The RKNN-based containers need access to the Rockchip NPU, so the Compose file grants the speech and LLM services the required device access. Check that both tools are installed:\n\n```\ndocker --version\ndocker compose version\n```\n\n**2. Clone the project**\n\n```\ngit clone https://github.com/Hanzo-Huang/rk3576-home-assistant-voice.git\ncd rk3576-home-assistant-voice\n```\n\n**3. 
Start the voice services**\n\n```\n# Pull the latest images and start the stack in the background\nsudo docker compose up -d --pull always\n# List the running containers\nsudo docker compose ps\n# Follow the logs (Ctrl+C to stop)\nsudo docker compose logs -f\n```\n\nUse this path if Home Assistant already runs elsewhere on the network. `docker compose ps` should list containers for Whisper, Piper, openWakeWord, and RKLLM.\n\n**4. Optional: run Home Assistant on the same board**\n\n```\nsudo docker compose --profile homeassistant up -d --pull always\n```\n\nThen open:\n\n```\nhttp://RK3576_BOARD_IP:8123\n```\n\n**5. Add the Wyoming services in Home Assistant**\n\nIn Home Assistant, go to `Settings -> Devices & services -> Add integration -> Wyoming Protocol`, then add:\n\n- Whisper speech-to-text: `RK3576_BOARD_IP:10300`\n- Piper text-to-speech: `RK3576_BOARD_IP:10200`\n- openWakeWord: `RK3576_BOARD_IP:10400`\n\n**6. Create the Assist pipeline**\n\nGo to `Settings -> Voice assistants`, then create or edit an Assist pipeline and select the RK3576 Whisper, Piper, and openWakeWord services.\n\n**7. Add the local LLM**\n\nInstall the Local LLM integration through HACS, then configure it as an OpenAI-compatible endpoint:\n\n```\nBackend: OpenAI Compatible Conversations API\nAPI hostname: RK3576_BOARD_IP\nAPI port: 8001\nAPI path: /v1\nAPI key: sk-local\nModel name: rkllm-model\n```\n\nSelect this conversation agent in the Assist pipeline.\n\n**8. Test it**\n\nTry commands such as:\n\n- \"Turn on the office light.\"\n- \"What lights are on?\"\n- \"Set the desk lamp to 30 percent.\"\n- \"What time is it?\"\n\nWatch logs while testing:\n\n```\nsudo docker compose logs -f whisper piper openwakeword llm\n```\n\nI still need to publish full benchmark results, but these are the stages that matter for a useful Home Assistant voice experience, along with the numbers I have measured so far:\n\n- Whisper transcription: `0.626 seconds` for a typical room-control command\n- LLM response: `2.82 seconds`\n
- Piper synthesis: `0.474 seconds` from text response to playable audio\n\nThe important engineering target is not only raw model speed. The whole loop needs to feel responsive: wake word detection should trigger quickly, transcription should not stall the Assist pipeline, and TTS should start speaking soon enough that the interaction still feels conversational. Adding up the measured stages gives roughly 3.9 seconds (0.626 + 2.82 + 0.474) for a request that goes through the local LLM, and about 1.1 seconds (0.626 + 0.474) when Home Assistant handles the intent itself, before counting wake word detection, Home Assistant's own processing, and audio playback.\n\nThe first challenge was Whisper. Speech-to-text involves a lot of practical audio handling around the model itself, and the RK3576 path adds another layer: the model needs to run through RKNN on the NPU. The service had to accept normal Home Assistant audio, prepare it for the accelerated model, and return a transcript through Wyoming.\n\nPiper posed a different problem. The text processing and encoder can stay close to the normal Python and ONNX flow, but the decoder runs through RKNN. That required wrapping the accelerated path so Home Assistant still receives ordinary streamed audio instead of seeing any of the model-specific constraints.\n\nThe third challenge was the interface. It would have been easy to build something that worked only as a custom demo, but that was not the goal. Wyoming made the project feel native to Home Assistant while keeping the RK3576-specific code inside the containers.\n\nPackaging was just as important as inference. Large model files do not belong directly in the source tree, but a project like this is not useful if every user has to assemble them manually. The build workflow downloads release archives, verifies checksums, prepares the assets, and publishes ARM64 images that can be pulled by Docker Compose.\n\n## What I Learned\n\nThe most useful design choice was keeping the boundary strict. Home Assistant remains the orchestrator. The RK3576 board provides local AI services. Wyoming connects the two without leaking hardware details across the line.\n\nThat separation also makes the project easier to change. 
A different RK3576-ready Whisper model, another Piper voice, or a different RKLLM image can be swapped in behind the same service interface, and the user-facing integration stays the same.\n\nThe other lesson was that packaging is part of the engineering, not an afterthought. A working prototype is interesting. A Compose stack with prebuilt containers is something another maker can actually try.\n\n## Result\n\nThe RK3576 board becomes a local voice backend for Home Assistant: wake word, STT, TTS, and local conversation all run on the edge, while Home Assistant continues to use standard integrations.\n\nThe takeaway is the interface. Home Assistant stays clean and hardware-agnostic, while the RK3576 does the specialized acceleration behind Wyoming services. That is the balance I wanted: a voice assistant that feels like part of the home but remains open, local, and hackable.\n\n## Future Improvements\n\n- Add more Piper voices\n- Add more RK3576-ready Whisper model sizes\n- Benchmark latency across the Assist pipeline\n- Build a dedicated voice satellite enclosure\n- Add microphone and speaker wiring notes\n- Compare RKLLM models for smart-home commands and short conversations", "url": "https://wpnews.pro/news/make-home-assistant-voice-fully-local-with-rk3576", "canonical_source": "https://www.hackster.io/h1300923175/make-home-assistant-voice-fully-local-with-rk3576-50b4de", "published_at": "2026-06-22 10:36:23+00:00", "updated_at": "2026-06-29 02:59:56.763558+00:00", "lang": "en", "topics": ["ai-infrastructure", "ai-products", "developer-tools", "ai-ethics"], "entities": ["Home Assistant", "Rockchip RK3576", "Whisper", "Piper", "Qwen 2.5", "openWakeWord", "RKLLM", "Wyoming"], "alternates": {"html": "https://wpnews.pro/news/make-home-assistant-voice-fully-local-with-rk3576", "markdown": "https://wpnews.pro/news/make-home-assistant-voice-fully-local-with-rk3576.md", "text": 
"https://wpnews.pro/news/make-home-assistant-voice-fully-local-with-rk3576.txt", "jsonld": "https://wpnews.pro/news/make-home-assistant-voice-fully-local-with-rk3576.jsonld"}}