Cloud voice assistants are convenient until the internet drops, latency creeps in, or you wonder where every command is processed. For a smart home, "turn on the office light" should feel immediate and private.
I wanted a Home Assistant voice assistant that could live on my network and do the important work locally. The result is a Docker Compose stack that turns a Rockchip RK3576 board into a local voice backend for Assist.
It brings together local wake word detection, local speech-to-text, local text-to-speech, and a small local LLM. Home Assistant still sees normal integrations. The RK3576 board handles the accelerated AI pieces behind the scenes.
The stack includes:
- openWakeWord for wake word detection
- Whisper for speech-to-text
- Piper for text-to-speech
- Qwen 2.5 1.5B through an OpenAI-compatible RKLLM API
- Wyoming services for Home Assistant integration
- Prebuilt ARM64 containers for Docker Compose
- RK3576 acceleration through Rockchip's NPU stack
After setup, the flow feels like a normal Home Assistant voice assistant.
You say the wake word. Home Assistant opens the Assist pipeline. Whisper on the RK3576 transcribes the command. Home Assistant handles the intent, or sends a conversation request to the local LLM. Piper on the RK3576 speaks the response.
The important part is what does not happen: Home Assistant does not need to know about RKNN models, NPU runtimes, model packaging, or board-specific audio handling. It talks to standard Wyoming services.
The Docker Compose stack exposes four local services:
-
Piper text-to-speech on port
10200 -
Whisper speech-to-text on port
10300 -
openWakeWord on port
10400 -
RKLLM local conversation API on port
8001
Home Assistant already has a strong local voice ecosystem, so I did not want to replace its architecture. I wanted to give it a compact local backend that could run the heavier AI workloads on hardware designed for edge inference.
The RK3576 is interesting because it sits in a useful middle ground. It is small enough to behave like a home appliance, but it includes a Rockchip NPU that can accelerate real models when they are prepared correctly.
My goals were simple:
- Keep the voice pipeline local
- Use RK3576 acceleration where it matters
- Make deployment practical with Docker Compose
- Keep Home Assistant hardware-agnostic
That last point shaped the whole project. Hardware-specific projects often become difficult to reproduce because every layer knows too much about the board. I wanted the opposite: Home Assistant should see ordinary services, while the containers handle the RK3576 details.
ArchitectureAt a high level, the voice path looks like this:
Voice input -> openWakeWord -> Home Assistant Assist -> Wyoming Whisper on RK3576 -> Home Assistant intent or local LLM -> Wyoming Piper on RK3576 -> Spoken response
Wyoming is the key boundary. It gives Home Assistant a clean protocol for speech services, and it gives this project a place to hide the messy parts: RKNN execution, model , hardware access, and container packaging.
Instead of teaching Home Assistant about the RK3576, I built Wyoming-compatible services around the accelerated models. That keeps the integration boring in the best way. You add the services in Home Assistant, select them in an Assist pipeline, and Home Assistant treats them like any other local STT, TTS, and wake word provider.
Why RK3576This project is not trying to build a cloud-scale assistant on a tiny board. The goal is local smart-home control and short local conversations.
That makes the RK3576 a good fit. Its NPU can help with the AI workloads, while the board remains small, quiet, and easy to keep on the network. The default LLM is Qwen 2.5 1.5B, which is intentionally modest. For this use case, reliability and local execution matter more than having the largest possible model.
The RK3576 also makes the project more interesting from an engineering standpoint. Running models on an NPU is not the same as running generic Python code on an ARM CPU. The model format, runtime, input shapes, and device access all matter. The challenge was to make those details disappear from the user's point of view.
The full setup lives in the GitHub repository, including Docker Compose files, model packaging notes, and troubleshooting steps. The short version is: prepare the board, start the Compose stack, then add the local services to Home Assistant.
Prebuilt ARM64 images are what make this practical. Users should not need to convert models, build containers on the board, or manually arrange large model files before trying the project.
1. Prepare the RK3576 board
Install a Linux ARM64 image, connect the board to the network, then install Docker Engine and the Docker Compose plugin. The RKNN-based containers need access to the Rockchip accelerator, so the Compose file gives the speech and LLM services the required device access.
docker --versiondocker compose version
2. Clone the project
git clone https://github.com/Hanzo-Huang/rk3576-home-assistant-voice.gitcd rk3576-home-assistant-voice
3. Start the voice services
sudo docker compose up -d --pull alwayssudo docker compose pssudo docker compose logs -f
Use this path if Home Assistant already runs elsewhere on the network. You should see containers for Whisper, Piper, openWakeWord, and RKLLM.
4. Optional: run Home Assistant on the same board
sudo docker compose --profile homeassistant up -d --pull always
Then open:
http://RK3576_BOARD_IP:8123
**5.**Add the Wyoming services in Home Assistant
In Home Assistant, go to Settings -> Devices & services -> Add integration -> Wyoming Protocol
, then add:
-
Whisper speech-to-text:
RK3576_BOARD_IP:10300 -
Piper text-to-speech:
RK3576_BOARD_IP:10200 -
openWakeWord:
RK3576_BOARD_IP:10400
6. Create the Assist pipeline
Go to Settings -> Voice assistants
, then create or edit an Assist pipeline and select the RK3576 Whisper, Piper, and openWakeWord services.
7. Add the local LLM
Install the Local LLM integration through HACS, then configure it as an OpenAI-compatible endpoint:
Backend: OpenAI Compatible Conversations APIAPI hostname: RK3576_BOARD_IPAPI port: 8001API path: /v1API key: sk-localModel name: rkllm-model
Select this conversation agent in the Assist pipeline.
8. Test it
Try commands such as:
- "Turn on the office light."
- "What lights are on?"
- "Set the desk lamp to 30 percent."
- "What time is it?"
Watch logs while testing:
sudo docker compose logs -f whisper piper openwakeword llm
I still need to publish final benchmark numbers, but these are the measurements I would track for a useful Home Assistant voice experience:
- Whisper transcription:
0.626
seconds
for a typical room-control command - LLM response:
2.82 seconds
for LLM response. - Piper synthesis:
0.474 seconds
from text response to playable audio
The important engineering target is not only raw model speed. The whole loop needs to feel responsive: wake word detection should trigger quickly, transcription should not stall the Assist pipeline, and TTS should start speaking soon enough that the interaction still feels conversational.
The first challenge was Whisper. Speech-to-text has a lot of practical audio handling around the model itself, and the RK3576 path adds another layer: the model needs to run through RKNN on the NPU. The service had to accept normal Home Assistant audio, prepare it for the accelerated model, and return a transcript through Wyoming.
Piper had a different shape. The text processing and encoder can stay close to the normal Python and ONNX flow, but the decoder runs through RKNN. That required wrapping the accelerated path so Home Assistant still receives ordinary streamed audio instead of seeing any of the model-specific constraints.
The third challenge was the interface. It would have been easy to build something that worked only as a custom demo, but that was not the goal. Wyoming made the project feel native to Home Assistant while keeping the RK3576-specific code inside the containers.
Packaging was just as important as inference. Large model files do not belong directly in the source tree, but a project like this is not useful if every user has to assemble them manually. The build workflow downloads release archives, verifies checksums, prepares the assets, and publishes ARM64 images that can be pulled by Docker Compose.
What I LearnedThe most useful design choice was keeping the boundary strict. Home Assistant remains the orchestrator. The RK3576 board provides local AI services. Wyoming connects the two without leaking hardware details across the line.
That separation also makes the project easier to change. A different RK3576-ready Whisper model, another Piper voice, or a different RKLLM image can be swapped behind the same service shape. The user-facing integration stays the same.
The other lesson was that packaging is part of the engineering, not an afterthought. A working prototype is interesting. A Compose stack with prebuilt containers is something another maker can actually try.
ResultThe RK3576 board becomes a local voice backend for Home Assistant: wake word, STT, TTS, and local conversation all run on the edge, while Home Assistant continues to use standard integrations.
The takeaway is the interface. Home Assistant stays clean and hardware-agnostic, while the RK3576 does the specialized acceleration behind Wyoming services. That is the balance I wanted: a voice assistant that feels like part of the home, but still remains open, local, and hackable.
Future Improvements- Add more Piper voices
- Add more RK3576-ready Whisper model sizes
- Benchmark latency across the Assist pipeline
- Build a dedicated voice satellite enclosure
- Add microphone and speaker wiring notes
- Compare RKLLM models for smart-home commands and short conversations
[Read more](javascript:void(0))