Real-time YOLOv8n UAV detection at the sensor's 46 FPS ceiling, in ~140 MB of RAM. A high-throughput, low-footprint computer-vision pipeline for the Rockchip RK3588S SoC: it captures live 1080p MIPI frames, runs YOLOv8n across all 3 NPU cores in parallel (lifting throughput from ~31 to 46 FPS, so the camera, not the pipeline, is now the limit), and streams the annotated result to HDMI or RTSP. Capture, color-convert/resize and inference run entirely on fixed-function silicon (ISP, RGA, NPU), so the CPU stays free and memory holds flat at ~140 MB per stream: small enough to run on even the cheapest 2 GB RK3588S boards, not just high-end dev kits. Targets any RK3588S board; built and tested on the Khadas Edge2.
Then it goes a step further: when a tracked UAV leaves the scene, an on-device LLM (Qwen2.5-0.5B, on the same NPU) writes a natural-language assessment of what just happened. The whole thing is a chain of small, independent processes connected by Unix-domain sockets: detections flow downstream into multi-object tracking, temporal-feature extraction, a presence FSM, and the on-demand LLM summary.
Highlights
- Saturates the sensor: 3-thread NPU inference lifts throughput from ~31 FPS to the 46 FPS camera ceiling, so the pipeline is no longer the bottleneck.
- Fully hardware-accelerated: capture (ISP), color-convert/resize (RGA), and inference (NPU) never touch the CPU, giving a flat ~140 MB RSS per stream.
- Runs on any RK3588S board: because the footprint is so small (~140 MB for one stream, ~290 MB for two), it fits comfortably on the cheapest RK3588S boards on the market, even 2 GB models that sell for as little as ~€90, not just high-end dev kits.
- Two cameras at once: independent per-device sockets let two streams run and be controlled side by side.
- Composable pipeline: detection → ByteTrack → temporal features → presence FSM → on-demand LLM summary, each a separate process.
- NPU hand-off for the LLM: a blackout/resume control plane frees the whole NPU so the LLM runs at full speed, then hands it back to the cameras.
Target hardware: any RK3588S-based board, aarch64 Linux, with an OS08A10 MIPI camera. Developed and tested on the Khadas Edge2. Cross-compiles from x86-64/WSL or builds natively on the board.
For the full software architecture (Mermaid diagrams of the internal pipeline and the multi-process topology) see docs/architecture.md; for launch commands see docs/usage.md.
Related repositories
- RKNN_TRAIN_YOLO: the entire pipeline for training, converting, and exporting the YOLO model into the Rockchip NPU .rknn format used here.
- RKLLM_LLAMA_QWEN: the entire pipeline for running optimized LLM models on the RK3588S, either on the NPU (RKLLM) or the CPU (llama).
A 3-thread inference pool runs one RKNN context per NPU core (rknn_dup_context, rknn_set_core_mask), pipelining capture, inference, and display across all three cores. At 1080p with YOLOv8n 640×640 this lifts throughput from ~31.2 FPS (naïve single-threaded loop) to the 46 FPS OS08A10 camera ceiling: the pipeline is no longer the bottleneck, the sensor is. Full per-model FPS, latency, and CPU/NPU/RAM numbers are in docs/benchmarks.md.
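The one-worker-per-core idea can be sketched with the standard library alone, so it runs without the RKNN runtime. This is only an illustration under stated assumptions: `run_pool` and `InferFn` are hypothetical names, and the per-core RKNN context (duplicated with rknn_dup_context and pinned with rknn_set_core_mask in the real pipeline) is reduced here to a plain core index passed to a callback.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-core inference: (core index, frame id) -> result.
using InferFn = std::function<int(int core, int frame)>;

// Runs `n_frames` frames through `n_cores` workers. Each worker repeatedly
// claims the next unprocessed frame index, so a slow frame on one core
// never stalls the other cores.
std::vector<int> run_pool(int n_cores, int n_frames, const InferFn& infer) {
    std::vector<int> results(n_frames, 0);
    std::atomic<int> next{0};
    std::vector<std::thread> workers;
    for (int core = 0; core < n_cores; ++core) {
        workers.emplace_back([&, core] {
            // Assumption: real code would set up this core's own context here.
            for (;;) {
                int frame = next.fetch_add(1);
                if (frame >= n_frames) break;
                results[frame] = infer(core, frame);  // each slot is written once
            }
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Results are stored by frame index rather than by completion order, so a display stage could still consume them in capture order even when the cores finish out of order.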
Every heavy per-frame operation runs on a dedicated fixed-function block of the RK3588S (camera ISP, RGA, NPU), never on the CPU, so there are no large intermediate framebuffers or scratch tensors CPU-side. A fixed pool of pre-allocated buffers (N_BUF, see BufPool in src/main.cc) is recycled instead of allocating per frame, so memory stays flat and bounded: ~137–152 MB RSS for one 1080p stream, ~276–304 MB for two (and that double-counts the shared librknnrt.so / librga.so pages).
Because the NPU, ISP and RGA are identical across the whole RK3588S range, the same binary runs at full speed on the cheapest 2 GB boards (~€90); no 8/16 GB dev kit is required. See docs/architecture.md for the per-frame offload table and pipeline diagram.
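To illustrate why a recycled pool keeps memory flat, here is a minimal sketch of a fixed-size buffer pool. It is not the BufPool from src/main.cc: `FixedBufPool` and its methods are hypothetical, plain heap memory stands in for whatever buffers the real pipeline uses, and the class is deliberately single-threaded (a real pool shared between threads would need locking).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size pool: all memory is allocated once in the
// constructor; acquire()/release() only move slot indices around.
template <std::size_t N>
class FixedBufPool {
public:
    explicit FixedBufPool(std::size_t bytes_per_buf) {
        for (std::size_t i = 0; i < N; ++i) {
            bufs_[i].resize(bytes_per_buf);
            free_[i] = i;
        }
        n_free_ = N;
    }

    // Returns a slot index, or -1 when every buffer is in flight
    // (the caller would then drop the frame or wait).
    int acquire() {
        if (n_free_ == 0) return -1;
        return static_cast<int>(free_[--n_free_]);
    }

    // Hands a slot back for reuse; no memory is freed.
    void release(int slot) { free_[n_free_++] = static_cast<std::size_t>(slot); }

    std::uint8_t* data(int slot) { return bufs_[slot].data(); }
    std::size_t available() const { return n_free_; }

private:
    std::array<std::vector<std::uint8_t>, N> bufs_;
    std::array<std::size_t, N> free_{};
    std::size_t n_free_ = 0;
};
```

Because the number of slots is a compile-time constant, the worst-case footprint is simply N times the buffer size, which is the property the flat RSS figures above rely on.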
Native (on the board):
cd yolov8n_cap_multithread
bash build.sh
Cross-compile (WSL / x86-64 Linux):
sudo apt-get install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
bash setup_sdk.sh # fetches librga v1.10.5_[8] + librknnrt v2.3.2
cd yolov8n_cap_multithread
bash build.sh # uses toolchain-aarch64.cmake (aarch64-linux-gnu-g++)
scp -r install/yolov8n_cap_multithread/ khadas@<board-ip>:~/programs/
Run: ./yolov8n_cap_multithread <rknn model> <device number> <rtsp port | hdmi>
See docs/usage.md for launch commands, and docs/usage_advanced.md for the IPC control/data plane, the downstream tracking/temporal/LLM stages, and RTSP streaming setup.
yolov8n_cap_multithread/
├── CMakeLists.txt                    # builds main pipeline + all auxiliary processes
├── build.sh                          # convenience wrapper around CMake
├── toolchain-aarch64.cmake           # cross-compile toolchain (WSL / x86 → aarch64)
├── data/
│   ├── coco_1_labels_list.txt
│   └── model/                        # .rknn model files
│
├── include/                          # YOLO pipeline headers
│   ├── camera_util.h
│   ├── drm_func.h
│   ├── local_display.h               # HDMI output via DRM / Wayland
│   ├── model_utils.h
│   ├── postprocess.h                 # YOLOv8 decode + NMS
│   ├── rga_func.h                    # Rockchip RGA color-space conversion / resize
│   ├── rtsp_stream.h                 # GStreamer RTSP publisher
│   └── ipc/                          # shared IPC layer (control + data planes)
│       ├── bounded_queue.h           # drop-oldest queue used by all publishers
│       ├── i_control_server.h
│       ├── i_data_publisher.h
│       ├── messages.h                # in-process DetectionMessage type
│       ├── unix_control_server.h
│       ├── unix_data_publisher.h
│       ├── wire_protocol.h           # ALL on-the-wire structs + socket paths
│       └── yolo_control_state.h
│
├── src/                              # YOLO pipeline implementation
│   ├── main.cc                       # multi-threaded RKNN pipeline (3 NPU cores)
│   ├── camera_util.cc
│   ├── local_display.cc
│   ├── model_utils.cc
│   ├── postprocess.cc
│   ├── rga_func.cc
│   ├── rtsp_stream.cc
│   └── ipc/
│       ├── unix_control_server.cc    # JSON control plane over AF_UNIX
│       └── unix_data_publisher.cc    # binary detection stream over AF_UNIX
│
├── tracker/                          # ByteTrack stage (separate process)
│   ├── include/
│   │   └── bytetrack_adapter.h       # IByteTracker interface
│   └── src/
│       ├── bytetrack_service.cc      # main(): reads data, writes tracks
│       └── iou_tracker.cc            # default IOU-greedy implementation
│
├── temporal/                         # Temporal-features stage (separate process)
│   ├── include/
│   │   ├── track_state.h             # per-track history + feature math
│   │   └── track_manager.h           # lifecycle + per-frame orchestration
│   └── src/
│       ├── temporal_service.cc       # main(): reads tracks, writes events
│       ├── track_state.cc
│       └── track_manager.cc
│
├── tools/                            # Standalone client / debug binaries
│   ├── control_client.cc             # send blackout/resume/status commands
│   ├── data_receiver.cc              # consume raw detections (yolo_data socket)
│   ├── tracks_receiver.cc            # consume tracked dets (yolo_tracks socket)
│   ├── events_receiver.cc            # consume temporal events (yolo_events socket)
│   └── event_summarizer.cc           # presence FSM + on-demand LLM (production sink)
│
├── utility_board_scripts/            # board-side helpers (deployed to install tree)
│   └── run_qwen.sh                   # feeds a snapshot to Qwen2.5-0.5B via llm_demo
│
├── build/                            # CMake out-of-source build tree
└── install/                          # `make install` deploy tree (scp this to board)
    └── yolov8n_cap_multithread/
        ├── yolov8n_cap_multithread
        ├── bytetrack_service
        ├── temporal_service
        ├── control_client
        ├── data_receiver
        ├── tracks_receiver
        ├── events_receiver
        ├── event_summarizer
        ├── data/                     # models + labels
        ├── utility_board_scripts/    # run_qwen.sh
        └── lib/                      # librknnrt.so, librga.so
Each stage is an independent OS process; they communicate via per-device Unix-domain sockets (<device> = V4L2 device number, e.g. 33). The full software architecture (the internal main.cc pipeline and the multi-process topology, both as Mermaid diagrams) is documented in docs/architecture.md.
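The tree lists include/ipc/bounded_queue.h as a drop-oldest queue used by all publishers. The sketch below shows what such a queue could look like; it is not the project's header. `DropOldestQueue`, `push`, and `try_pop` are hypothetical names, and the mutex-plus-deque design is an assumption chosen for brevity.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical drop-oldest queue: push() never blocks; when the queue is
// full, the oldest element is discarded to make room for the newest.
template <typename T>
class DropOldestQueue {
public:
    // A capacity of 0 is treated as 1 so push() always has room to store.
    explicit DropOldestQueue(std::size_t capacity)
        : capacity_(capacity ? capacity : 1) {}

    // Returns true if an old element had to be dropped.
    bool push(T value) {
        std::lock_guard<std::mutex> lock(mu_);
        bool dropped = false;
        if (items_.size() == capacity_) {
            items_.pop_front();
            dropped = true;
        }
        items_.push_back(std::move(value));
        return dropped;
    }

    // Non-blocking pop; returns an empty optional when nothing is queued.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mu_;
};
```

Dropping the oldest entry means a slow or absent downstream consumer costs stale detections rather than back-pressure on the producer, which suits a live camera stream where the newest frame matters most.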
Licensed under the Apache License 2.0; see LICENSE.
This is an independent, personal project built for educational and research purposes only. It is not affiliated with or endorsed by any employer or client of the author, and is not intended for production, operational, safety-critical, surveillance, or defense use. The "UAV" class is only a sample detection target for benchmarking the inference pipeline. The software is provided "AS IS", without warranty of any kind, and you are solely responsible for complying with all applicable export-control and other regulations. See DISCLAIMER.md for the full text.