RK3576 + Hailo-8: 12x AI Performance Boost

wpnews.pro

Edge AI devices are often constrained by their built-in compute capabilities. The Seeed Studio reComputer RK3576, powered by Rockchip's RK3576 processor, features a respectable 6 TOPS NPU. However, when running YOLOv11n for video object detection, it achieves only ~2.3 FPS – far from usable for real-time applications.

This project addresses that bottleneck by integrating the Hailo-8 M.2 AI Accelerator Module, which delivers an additional 26 TOPS of compute power. The result? We pushed YOLOv11n inference to 28.1 FPS – a 12x+ performance uplift – enabling smooth, real-time video analysis.

The solution leverages:

reComputer RK3576
Hailo-8 AI Accelerator
Docker-based deployment
AI Lab model workflow
YOLO object detection

The entire system runs locally at the edge without relying on cloud services, making it suitable for:

Smart surveillance
Industrial inspection
Traffic monitoring
Robotics perception
Smart retail analytics

Before deploying the AI application, the Hailo-8 AI accelerator must be installed into the M.2 PCIe expansion slot of the reComputer RK3576.

The reComputer RK3576 provides an internal M.2 expansion interface that enables high-speed PCIe communication with AI accelerators such as the Hailo-8.

Step 1: Open the reComputer RK3576 Enclosure

To access the internal expansion interface, remove the enclosure cover according to the hardware guide.

After opening the chassis, the mainboard and expansion interfaces become accessible for hardware upgrades and AI accelerator installation.

Step 2: Install the Hailo-8 Accelerator

Align the Hailo-8 M.2 connector with the PCIe slot and insert it at a slight angle.

Push the module gently into the connector until it is fully seated.

The RK3576 already includes an NPU for AI workloads. However, when running multiple object detection streams or higher frame-rate video, additional acceleration becomes beneficial.

Hailo-8 provides dedicated AI inference acceleration that can significantly increase throughput while maintaining low power consumption. Hailo's architecture is specifically optimized for edge AI vision workloads.

System Architecture

USB Camera
      │
      ▼
reComputer RK3576
      │
      ▼
Hailo-8 Accelerator
      │
      ▼
YOLO Inference Engine
      │
      ▼
Bounding Box Visualization
      │
      ▼
Local Display / Web Stream

Deploying the Hailo Software Package

Step 1: Access the Hailo Software Download Center

Visit the Hailo website and search for "Software Downloads" using the search bar.

Step 2: Select Your Hardware Platform

On the software download page, choose the appropriate hardware platform. For this project, select Hailo-8.

Step 3: Download the Required Software Packages

Download the required software packages for your operating system, including:

HailoRT Runtime
PCIe Driver
Model Zoo (optional)
Development Tools and SDK (if needed)

Make sure to download the versions that are compatible with your target platform and operating system before proceeding with the installation.

Transfer Required Files to reComputer RK3576

Open Windows Terminal or PowerShell on your PC and transfer the required installation packages to the reComputer RK3576 using SCP.

scp C:\Users\seeed\Downloads\hailort-pcie-driver_4.24.0_all.deb seeed@192.168.10.230:/home/seeed/
scp C:\Users\seeed\Downloads\hailort-4.24.0-cp311-cp311-linux_aarch64.whl seeed@192.168.10.230:/home/seeed/
scp C:\Users\seeed\Downloads\librknnrt.so seeed@192.168.10.230:/home/seeed/

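The files are copied from the Windows host to the home directory of the RK3576 device (replace the username and IP address with your own). The first two files are the Hailo PCIe driver package and the HailoRT Python wheel; librknnrt.so is Rockchip's RKNN runtime library, which the built-in NPU comparison later in this project relies on.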
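Because the wheel is built for one Python version and one CPU architecture, and the driver and runtime are usually kept at the same version (4.24.0 here), it can be worth checking the packages before installing them. The sketch below is a hypothetical helper, not a Hailo tool: it assumes standard wheel naming (PEP 427: name, version, optional build tag, then Python, ABI and platform tags) and the `<package>_<version>_<arch>.deb` naming seen above, and compares them with the running interpreter.

```python
import platform
import sys


def parse_wheel_name(filename):
    """Split a wheel file name (PEP 427) into name, version and the three compatibility tags."""
    stem = filename[:-4] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    name, version = parts[0], parts[1]
    py_tag, abi_tag, plat_tag = parts[-3:]  # an optional build tag would sit in between
    return name, version, py_tag, abi_tag, plat_tag


def parse_deb_version(filename):
    """Return the version field of a Debian package name such as pkg_1.2.3_all.deb."""
    return filename.split("_")[1]


def check_packages(deb_name, wheel_name, py_version=None, machine=None):
    """Return a list of detected mismatches; an empty list means no problem was found."""
    py_version = py_version or sys.version_info[:2]
    machine = machine or platform.machine()
    _, whl_version, py_tag, _, plat_tag = parse_wheel_name(wheel_name)
    problems = []
    deb_version = parse_deb_version(deb_name)
    if deb_version != whl_version:
        problems.append(f"driver {deb_version} != runtime {whl_version}")
    expected_py = f"cp{py_version[0]}{py_version[1]}"
    if py_tag != expected_py:
        problems.append(f"wheel needs {py_tag}, interpreter is {expected_py}")
    if not plat_tag.endswith(machine):
        problems.append(f"wheel platform {plat_tag} does not match {machine}")
    return problems


if __name__ == "__main__":
    issues = check_packages("hailort-pcie-driver_4.24.0_all.deb",
                            "hailort-4.24.0-cp311-cp311-linux_aarch64.whl")
    print("OK" if not issues else "\n".join(issues))
```

Run on the reComputer itself, an empty result means the wheel matches the board's Python 3.11 / aarch64 environment and the two packages share a version.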

Install Hailo Runtime and PCIe Driver

Open a terminal on the reComputer RK3576 and execute the following commands to install the Hailo PCIe driver, runtime environment, and Python SDK.

If the final verification command successfully detects the accelerator, the Hailo-8 PCIe device has been installed correctly and is ready for AI model deployment.
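The same check can be scripted, for example before launching the inference script. This minimal sketch assumes the PCIe driver exposes each accelerator as a `/dev/hailoN` device node and that HailoRT's `hailortcli` tool is on the PATH; `fw-control identify` is the HailoRT CLI subcommand that queries the device firmware. The helper names are hypothetical.

```python
import glob
import os
import shutil
import subprocess


def find_hailo_devices(dev_dir="/dev"):
    """Return sorted Hailo device nodes (hailo0, hailo1, ...) found under dev_dir."""
    return sorted(glob.glob(os.path.join(dev_dir, "hailo[0-9]*")))


def identify_firmware():
    """Run `hailortcli fw-control identify` if available; return its output, or None on failure."""
    if shutil.which("hailortcli") is None:
        return None
    result = subprocess.run(["hailortcli", "fw-control", "identify"],
                            capture_output=True, text=True, timeout=30)
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    devices = find_hailo_devices()
    print("Hailo device nodes:", devices or "none found")
    info = identify_firmware()
    print(info if info else "hailortcli not available or device not responding")
```

If no device node appears, the module is usually not seated or the driver did not load; the firmware query only makes sense once a node exists.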

Install the Hailo Model Zoo

Next, install the official Hailo Model Zoo toolkit. It provides utilities for downloading, compiling, converting, and running pretrained AI models optimized for Hailo accelerators, including object detection, image classification, and segmentation models.

After installation, the Model Zoo tools can be used to download and deploy AI models directly onto the Hailo-8 accelerator.

Verify Camera Detection

Before running inference, verify that the camera is correctly detected by the operating system.

v4l2-ctl --list-devices

If the camera is connected successfully, it should appear in the device list.
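When several devices or nodes are listed, the output can be parsed to choose the DEVICE_ID used by the inference scripts. This sketch assumes the usual `v4l2-ctl --list-devices` layout (an unindented device name ending in a colon, followed by tab-indented `/dev` nodes). The helper names are hypothetical, and taking the first `/dev/videoN` node is only a heuristic, since USB cameras often expose an extra metadata node as well.

```python
import subprocess


def parse_v4l2_devices(text):
    """Map each device name in `v4l2-ctl --list-devices` output to its /dev nodes."""
    devices, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():       # unindented line: device name ending with ':'
            current = line.strip().rstrip(":")
            devices[current] = []
        elif current is not None:       # indented line: one device node
            devices[current].append(line.strip())
    return devices


def first_video_node(devices):
    """Return the first /dev/videoN node found (a candidate DEVICE_ID), or None."""
    for nodes in devices.values():
        for node in nodes:
            if node.startswith("/dev/video"):
                return node
    return None


def list_video_devices():
    """Run v4l2-ctl (from v4l-utils) and parse what it prints."""
    out = subprocess.run(["v4l2-ctl", "--list-devices"], capture_output=True, text=True)
    return parse_v4l2_devices(out.stdout)


if __name__ == "__main__":
    print(first_video_node(list_video_devices()))
```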

Figure X. Camera detection result on the reComputer RK3576.

To deploy YOLO11n on the Hailo-8 accelerator, first download and compile the model using the Hailo Model Zoo tools.

Run the following command inside the hailo_model_zoo directory with the virtual environment activated:

hailomz compile yolov11n

After compilation completes, the generated yolov11n.hef model file can be used for real-time AI inference on the Hailo-8 accelerator.

Create a new Python file named:

webcam_yolo11n.py

Copy the following code into the file:

import time

import cv2
import numpy as np
from hailo_platform import (VDevice, HEF, InferVStreams, ConfigureParams,
                            HailoStreamInterface, InputVStreamParams, OutputVStreamParams)

# ================= Configuration =================
HEF_PATH = 'yolov11n.hef'
DEVICE_ID = "/dev/video0"  # Update based on v4l2-ctl output
CONF_THRESHOLD = 0.45

# COCO Dataset 80 Class Labels
COCO_CLASSES = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
    "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush"
]
# ==================================================


def main():
    # 1. Initialize Hailo hardware: load the HEF and read the model's input resolution
    hef = HEF(HEF_PATH)
    input_vstream_info = hef.get_input_vstream_infos()[0]
    input_h, input_w = input_vstream_info.shape[:2]

    # 2. Open the camera
    cap = cv2.VideoCapture(DEVICE_ID)
    if not cap.isOpened():
        print("Cannot open webcam")
        return

    try:
        with VDevice() as target:
            config_params_dict = ConfigureParams.create_from_hef(hef, HailoStreamInterface.PCIe)
            network_group = target.configure(hef, config_params_dict)[0]
            with network_group.activate():
                vstream_params = (InputVStreamParams.make_from_network_group(network_group),
                                  OutputVStreamParams.make_from_network_group(network_group))
                with InferVStreams(network_group, vstream_params[0], vstream_params[1]) as vstreams:
                    print("[INFO] Initialization successful! Running YOLOv11n real-time detection...")
                    while True:
                        start_time = time.time()  # Record start time for FPS
                        ret, frame = cap.read()
                        if not ret:
                            break

                        # Preprocessing: BGR -> RGB, resize to the model input, add a batch dimension
                        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                        resized = cv2.resize(frame_rgb, (input_w, input_h))
                        input_tensor = np.expand_dims(resized, axis=0)

                        # Inference
                        outputs = vstreams.infer(input_tensor)

                        # Parsing and drawing
                        h, w, _ = frame.shape
                        for class_list in outputs.values():
                            # class_list[0] (batch 0) holds one array of detections per COCO class
                            for class_id, detections in enumerate(class_list[0]):
                                for det in detections:
                                    if len(det) < 5:
                                        continue
                                    # Each detection is (ymin, xmin, ymax, xmax, confidence), normalized to [0, 1]
                                    ymin, xmin, ymax, xmax, confidence = det[:5]
                                    if confidence <= CONF_THRESHOLD:
                                        continue
                                    # Map normalized coordinates to frame pixels
                                    left, top = int(xmin * w), int(ymin * h)
                                    right, bottom = int(xmax * w), int(ymax * h)
                                    # Get class name, display ID if out of bounds
                                    class_name = COCO_CLASSES[class_id] if class_id < len(COCO_CLASSES) else f"ID {class_id}"
                                    # Draw bounding box and label text
                                    cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
                                    label = f"{class_name}: {confidence:.2f}"
                                    cv2.putText(frame, label, (left, top - 10),
                                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

                        # Real-time FPS (capture + preprocessing + inference + drawing), shown top-left
                        fps = 1 / max(time.time() - start_time, 1e-6)
                        cv2.putText(frame, f"FPS: {fps:.1f}", (20, 40),
                                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

                        # Display window; press 'q' to quit
                        cv2.imshow('reComputer RK3576 - Hailo YOLOv11n', frame)
                        if cv2.waitKey(1) & 0xFF == ord('q'):
                            break
    finally:
        cap.release()
        cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
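In the script above, each entry of `outputs` is a per-batch list whose first element holds one array per COCO class, and each row of those arrays is (ymin, xmin, ymax, xmax, score) in normalized coordinates. The sketch below isolates that parsing step as a pure function, adding clipping to the frame bounds, so it can be tested without the accelerator. The function name is hypothetical, and the layout is inferred from the script's own loop rather than taken from HailoRT documentation.

```python
import numpy as np


def parse_nms_by_class(class_list, frame_w, frame_h, conf_threshold=0.45):
    """Convert per-class NMS detections into pixel boxes.

    class_list[class_id] is an array of rows (ymin, xmin, ymax, xmax, score),
    coordinates normalized to [0, 1]. Returns (class_id, score, left, top, right, bottom).
    """
    results = []
    for class_id, dets in enumerate(class_list):
        dets = np.asarray(dets, dtype=np.float32).reshape(-1, 5)
        for ymin, xmin, ymax, xmax, score in dets:
            if score <= conf_threshold:
                continue
            # Scale normalized coordinates to pixels, clipping to the frame
            left = int(np.clip(xmin, 0, 1) * frame_w)
            top = int(np.clip(ymin, 0, 1) * frame_h)
            right = int(np.clip(xmax, 0, 1) * frame_w)
            bottom = int(np.clip(ymax, 0, 1) * frame_h)
            results.append((class_id, float(score), left, top, right, bottom))
    return results
```

Inside the capture loop, `parse_nms_by_class(class_list[0], w, h, CONF_THRESHOLD)` would then return the boxes to draw.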

Running result:

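Then create a Python file named webcam_npu_save to run the model on the RK3576's built-in NPU. It uses the RKNNLite API from Rockchip's rknn-toolkit-lite2 package together with the librknnrt.so runtime transferred earlier. The code is as follows: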

import os
import time

import cv2
import numpy as np
from rknnlite.api import RKNNLite

# ================= Configuration =================
RKNN_MODEL_PATH = 'yolo11n.rknn'  # Path to the RKNN model
DEVICE_ID = 0  # Camera index, corresponds to /dev/video0
INPUT_SIZE = 640  # Model input resolution (640x640)
CONF_THRESHOLD = 0.45
NMS_THRESHOLD = 0.45  # IoU threshold for suppressing duplicate boxes
OUTPUT_DIR = "npu_detection_results"
os.makedirs(OUTPUT_DIR, exist_ok=True)

# COCO Dataset 80 Class Labels
COCO_CLASSES = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
    "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush"
]
# ==================================================


def dfl_decode(reg):
    """Decode DFL box regression: [64, H, W] -> [4, H, W] distances (left, top, right, bottom) in grid units."""
    bins = reg.shape[0] // 4  # 16 bins per box side
    x = reg.reshape(4, bins, *reg.shape[1:])
    x = np.exp(x - x.max(axis=1, keepdims=True))  # softmax over the bins
    x /= x.sum(axis=1, keepdims=True)
    return (x * np.arange(bins).reshape(1, bins, 1, 1)).sum(axis=1)  # expected distance per side


def post_process(outputs, frame, conf_threshold, nms_threshold=NMS_THRESHOLD):
    """Decode raw YOLOv8/YOLO11-style outputs (decoupled head with DFL box regression)."""
    h, w = frame.shape[:2]
    boxes, scores, class_ids = [], [], []

    # Output index mapping for the three scales (80x80, 40x40, 20x20):
    #   0,3,6: box regression (reg) -> [1, 64, H, W], i.e. 4 sides x 16 DFL bins
    #   1,4,7: class scores (cls)   -> [1, 80, H, W], assumed already sigmoid-activated
    #   2,5,8: score sum            -> [1, 1, H, W], not needed for decoding
    for i in range(3):
        reg = outputs[3 * i][0].astype(np.float32)
        cls = outputs[3 * i + 1][0].astype(np.float32)
        grid_h, grid_w = cls.shape[1:]
        stride = INPUT_SIZE / grid_h  # 8, 16, 32 input pixels per grid cell

        # Box corners in model-input pixels: cell center minus/plus the predicted distances
        dist = dfl_decode(reg)
        col, row = np.meshgrid(np.arange(grid_w), np.arange(grid_h))
        x1 = ((col + 0.5 - dist[0]) * stride).ravel()
        y1 = ((row + 0.5 - dist[1]) * stride).ravel()
        x2 = ((col + 0.5 + dist[2]) * stride).ravel()
        y2 = ((row + 0.5 + dist[3]) * stride).ravel()

        # Best class and its score for every grid cell
        cls = cls.reshape(cls.shape[0], -1)
        best_id = cls.argmax(axis=0)
        best_score = cls.max(axis=0)

        # Keep confident cells; rescale from the 640x640 input to the frame (plain resize, no letterbox)
        sx, sy = w / INPUT_SIZE, h / INPUT_SIZE
        for k in np.flatnonzero(best_score >= conf_threshold):
            boxes.append([float(x1[k] * sx), float(y1[k] * sy),
                          float((x2[k] - x1[k]) * sx), float((y2[k] - y1[k]) * sy)])  # x, y, w, h
            scores.append(float(best_score[k]))
            class_ids.append(int(best_id[k]))

    # Class-agnostic non-maximum suppression removes overlapping duplicates
    indices = cv2.dnn.NMSBoxes(boxes, scores, conf_threshold, nms_threshold) if boxes else []
    detections = []
    for idx in np.array(indices, dtype=int).flatten():
        x, y, bw, bh = boxes[idx]
        left, top = max(0, int(x)), max(0, int(y))
        right, bottom = min(w, int(x + bw)), min(h, int(y + bh))
        class_id = class_ids[idx]
        class_name = COCO_CLASSES[class_id] if class_id < len(COCO_CLASSES) else f"ID {class_id}"
        detections.append((class_name, scores[idx], left, top, right, bottom))
    return detections


def main():
    # 1. Initialize RKNN
    rknn = RKNNLite()

    # Load the model
    print(f"[INFO] Loading RKNN model from {RKNN_MODEL_PATH}...")
    ret = rknn.load_rknn(RKNN_MODEL_PATH)
    if ret != 0:
        print(f"[ERROR] Load RKNN model failed: {ret}")
        return

    # Initialize the runtime
    print("[INFO] Initializing RKNN runtime...")
    ret = rknn.init_runtime()
    if ret != 0:
        print(f"[ERROR] Init runtime failed: {ret}")
        return
    print("[INFO] RKNN model loaded successfully")

    # 2. Open the camera
    cap = cv2.VideoCapture(DEVICE_ID)
    if not cap.isOpened():
        print("Cannot open webcam")
        rknn.release()
        return

    print(f"[INFO] Camera opened. Model input: {INPUT_SIZE}x{INPUT_SIZE}")
    print("[INFO] Press Ctrl+C to stop.")

    frame_count = 0
    save_interval = 5  # Save one image every 5 frames

    try:
        while True:
            start_time = time.time()
            ret, frame = cap.read()
            if not ret:
                break

            # Preprocessing. Assumes the model expects normalized float input; if the .rknn
            # was converted with mean/std normalization built in, pass the uint8 image instead.
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            resized = cv2.resize(frame_rgb, (INPUT_SIZE, INPUT_SIZE))
            input_tensor = np.expand_dims(resized, axis=0).astype(np.float32) / 255.0

            # Inference
            outputs = rknn.inference(inputs=[input_tensor])

            # Post-processing
            detections = post_process(outputs, frame, CONF_THRESHOLD)

            # FPS covers capture + preprocessing + inference + post-processing
            fps = 1 / max(time.time() - start_time, 1e-6)

            # Draw detection boxes, labels and FPS
            for cls, conf, l, t, r, b in detections:
                cv2.rectangle(frame, (l, t), (r, b), (0, 255, 0), 2)
                cv2.putText(frame, f"{cls}: {conf:.2f}", (l, t - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
            cv2.putText(frame, f"NPU FPS: {fps:.1f}", (20, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

            # Save one image (with detection boxes and FPS) every N frames
            if frame_count % save_interval == 0:
                img_path = os.path.join(OUTPUT_DIR, f"npu_frame_{frame_count:06d}.jpg")
                cv2.imwrite(img_path, frame)
                print(f"[Frame {frame_count}] Saved -> {img_path}")

            # Print detection info to the terminal
            if detections:
                print(f"[Frame {frame_count}] Found {len(detections)} objects, FPS: {fps:.1f}")
                for cls, conf, l, t, r, b in detections[:3]:  # Print at most 3
                    print(f"  - {cls}: {conf:.2f} at ({l},{t})-({r},{b})")

            frame_count += 1
            time.sleep(0.01)
    except KeyboardInterrupt:
        print("\n[INFO] Stopped by user.")
    finally:
        cap.release()
        rknn.release()
    print(f"[INFO] Done. Total frames: {frame_count}")


if __name__ == "__main__":
    main()
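Both scripts derive FPS from a single frame's duration, so the displayed value jumps from frame to frame. When comparing the two accelerators, a rolling average over recent frames gives a steadier figure. The class below is a hypothetical helper, not part of either script; it would replace the single-frame FPS line by feeding in each frame's duration, ideally measured with `time.perf_counter()`, which is intended for interval timing.

```python
import time
from collections import deque


class FpsMeter:
    """Average FPS over the last `window` frames instead of a single frame."""

    def __init__(self, window=30):
        self.durations = deque(maxlen=window)  # oldest durations drop out automatically

    def update(self, frame_seconds):
        """Record one frame's processing time in seconds and return the rolling FPS."""
        self.durations.append(frame_seconds)
        total = sum(self.durations)
        return len(self.durations) / total if total > 0 else 0.0


if __name__ == "__main__":
    meter = FpsMeter(window=30)
    fps = 0.0
    for _ in range(5):
        start = time.perf_counter()
        time.sleep(0.02)  # stand-in for capture + inference + drawing
        fps = meter.update(time.perf_counter() - start)
    print(f"rolling FPS: {fps:.1f}")
```

A window of about one second of frames (30 at typical camera rates) smooths out jitter while still reacting to real slowdowns.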

Running result:

RK3576 built-in NPU: YOLOv11n achieves only 2.3 FPS, far too slow for smooth real-time video detection.

Hailo-8 PCIe accelerator card: the same model reaches 28.1 FPS, roughly a 12x frame-rate improvement (28.1 / 2.3 ≈ 12.2), meeting the real-time inference requirements of standard video streams.
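The headline figure follows directly from the two measurements, and converting throughput into per-frame time shows where each result sits relative to common video frame rates. The snippet below only restates the measured values from this project; `frame_ms` is a hypothetical helper name.

```python
def frame_ms(fps):
    """Per-frame time in milliseconds for a given throughput in frames per second."""
    return 1000.0 / fps


NPU_FPS = 2.3     # RK3576 built-in NPU, measured above
HAILO_FPS = 28.1  # Hailo-8 over PCIe, measured above

speedup = HAILO_FPS / NPU_FPS
print(f"speedup: {speedup:.1f}x")
print(f"NPU frame time:   {frame_ms(NPU_FPS):.1f} ms")
print(f"Hailo frame time: {frame_ms(HAILO_FPS):.1f} ms")
for target in (25, 30):
    print(f"{target} FPS budget: {frame_ms(target):.1f} ms per frame")
```

At about 35.6 ms per frame, the Hailo-8 pipeline fits a 25 FPS budget (40 ms) and sits just above a 30 FPS budget (33.3 ms), while the NPU's roughly 434.8 ms per frame is more than ten times over either budget.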

This project demonstrates how the reComputer RK3576 and the Hailo-8 accelerator can be combined to build a powerful edge AI vision platform capable of real-time object detection without cloud dependency. The combination of Rockchip computing resources, Hailo acceleration, and containerized deployment gives developers an accessible path to building next-generation intelligent edge applications.


Source: hackster.io (original article)
