RK3576 + Hailo-8: 12x AI Performance Boost

Seeed Studio integrated a Hailo-8 M.2 AI accelerator with the reComputer RK3576, boosting YOLOv11n inference from 2.3 FPS to 28.1 FPS—a 12x performance increase—enabling real-time edge video analysis for surveillance, inspection, and robotics without cloud dependency.

Edge AI devices are often constrained by their built-in compute capabilities. The Seeed Studio reComputer RK3576, powered by Rockchip's RK3576 processor, features a 6 TOPS NPU. However, when running YOLOv11n for video object detection, it achieves only about 2.3 FPS, far from usable for real-time applications.

This project addresses that bottleneck by integrating the Hailo-8 M.2 AI Accelerator Module, which delivers an additional 26 TOPS of compute power. The result: YOLOv11n inference reaches 28.1 FPS, a 12x+ performance uplift that enables smooth, real-time video analysis.

The solution leverages:

- reComputer RK3576
- Hailo-8 AI Accelerator
- Docker-based deployment
- AI Lab model workflow
- YOLO object detection

The entire system runs locally at the edge without relying on cloud services, making it suitable for:

- Smart surveillance
- Industrial inspection
- Traffic monitoring
- Robotics perception
- Smart retail analytics

Before deploying the AI application, the Hailo-8 AI accelerator must be installed into the M.2 PCIe expansion slot of the reComputer RK3576. The reComputer RK3576 provides an internal M.2 expansion interface that enables high-speed PCIe communication with AI accelerators such as the Hailo-8.

Step 1: Open the reComputer RK3576 Enclosure

To access the internal expansion interface, remove the enclosure cover according to the hardware guide. After opening the chassis, the mainboard and expansion interfaces become accessible for hardware upgrades and AI accelerator installation.

Step 2: Install the Hailo-8 Accelerator

Align the Hailo-8 M.2 connector with the PCIe slot and insert it at a slight angle. Push the module gently into the connector until it is fully seated.

The RK3576 already includes an NPU for AI workloads. However, when running multiple object detection streams or higher frame-rate video, additional acceleration becomes beneficial.
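To make the throughput argument concrete, here is a minimal sketch that turns the two frame rates reported in this article into per-frame times and a speedup factor. The 25 FPS per-stream target and the idea that throughput divides evenly across streams are assumptions for illustration, not measurements from this project, and the helper names are hypothetical.

```python
# Frame rates reported in this article for YOLOv11n.
NPU_FPS = 2.3      # RK3576 built-in NPU
HAILO_FPS = 28.1   # Hailo-8 over PCIe

# ASSUMPTION (not from the article): each camera stream needs 25 FPS.
TARGET_STREAM_FPS = 25.0


def frame_time_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps


def speedup(baseline_fps: float, accelerated_fps: float) -> float:
    """Ratio of accelerated throughput to baseline throughput."""
    return accelerated_fps / baseline_fps


def streams_supported(total_fps: float, per_stream_fps: float) -> int:
    """Whole streams that fit, ASSUMING throughput divides evenly across streams."""
    return int(total_fps // per_stream_fps)


if __name__ == "__main__":
    for name, fps in (("RK3576 NPU", NPU_FPS), ("Hailo-8", HAILO_FPS)):
        print(f"{name}: {frame_time_ms(fps):.1f} ms/frame, "
              f"{streams_supported(fps, TARGET_STREAM_FPS)} stream(s) "
              f"at {TARGET_STREAM_FPS:.0f} FPS")
    print(f"Speedup: {speedup(NPU_FPS, HAILO_FPS):.1f}x")
```

With these figures the per-frame time drops from roughly 435 ms to roughly 36 ms, which is where the 12x figure comes from.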
Hailo-8 provides dedicated AI inference acceleration that can significantly increase throughput while maintaining low power consumption. Hailo's architecture is specifically optimized for edge AI vision workloads.

System Architecture

    USB Camera
        │
        ▼
    reComputer RK3576
        │
        ▼
    Hailo-8 Accelerator
        │
        ▼
    YOLO Inference Engine
        │
        ▼
    Bounding Box Visualization
        │
        ▼
    Local Display / Web Stream

Deploying the Hailo Software Package

Step 1: Access the Hailo Software Download Center

Visit the Hailo website and search for "Software Downloads" using the search bar.

Step 2: Select Your Hardware Platform

On the software download page, choose the appropriate hardware platform. For this project, select Hailo-8.

Step 3: Download the Required Software Packages

Download the required software packages for your operating system, including:

- HailoRT Runtime
- PCIe Driver
- Model Zoo (optional)
- Development Tools and SDK (if needed)

Make sure the versions you download are compatible with your target platform and operating system before proceeding with the installation.

Transfer Required Files to reComputer RK3576

Open Windows Terminal or PowerShell on your PC and transfer the required installation packages to the reComputer RK3576 using SCP.

    scp C:\Users\seeed\Downloads\hailort-pcie-driver_4.24.0_all.deb seeed@192.168.10.230:/home/seeed/
    scp C:\Users\seeed\Downloads\hailort-4.24.0-cp311-cp311-linux_aarch64.whl seeed@192.168.10.230:/home/seeed/
    scp C:\Users\seeed\Downloads\librknnrt.so seeed@192.168.10.230:/home/seeed/

The files will be copied from the Windows host to the home directory of the RK3576 device. The installation steps also use the HailoRT runtime package (hailort_4.24.0_arm64.deb), so transfer it the same way if it is not already on the device.

Install Hailo Runtime and PCIe Driver

Open a terminal on the reComputer RK3576 and execute the following commands to install the Hailo PCIe driver, runtime environment, and Python SDK.
    # Install the PCIe driver
    sudo dpkg -i hailort-pcie-driver_4.24.0_all.deb

    # Reboot the system
    sudo reboot

    # After reboot, verify the driver is loaded
    lsmod | grep hailo

    # Install HailoRT
    sudo dpkg -i hailort_4.24.0_arm64.deb

    # Scan and verify device status
    hailortcli scan

    # Create and activate a virtual environment
    python3 -m venv hailo_env
    source hailo_env/bin/activate

    # Install HailoRT Python library (same version as the transferred wheel)
    pip install hailort-4.24.0-cp311-cp311-linux_aarch64.whl

    # Verify installation and device connection
    python3 -c "from hailo_platform import VDevice; vdev = VDevice(); print('Successfully connected via VDevice. Device info:', vdev)"

If the final verification command successfully detects the accelerator, the Hailo-8 PCIe device has been installed correctly and is ready for AI model deployment.

Install the Hailo Model Zoo

Next, install the official Hailo Model Zoo toolkit. This toolkit provides utilities for downloading, compiling, converting, and running pretrained AI models optimized for Hailo accelerators, including object detection, image classification, and segmentation models.

    # 1. Install required system libraries
    sudo apt update
    sudo apt install -y git libglib2.0-0 libgl1-mesa-glx

    # 2. Clone the official repository (latest branch recommended)
    git clone https://github.com/hailo-ai/hailo_model_zoo.git
    cd hailo_model_zoo
    pip install -e .

After installation, the Model Zoo tools can be used to download and deploy AI models directly onto the Hailo-8 accelerator.

Verify Camera Detection

Before running inference, verify that the camera is correctly detected by the operating system.

    v4l2-ctl --list-devices

If the camera is connected successfully, it should appear in the device list.

Figure X. Camera detection result on the reComputer RK3576.

To deploy YOLOv11n on the Hailo-8 accelerator, first download and compile the model using the Hailo Model Zoo tools.
Run the following command inside the hailo_model_zoo directory with the virtual environment activated:

    hailomz compile yolov11n

After the compilation process is completed, the generated yolov11n.hef model file can be used for real-time AI inference on the Hailo-8 accelerator.

Create a new Python file named webcam_yolo11n.py and copy the following code into it:

    import time

    import cv2
    import numpy as np
    from hailo_platform import (VDevice, HEF, InferVStreams, ConfigureParams,
                                HailoStreamInterface, InputVStreamParams,
                                OutputVStreamParams)

    # ================= Configuration =================
    HEF_PATH = 'yolov11n.hef'
    DEVICE_ID = "/dev/video0"  # Update based on v4l2-ctl output
    CONF_THRESHOLD = 0.45

    # COCO Dataset 80 Class Labels
    COCO_CLASSES = [
        "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck",
        "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
        "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra",
        "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
        "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove",
        "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
        "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange",
        "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
        "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
        "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
        "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
        "hair drier", "toothbrush",
    ]
    # ==================================================


    def main():
        # 1. Initialize Hailo hardware and read the model input size
        hef = HEF(HEF_PATH)
        input_vstream_info = hef.get_input_vstream_infos()[0]
        input_h, input_w = input_vstream_info.shape[:2]

        cap = cv2.VideoCapture(DEVICE_ID)
        if not cap.isOpened():
            print("Cannot open webcam")
            return

        with VDevice() as target:
            config_params_dict = ConfigureParams.create_from_hef(hef, HailoStreamInterface.PCIe)
            network_group = target.configure(hef, config_params_dict)[0]
            with network_group.activate():
                vstream_params = (InputVStreamParams.make_from_network_group(network_group),
                                  OutputVStreamParams.make_from_network_group(network_group))
                with InferVStreams(network_group, vstream_params[0], vstream_params[1]) as vstreams:
                    print("[INFO] Initialization successful. Running YOLOv11n real-time detection...")
                    while True:
                        start_time = time.time()  # Record start time for FPS
                        ret, frame = cap.read()
                        if not ret:
                            break

                        # Preprocessing: convert BGR to RGB and resize to the model input
                        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                        resized = cv2.resize(frame_rgb, (input_w, input_h))
                        input_tensor = np.expand_dims(resized, axis=0)

                        # Inference
                        outputs = vstreams.infer(input_tensor)

                        # Parsing and drawing
                        h, w, _ = frame.shape
                        for name, class_list in outputs.items():
                            # Iterate through the 80 classes
                            for class_id, detections in enumerate(class_list[0]):
                                if len(detections) > 0:
                                    for det in detections:
                                        if len(det) >= 5:
                                            ymin, xmin, ymax, xmax, confidence = det[:5]
                                            if confidence > CONF_THRESHOLD:
                                                # Map normalized coordinates to frame pixels
                                                left, top = int(xmin * w), int(ymin * h)
                                                right, bottom = int(xmax * w), int(ymax * h)
                                                # Get class name; show the ID if out of bounds
                                                class_name = (COCO_CLASSES[class_id]
                                                              if class_id < len(COCO_CLASSES)
                                                              else f"ID {class_id}")
                                                # Draw bounding box and label text
                                                cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
                                                label = f"{class_name}: {confidence:.2f}"
                                                cv2.putText(frame, label, (left, top - 10),
                                                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

                        # Calculate real-time FPS and print it in the top-left corner
                        curr_time = time.time()
                        fps = 1 / (curr_time - start_time)
                        cv2.putText(frame, f"FPS: {fps:.1f}", (20, 40),
                                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

                        # Display window; press 'q' to quit
                        cv2.imshow('reComputer RK3576 - Hailo YOLOv11n', frame)
                        if cv2.waitKey(1) & 0xFF == ord('q'):
                            break

        cap.release()
        cv2.destroyAllWindows()


    if __name__ == "__main__":
        main()

Then create a Python file named webcam_npu_save.py to run the model on the built-in NPU of the RK3576. The code is as follows:

    import os
    import time

    import cv2
    import numpy as np
    from rknnlite.api import RKNNLite

    # ================= Configuration =================
    RKNN_MODEL_PATH = 'yolo11n.rknn'  # Path to the RKNN model
    DEVICE_ID = 0                     # Camera index, corresponds to /dev/video0
    CONF_THRESHOLD = 0.45
    OUTPUT_DIR = "npu_detection_results"
    os.makedirs(OUTPUT_DIR, exist_ok=True)

    # COCO Dataset 80 Class Labels
    COCO_CLASSES = [
        "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck",
        "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
        "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra",
        "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
        "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove",
        "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
        "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange",
        "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
        "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
        "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
        "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
        "hair drier", "toothbrush",
    ]
    # ==================================================


    def post_process(outputs, frame, conf_threshold):
        """Decode YOLOv8-style raw outputs (decoupled head)."""
        h, w = frame.shape[:2]
        detections = []

        # Feature-map sizes of the three scales
        scales = [(80, 80), (40, 40), (20, 20)]

        # Output index layout:
        #   0, 3, 6: regression (reg)     - (64, 80, 80) etc.
        #   1, 4, 7: classification (cls) - (80, 80, 80) etc.
        #   2, 5, 8: objectness (obj)     - (1, 80, 80) etc.
        reg_outputs = [outputs[0], outputs[3], outputs[6]]
        cls_outputs = [outputs[1], outputs[4], outputs[7]]
        obj_outputs = [outputs[2], outputs[5], outputs[8]]

        # Iterate over the three scales
        for reg, cls, obj, (h_feat, w_feat) in zip(reg_outputs, cls_outputs, obj_outputs, scales):
            # Flatten each tensor and reorder dimensions
            reg = reg.squeeze(0).transpose(1, 2, 0).reshape(-1, 64)  # (num_boxes, 64)
            cls = cls.squeeze(0).transpose(1, 2, 0).reshape(-1, 80)  # (num_boxes, 80)
            obj = obj.squeeze(0).transpose(1, 2, 0).reshape(-1, 1)   # (num_boxes, 1)

            # Decode every feature point
            for i in range(reg.shape[0]):
                # 1. Objectness score
                obj_conf = obj[i][0]
                if obj_conf < conf_threshold:
                    continue

                # 2. Class score (class score * objectness score)
                cls_scores = cls[i] * obj_conf
                class_id = np.argmax(cls_scores)
                confidence = cls_scores[class_id]
                if confidence < conf_threshold:
                    continue

                # 3. Decode the bounding box (YOLO format)
                # Grid coordinates of this point in the feature map
                row = i // w_feat
                col = i % w_feat

                # Take the x, y, w, h offsets from the regression head.
                # Note: this simplified decode reads only the first four of
                # the 64 regression channels.
                reg_vals = reg[i]
                dx, dy, dw, dh = reg_vals[0], reg_vals[1], reg_vals[2], reg_vals[3]

                # Center and size, normalized to the feature map
                cx = (col + dx) / w_feat
                cy = (row + dy) / h_feat
                width = dw
                height = dh

                # Convert to original-image coordinates
                left = int((cx - width / 2) * w)
                top = int((cy - height / 2) * h)
                right = int((cx + width / 2) * w)
                bottom = int((cy + height / 2) * h)

                # Clamp to the image bounds
                left = max(0, left)
                top = max(0, top)
                right = min(w, right)
                bottom = min(h, bottom)

                class_name = COCO_CLASSES[class_id] if class_id < len(COCO_CLASSES) else f"ID {class_id}"
                detections.append((class_name, float(confidence), left, top, right, bottom))

        # NMS is not applied here. If too many overlapping boxes appear,
        # consider adding an NMS step.
        return detections


    def main():
        # 1. Initialize RKNN
        rknn = RKNNLite()

        # Load the model
        print(f"[INFO] Loading RKNN model from {RKNN_MODEL_PATH}...")
        ret = rknn.load_rknn(RKNN_MODEL_PATH)
        if ret != 0:
            print(f"[ERROR] Load RKNN model failed: {ret}")
            return

        # Initialize the runtime
        print("[INFO] Initializing RKNN runtime...")
        ret = rknn.init_runtime()
        if ret != 0:
            print(f"[ERROR] Init runtime failed: {ret}")
            return
        print("[INFO] RKNN model loaded successfully")

        # 2. Open the camera
        cap = cv2.VideoCapture(DEVICE_ID)
        if not cap.isOpened():
            print("Cannot open webcam")
            rknn.release()
            return

        print("[INFO] Camera opened. Model input: 640x640")
        print("[INFO] Press Ctrl+C to stop.")

        frame_count = 0
        save_interval = 5  # Save one image every 5 frames

        try:
            while True:
                start_time = time.time()
                ret, frame = cap.read()
                if not ret:
                    break

                # Preprocessing
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                resized = cv2.resize(frame_rgb, (640, 640))
                input_tensor = np.expand_dims(resized, axis=0).astype(np.float32) / 255.0

                # Inference
                outputs = rknn.inference(inputs=[input_tensor])

                # Post-processing
                detections = post_process(outputs, frame, CONF_THRESHOLD)

                # Draw detection boxes so the saved frames include them
                for cls, conf, l, t, r, b in detections:
                    cv2.rectangle(frame, (l, t), (r, b), (0, 255, 0), 2)

                # Calculate and display FPS
                fps = 1 / (time.time() - start_time)
                cv2.putText(frame, f"NPU FPS: {fps:.1f}", (20, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

                # Save one image every N frames (with detection boxes and FPS)
                if frame_count % save_interval == 0:
                    img_path = os.path.join(OUTPUT_DIR, f"npu_frame_{frame_count:06d}.jpg")
                    cv2.imwrite(img_path, frame)
                    print(f"[Frame {frame_count}] Saved -> {img_path}")

                # Print detection info to the terminal
                if detections:
                    print(f"[Frame {frame_count}] Found {len(detections)} objects, FPS: {fps:.1f}")
                    for cls, conf, l, t, r, b in detections[:3]:  # Print at most 3
                        print(f"  - {cls}: {conf:.2f} at ({l},{t}) - ({r},{b})")

                frame_count += 1
                time.sleep(0.01)
        except KeyboardInterrupt:
            print("\n[INFO] Stopped by user.")

        cap.release()
        rknn.release()
        print(f"[INFO] Done. Total frames: {frame_count}")


    if __name__ == "__main__":
        main()

Running results:

- RK3576 built-in NPU: YOLOv11n achieves only 2.3 FPS, too slow for smooth video detection.
- Hailo-8 PCIe accelerator card: the same model reaches 28.1 FPS, an improvement of more than tenfold that meets the real-time inference requirements of standard video streams.

This project demonstrates how the reComputer RK3576 and Hailo-8 accelerator can be combined to build a powerful edge AI vision platform capable of real-time object detection without cloud dependency. The combination of Rockchip computing resources, Hailo acceleration, and containerized deployment provides an accessible path for developers building next-generation intelligent edge applications.
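The built-in NPU script collects detections without non-maximum suppression (NMS), and its comments suggest adding NMS if too many boxes appear. As a minimal sketch of what that step could look like, the pure-Python helpers below filter the (class_name, confidence, left, top, right, bottom) tuples that the script's post-processing function returns. The function names `iou` and `nms` and the 0.5 IoU threshold are assumptions chosen for illustration; they are not part of the original scripts or of any vendor API.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (left, top, right, bottom) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy per-class NMS.

    detections: list of (class_name, confidence, left, top, right, bottom).
    iou_threshold: ASSUMED default; tune it for your scene.
    Returns the kept detections, highest confidence first.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box unless it overlaps a higher-scoring box of the same class.
        overlaps = any(
            det[0] == k[0] and iou(det[2:], k[2:]) > iou_threshold
            for k in kept
        )
        if not overlaps:
            kept.append(det)
    return kept


if __name__ == "__main__":
    sample = [
        ("person", 0.90, 10, 10, 110, 110),
        ("person", 0.80, 12, 12, 112, 112),   # near-duplicate of the first box
        ("dog", 0.70, 10, 10, 110, 110),      # same place, different class
        ("person", 0.60, 300, 300, 400, 400),
    ]
    for det in nms(sample):
        print(det)
```

In the NPU script, a call such as `detections = nms(detections)` placed right after post-processing would apply the filter before boxes are printed or saved.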