Building a Drone-Based Security Reconnaissance System with Computer Vision

A developer built a drone-based security reconnaissance system that uses computer vision to detect restricted-zone breaches.

Build a drone-based security system by relaying every PTZ camera and drone feed through MediaMTX, sampling one frame every two seconds, and POSTing it to a Roboflow Inference Server running RF-DETR to detect people and cars. Use supervision's PolygonZone plus ByteTrack to confirm a sustained restricted-zone breach, which auto-dispatches the drone to track the intruder, with no per-frame inference and no human in the loop until the alert fires.

Fixed security cameras have a frustrating property: they only see where you bolted them. Cover a real perimeter (a yard, a lot, a fence line) and you're either buying a dozen cameras and a wiring crew, or paying a person to walk it at 2 a.m. Both are expensive, and both leave gaps.

A drone flips that math. One aircraft can patrol an arc that would take six fixed cameras, then land and charge. But a drone streaming video is just an expensive pair of eyes unless something is watching the feed and deciding what matters. That "something" is computer vision.

This post walks through a working build that ties it all together: two pan-tilt-zoom (PTZ) cameras and one drone, all running live object detection, feeding a single dashboard that knows the difference between "a person walked by" and "someone is standing inside the restricted zone, launch the drone."
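
The breach rule summarized at the top of this post (a tracked object inside a polygon, sustained over several samples) can be sketched without any vision library. This is a minimal stand-in, not the build's code: the build uses supervision's PolygonZone and ByteTrack, while the sketch below assumes tracker IDs are already assigned and uses a plain ray-casting test. `BreachConfirmer`, `required_hits`, and the bottom-center anchor are hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]
Box = tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y can cross a horizontal ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class BreachConfirmer:
    polygon: list[Point]
    required_hits: int = 3  # hypothetical: 3 samples at 2 s each, about 6 s
    _hits: dict[int, int] = field(default_factory=dict)

    def update(self, tracked_boxes: dict[int, Box]) -> list[int]:
        """Feed one sampled frame; return tracker IDs with a confirmed breach."""
        confirmed = []
        for tracker_id, (x1, y1, x2, y2) in tracked_boxes.items():
            anchor = ((x1 + x2) / 2, y2)  # bottom-center, roughly the feet
            if point_in_polygon(anchor, self.polygon):
                self._hits[tracker_id] = self._hits.get(tracker_id, 0) + 1
            else:
                self._hits[tracker_id] = 0  # leaving the zone resets the count
            if self._hits[tracker_id] >= self.required_hits:
                confirmed.append(tracker_id)
        # Forget tracks that were not seen in this frame.
        for tracker_id in list(self._hits):
            if tracker_id not in tracked_boxes:
                del self._hits[tracker_id]
        return confirmed
```

Requiring several consecutive in-zone samples is one way to read "sustained": a single detection inside the polygon does nothing, and only a track that stays inside across samples would trigger the dispatch.
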

Drone-Based Security System: The Architecture

The whole system is a handful of small services that each do one job:

Figure: System architecture. Cameras and drone stream over RTSP into a MediaMTX relay, which sends WebRTC video to the dashboard and frames to the FastAPI workers; workers post frames to the RF-DETR inference server, and the zone/event logic dispatches the drone, records a clip, and pushes events to the dashboard. A confirmed breach auto-dispatches the drone.

A few design choices worth calling out:

- Browsers can't play RTSP, so every camera and the drone publish into MediaMTX, which relays each stream out as low-latency WebRTC for the dashboard and as RTSP for the backend to pull frames from. One relay, three streams.
- Detection is a separate service. The Python workers never run a model in-process. They POST frames to Roboflow's Inference Server over HTTP and get detections back. That keeps the CV concern isolated and swappable.
- Every service owns its own event stream (a ring buffer plus a WebSocket). The frontend subscribes to all of them and merges them into one chronological, severity-colored log. No shared database, no message broker, so services stay loosely coupled.

Step 1: Get frames, not just video

The cameras and drone stream continuously, but you do not want to run a detector on every frame. It's wasteful and, on a CPU, impossible to keep up with. So each camera worker grabs a frame on a fixed cadence (INFERENCE_INTERVAL_SEC=2.0 by default, one inference every two seconds) and sends just that one off for detection.

Before sending, the worker shrinks the frame to a 640px longest side and JPEG-encodes it at quality 80:

    # RF-DETR resizes to a small square internally, so downscaling here loses
    # little accuracy but cuts encode + base64 + transfer + decode latency a lot.
    INFER_MAX_SIDE = 640
    # `send` is the frame after downscaling to INFER_MAX_SIDE
    ok, buf = cv2.imencode(".jpg", send, [cv2.IMWRITE_JPEG_QUALITY, 80])

This is the single highest-leverage latency optimization in the build.
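
The downscale step implies a scale factor that has to be undone on whatever boxes come back. A minimal sketch of that arithmetic follows, assuming pixel-coordinate boxes and a plain resize with no padding; `downscale_size` and `scale_box_back` are hypothetical helper names, and the choice never to upscale small frames is an assumption rather than something the post states.

```python
INFER_MAX_SIDE = 640  # longest side sent to the detector, as in the post


def downscale_size(
    width: int, height: int, max_side: int = INFER_MAX_SIDE
) -> tuple[int, int, float]:
    """Return (new_width, new_height, scale) with the longest side <= max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height, 1.0  # assumption: small frames are sent unchanged
    scale = max_side / longest
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height, scale


def scale_box_back(box: tuple[float, ...], scale: float) -> tuple[float, ...]:
    """Map box coordinates from the downscaled frame to the original frame."""
    # Dividing by the scale works for corner boxes and for center/width/height
    # boxes alike, because a plain resize adds no offset.
    return tuple(coord / scale for coord in box)
```

In a worker, the first two return values would plausibly feed `cv2.resize(frame, (new_width, new_height))` before encoding, and the third would be kept alongside the request so the response can be mapped back.
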
The model downsizes the image anyway, so a 4K frame buys you nothing but slower JPEG encoding, a bigger base64 payload, and a slower decode on the server side. Send 640px and scale the returned boxes back up to the original frame.

[Code: the full infer method, including scaling detections back to original resolution]

Step 2: The model, and why we didn't train one yet

The detector is RF-DETR (rfdetr-base; https://rfdetr.roboflow.com/latest/?ref=blog.roboflow.com), a transformer-based object detection model that Roboflow runs out of the box with the standard COCO classes, which already include person and car. We filter the results down to just those two classes downstream, so the model itself stays generic:

    MODEL_ID=rfdetr-base
    TARGET_CLASSES=person,car
    CONFIDENCE_THRESHOLD=0.5

For perimeter security, off-the-shelf person/car detection gets you surprisingly far, and starting here means zero labeling before you have a working system. MODEL_ID is an environment variable end-to-end, so the moment you need something COCO doesn't cover (a "person carrying a bag," a "delivery van vs. private car," aerial-angle people who look nothing like ground-level training data), you point it at a fine-tuned model with a one-line config change.

That fine-tuning path is the natural Roboflow loop:

1. Collect real footage from your own cameras and drone. The most valuable dataset is the one shot from your angles and altitudes.
2. Label it in Roboflow Annotate, or bootstrap labels with a model you already have and just correct them.
3. Pull in Universe datasets for classes you don't want to label from scratch. Aerial person/vehicle datasets are a strong head start for drone-altitude views.
4. Train a fine-tuned RF-DETR, then deploy it back through the same Inference Server. No pipeline rewrite.

Step 3: Deploy the detector

The Inference Server runs as a container straight from roboflow/inference-server-cpu.
The PTZ worker calls one route:

    POST /infer/object_detection
    { "model_id": "rfdetr-base", "image": { "type": "base64", "value": "