Testing Edge AI from an MCP tool: I pointed mk-qa-master at my webcam and YOLO answered

Jack Kao, developer of the MCP-native QA toolkit mk-qa-master, demonstrated a new edge AI testing capability by pointing a YOLOv8 object detection model at his laptop webcam through an RTSP stream. The v1.1 update enables developers to write deterministic tests for live camera feeds using the same MCP tool interface as browser-based testing, with assertions on Intersection over Union (IoU), throughput, and p95 latency. Kao orchestrated the entire pipeline — from webcam capture via ffmpeg to a looped RTSP stream served through mediamtx — without requiring specialized hardware like a Jetson.

By Jack Kao, author of mk-qa-master, an MCP-native QA toolkit.

Most "AI testing" stops at calling an API and asserting the response isn't empty. Edge AI, meaning a model running on a live camera feed, doesn't fit that mold. You can't assert exact bounding-box coordinates (the output is fuzzy by design), and "correct but 200ms too late" is a production failure, not a pass.

When I added the edge runner to mk-qa-master in v1.1, I wanted it to make that hard problem feel like writing any other test. So this post is me dogfooding it: my laptop webcam → an RTSP stream → a YOLOv8 model → assertions on IoU, throughput, and p95 latency. The whole thing is orchestrated through MCP tool calls, no Jetson required.

mk-qa-master is an MCP server: your AI client (Claude Code, Cursor, etc.) calls its tools to analyze, generate, and run tests. It already ships runners for `pytest-playwright`, `jest`, `cypress`, `go`, `maestro`, `schemathesis`… and as of v1.1, `edge`. Flip one env var:

```
// the mk-qa-master MCP server config
"env": { "QA_RUNNER": "edge" }
```

…and the same `generate_test` / `run_tests` tools now speak RTSP + YOLO instead of browsers. Same muscle memory, completely different domain.

A useful Edge AI test answers several questions at once: whether the detections are correct, whether throughput is sustained, and whether latency stays inside its SLA.

Correctness is the subtle one. Two runs of the same model on the same frame can disagree by a few pixels, so detection correctness uses IoU (Intersection over Union) against an expected box with a threshold (default 0.5). You assert "the box overlaps enough," never "the box is identical." mk-qa-master ships this as `edge.metrics.match_detection`.
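To make the "overlaps enough" idea concrete, here is a minimal sketch of an IoU check. The names `iou` and `matches`, the corner-format boxes, and the dict layout are assumptions for illustration; this is not the code behind `edge.metrics.match_detection`.

```python
from typing import Dict, List, Sequence

# Assumed box format: (x1, y1, x2, y2) corner coordinates in pixels.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(detections: List[Dict], expected: Dict, threshold: float = 0.5) -> bool:
    """True if any detection has the expected label and enough overlap."""
    return any(
        d["label"] == expected["label"] and iou(d["box"], expected["box"]) >= threshold
        for d in detections
    )


# Hypothetical annotation and detections, made up for this example.
expected = {"label": "person", "box": (100, 100, 300, 400)}
shifted = [{"label": "person", "box": (104, 98, 303, 405)}]   # a few pixels off
far_off = [{"label": "person", "box": (400, 100, 600, 400)}]  # no overlap at all

print(matches(shifted, expected))  # small jitter still passes
print(matches(far_off, expected))  # a box somewhere else does not
```

The few-pixel jitter in `shifted` still clears the 0.5 threshold, while the disjoint box in `far_off` does not. That is the behavior you want from a test on fuzzy output.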
The testing layer wants a stream, not a device handle, so you bridge the camera:

```
MacBook camera
   │  ffmpeg avfoundation capture
   ▼
local clip (5s): deterministic, replayable, no privacy surprises in CI
   │  ffmpeg -re -stream_loop -1   (loop forever)
   ▼
mediamtx → rtsp://127.0.0.1:8600/cam
   │  OpenCV VideoCapture
   ▼
YOLOv8n (LocalYolo) → IoU / fps / latency assertions → pytest report
```

Recording a clip once and looping it over RTSP is what makes this deterministic: the same frames every run, no "why did it fail at 6pm" lighting drama. The edge runner actually does this loop for you when `QA_RTSP_SOURCE` points at a file; I'll show the manual version because it's clearer:

```
# capture once
ffmpeg -f avfoundation -framerate 30 -video_size 1280x720 \
  -i "0" -t 5 -pix_fmt yuv420p edge_demo.mp4

# serve on loop via mediamtx (mediamtx.yml: paths: { all_others: })
ffmpeg -re -stream_loop -1 -i edge_demo.mp4 \
  -c:v libx264 -preset ultrafast -tune zerolatency \
  -f rtsp rtsp://127.0.0.1:8600/cam
```

This is the part I'm proud of: the whole session is just tools.

1. Confirm the runner.

```
get_runner_info
→ { "current": "edge",
    "available": ["cypress","edge","go","jest","maestro","newman","pytest-playwright","schemathesis"] }
```

2. Probe the stream.

```
analyze_stream(rtsp_url="rtsp://127.0.0.1:8600/cam")
→ { "url": "rtsp://127.0.0.1:8600/cam", "width": 1280, "height": 720, "fps": 30.0,
    "labels": [],
    "candidate_tcs": [
      "overall throughput should be >= the configured min fps",
      "single-frame p95 latency should be <= the latency SLA",
      "stream reconnects after mid-test interruption without crashing",
      "empty / no-target frames do not generate false-positive detections" ] }
```

Note it hands back candidate test cases specific to the edge domain, the same way `analyze_url` hands back form/CTA test cases for web.

3. Generate the test. `generate_test(...)` writes a pytest skeleton wired to the edge fixtures (`backend`, `stream`, `latency`).
The body stays boring on purpose:

```python
import time

# MIN_FPS, MAX_FRAME, LABEL, IOU, SLA, annotations, and match_detection
# are defined outside this excerpt.

def test_throughput(stream, backend, latency):
    """Sustained-rate check: at least MIN_FPS over a 150-frame window."""
    n, t0 = 0, time.time()
    while n < 150:
        ok, frame = stream.read()
        if not ok:
            break
        latency.add(backend.infer(frame).latency_ms)
        n += 1
    fps = n / max(time.time() - t0, 1e-6)
    assert fps >= MIN_FPS, f"throughput below target: {fps:.1f} < {MIN_FPS} fps"

def test_detect_target(stream, backend, latency):
    """Target label within IoU threshold; p95 latency holds."""
    hit = False
    for idx in range(MAX_FRAME):
        ok, frame = stream.read()
        if not ok:
            break
        res = backend.infer(frame)
        latency.add(res.latency_ms)
        for exp in annotations.get(str(idx), []):
            if exp["label"] == LABEL and match_detection(res.detections, exp, IOU):
                hit = True
    assert hit, f"{LABEL!r} not detected within IoU={IOU}"
    assert latency.p95() <= SLA, f"p95 {latency.p95():.1f}ms > {SLA}ms"
```

4. Run it. `run_tests` → the runner starts the RTSP source, loads YOLOv8n once (session-scoped; reloading per test would wreck wall-clock), runs pytest, and snapshots the report.

YOLOv8n on CPU (no GPU, just the MacBook), 150 frames of the looped 720p stream:

| Metric | Result | Target (desktop yolov8n) |
|---|---|---|
| Throughput | 23.0 fps | 25 (close, CPU-bound) |
| p95 latency | 27.9 ms | ≤ 40 ms ✅ |
| mean / p50 latency | 26.0 / 25.6 ms | n/a |
| person detections | 150 / 150 frames | n/a |
| Empty-frame false positives | 0 | 0 ✅ |

The model nailed `person` on every frame (it was pointed at me). It also confidently reported a `suitcase` in all 150 frames; there wasn't one. That's the reality of a tiny model: great recall, noisy precision. Exactly the thing a real test surfaces and a "looks like it works" demo hides. The p95 of 27.9 ms beat the 40 ms SLA comfortably; 23 fps fell just short of the 25 fps target purely for lack of a GPU.
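The tests above lean on a `latency` fixture with `add(...)` and `p95()`. As a rough illustration of why the assertion targets p95 rather than the mean, here is a minimal sketch of such a tracker. The `LatencyTracker` class and its nearest-rank percentile are assumptions for this example, not the fixture mk-qa-master ships, and the sample values are made up.

```python
import math
from typing import List


class LatencyTracker:
    """Collects per-frame latencies in milliseconds and reports percentiles."""

    def __init__(self) -> None:
        self._samples: List[float] = []

    def add(self, latency_ms: float) -> None:
        self._samples.append(float(latency_ms))

    def percentile(self, q: int) -> float:
        """Nearest-rank percentile: the smallest sample such that at least
        q percent of all samples are less than or equal to it."""
        if not self._samples:
            raise ValueError("no latency samples recorded")
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(q * len(ordered) / 100.0))
        return ordered[rank - 1]

    def p95(self) -> float:
        return self.percentile(95)

    def mean(self) -> float:
        return sum(self._samples) / len(self._samples)


# Made-up samples: 18 fast frames and 2 slow outliers.
tracker = LatencyTracker()
for ms in [25.0] * 18 + [90.0] * 2:
    tracker.add(ms)

print(f"mean={tracker.mean():.1f}ms p95={tracker.p95():.1f}ms")
```

With these samples the mean stays near the fast frames while p95 lands on the slow ones, so a tail that the mean hides would still trip an SLA assertion written against p95.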
mk-qa-master ships SLA starting points so you don't guess:

| Scenario | min fps | latency SLA | IoU |
|---|---|---|---|
| Desktop yolov8n dev | 25 | 40 ms | 0.5 |
| Jetson Nano | 15 | 70 ms | 0.5 |
| Jetson Orin Nano | 30 | 25 ms | 0.6 |
| Cloud GPU service | 60 | 16 ms | 0.6 |

The empty-frame check is `test_empty_frame_no_false_positives`.

Eating my own dog food turned up sharp edges I'm filing down:

- The readiness check only asked "can I connect to the RTSP port?", but I had an Android emulator (qemu) squatting on the default port 8554, so the probe passed while DESCRIBE returned 404. v1.2 moves readiness to an actual RTSP DESCRIBE success, and the docs now recommend a dedicated port.
- localhost is two addresses. If one side ends up on the IPv6 address (`::`) and the consumer resolves `127.0.0.1`, they miss each other. Pinning both sides to `127.0.0.1` removes the ambiguity; that is going into the defaults.
- `generate_html_report` builds per-test cards from the runner's `get_all_test_details`; the edge runner doesn't override it yet, so an edge run shows summary tiles but no case list. Quick fix, already queued.

`QA_JETSON_HOST` / `QA_INFERENCE_ENDPOINT` will let the same test point at a real device or a GPU service and just re-tune the thresholds.

```
pip install "mk-qa-master[edge]"   # opencv + ultralytics + torch
# add the MCP server to your client, set QA_RUNNER=edge,
# point QA_RTSP_SOURCE at a clip or an rtsp:// url, and call run_tests
```

You don't need edge hardware to start testing edge models: a webcam, ffmpeg, mediamtx, and a 6MB YOLOv8n checkpoint get you a real, asserting, repeatable test in an afternoon. When the Jetson arrives, you point the same test at a different endpoint.

If you build CV/edge pipelines, I'd genuinely like to hear how you handle the latency-vs-correctness trade-off in CI. That tension is the whole game.

Jack Kao