By Jack Kao, author of mk-qa-master, an MCP-native QA toolkit.
Most "AI testing" stops at calling an API and asserting the response isn't empty. Edge AI, a model running on a live camera feed, doesn't fit that mold. You can't assert exact bounding-box coordinates (the output is fuzzy by design), and "correct but 200ms too late" is a production failure, not a pass.
When I added the edge runner to mk-qa-master in v1.1, I wanted it to make that hard problem feel like writing any other test. So this post is me dogfooding it: my laptop webcam → an RTSP stream → a YOLOv8 model → assertions on IoU, throughput, and p95 latency, all orchestrated through MCP tool calls, no Jetson required.
mk-qa-master is an MCP server: your AI client (Claude Code, Cursor, etc.) calls its tools to analyze, generate, and run tests. It already ships runners for `pytest-playwright`, `jest`, `cypress`, `go`, `maestro`, `schemathesis`… and, as of v1.1, `edge`.
Flip one env var:

```jsonc
// the mk-qa-master MCP server config
"env": { "QA_RUNNER": "edge" }
```
…and the same `generate_test` / `run_tests` tools now speak RTSP + YOLO instead of browsers. Same muscle memory, completely different domain.
A useful Edge AI test answers all of these at once:
Correctness is the subtle one. Two runs of the same model on the same frame can disagree by a few pixels, so detection correctness uses IoU (Intersection over Union) against an expected box with a threshold (default 0.5). You assert "the box overlaps enough," never "the box is identical." mk-qa-master ships this as `edge.metrics.match_detection`.
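To make the threshold concrete, here is roughly what an IoU match computes. This is a minimal sketch, not the shipped `match_detection`: it assumes boxes are `(x1, y1, x2, y2)` pixel corners and that detections are dicts with hypothetical `label` and `box` keys.

```python
from typing import Sequence

# Boxes as (x1, y1, x2, y2) pixel corners: an assumption for this sketch
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes, in [0, 1]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 when boxes don't overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(detections: Sequence[dict], expected: dict, threshold: float = 0.5) -> bool:
    """Hypothetical matcher: any same-label detection overlapping >= threshold."""
    return any(
        d["label"] == expected["label"] and iou(d["box"], expected["box"]) >= threshold
        for d in detections
    )


expected = {"label": "person", "box": (100, 100, 200, 300)}
shifted = {"label": "person", "box": (110, 105, 210, 305)}  # off by a few pixels
print(round(iou(shifted["box"], expected["box"]), 3))  # 0.782
print(matches([shifted], expected))  # True
```

A box shifted by 10 px and 5 px still scores about 0.78, comfortably over the 0.5 default. That slack is exactly why an IoU threshold beats asserting exact coordinates.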
The testing layer wants a stream, not a device handle, so you bridge the camera:
```text
MacBook camera
  │ ffmpeg (avfoundation capture)
  ▼
local clip (5s: deterministic, replayable, no privacy surprises in CI)
  │ ffmpeg -re -stream_loop -1 (loop forever)
  ▼
mediamtx → rtsp://127.0.0.1:8600/cam
  │ OpenCV VideoCapture
  ▼
YOLOv8n (LocalYolo) → IoU / fps / latency assertions → pytest report
```
Recording a clip once and looping it over RTSP is what makes this deterministic: the same frames every run, no "why did it fail at 6pm" lighting drama. The edge runner actually does this loop for you when `QA_RTSP_SOURCE` points at a file; I'll show the manual version because it's clearer:
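```bash
# Record a 5 s, 720p clip from camera index 0 (macOS avfoundation)
ffmpeg -f avfoundation -framerate 30 -video_size 1280x720 \
  -i "0" -t 5 -pix_fmt yuv420p edge_demo.mp4

# Replay it forever in real time (-re) to mediamtx, assumed to listen on :8600
ffmpeg -re -stream_loop -1 -i edge_demo.mp4 \
  -c:v libx264 -preset ultrafast -tune zerolatency \
  -f rtsp rtsp://127.0.0.1:8600/cam
```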
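The runner's file-vs-URL handling for `QA_RTSP_SOURCE` can be pictured roughly like this. It is a hypothetical sketch (`plan_source`, `loop_publish_argv` and the default URL are my names, not the runner's internals) that reuses the exact ffmpeg flags from the manual publish command above.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical default: the publish URL used throughout this post
DEFAULT_PUBLISH_URL = "rtsp://127.0.0.1:8600/cam"


def loop_publish_argv(clip: str, publish_url: str) -> List[str]:
    """ffmpeg argv that replays `clip` forever, in real time, to an RTSP server."""
    return [
        "ffmpeg", "-re", "-stream_loop", "-1", "-i", clip,
        "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
        "-f", "rtsp", publish_url,
    ]


def plan_source(
    source: str, publish_url: str = DEFAULT_PUBLISH_URL
) -> Tuple[str, Optional[List[str]]]:
    """Hypothetical: map a QA_RTSP_SOURCE value to (url_to_read, publisher_argv)."""
    if source.startswith(("rtsp://", "rtsps://")):
        return source, None  # already a live stream: read it directly
    if not Path(source).is_file():
        raise FileNotFoundError(f"QA_RTSP_SOURCE is neither a URL nor a file: {source}")
    # A local clip: publish it on a loop and read from the publish URL
    return publish_url, loop_publish_argv(source, publish_url)
```

In your own fixture you would call `plan_source(os.environ["QA_RTSP_SOURCE"])`, start `subprocess.Popen(argv)` when `argv` is not `None`, then hand `url` to `cv2.VideoCapture`.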
This is the part I'm proud of: the whole session is just tools.
1. Confirm the runner. `get_runner_info` →

```json
{ "current": "edge", "available": ["cypress","edge","go","jest","maestro","newman","pytest-playwright","schemathesis"] }
```
2. Probe the stream. `analyze_stream(rtsp_url="rtsp://127.0.0.1:8600/cam")` →

```json
{
  "url": "rtsp://127.0.0.1:8600/cam",
  "width": 1280, "height": 720, "fps": 30.0,
  "labels": [],
  "candidate_tcs": [
    "overall throughput should be >= the configured min_fps",
    "single-frame p95 latency should be <= the latency SLA",
    "stream reconnects after mid-test interruption without crashing",
    "empty / no-target frames do not generate false-positive detections"
  ]
}
```
Note that it hands back candidate test cases specific to the edge domain, the same way `analyze_url` hands back form/CTA test cases for web.
3. Generate the test. `generate_test(...)` writes a pytest skeleton wired to the edge fixtures (`backend`, `stream`, `latency`). The body stays boring on purpose:
```python
import time

# MIN_FPS, MAX_FRAME, LABEL, IOU, SLA, annotations and match_detection
# (edge.metrics.match_detection) are set up at the top of the generated file.


def test_throughput(stream, backend, latency):
    """Sustained-rate check: at least MIN_FPS over a 150-frame window."""
    n, t0 = 0, time.time()
    while n < 150:
        ok, frame = stream.read()  # next decoded frame from the RTSP stream
        if not ok:
            break
        latency.add(backend.infer(frame).latency_ms)
        n += 1
    fps = n / max(time.time() - t0, 1e-6)  # guard against a zero-length window
    assert fps >= MIN_FPS, f"throughput below target: {fps:.1f} < {MIN_FPS} fps"


def test_detect_target(stream, backend, latency):
    """Target label within IoU threshold; p95 latency holds."""
    hit = False
    for idx in range(MAX_FRAME):
        ok, frame = stream.read()
        if not ok:
            break
        res = backend.infer(frame)
        latency.add(res.latency_ms)
        # expected boxes for this frame, keyed by frame index as a string
        for exp in annotations.get(str(idx), []):
            if exp["label"] == LABEL and match_detection(res.detections, exp, IOU):
                hit = True
    assert hit, f"{LABEL!r} not detected within IoU={IOU}"
    assert latency.p95() <= SLA, f"p95 {latency.p95():.1f}ms > {SLA}ms"
```
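Both tests lean on the `latency` fixture's `add()` and `p95()`. If you want to unit-test your own helpers without mk-qa-master installed, a stand-in like this works. It is hypothetical and assumes the nearest-rank definition of p95; the real fixture may interpolate, which can shift the value slightly on small samples.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LatencyRecorder:
    """Hypothetical stand-in for the `latency` fixture: per-frame latencies in ms."""

    samples: list[float] = field(default_factory=list)

    def add(self, ms: float) -> None:
        self.samples.append(ms)

    def percentile(self, pct: float) -> float:
        """Nearest-rank percentile: the smallest sample covering pct% of all samples."""
        if not self.samples:
            raise ValueError("no latency samples recorded")
        ordered = sorted(self.samples)
        rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
        return ordered[max(rank, 1) - 1]

    def p95(self) -> float:
        return self.percentile(95)

    def mean(self) -> float:
        return sum(self.samples) / len(self.samples)


rec = LatencyRecorder()
for ms in range(1, 101):  # 100 fake frames at 1..100 ms
    rec.add(float(ms))
print(rec.p95(), rec.mean())  # 95.0 50.5
```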
4. Run it. `run_tests` → the runner starts the RTSP source, loads YOLOv8n once (session-scoped; reloading it per test would wreck wall-clock time), runs pytest, and snapshots the report.
YOLOv8n on CPU (no GPU, just the MacBook), 150 frames of the looped 720p stream:
| Metric | Result | Target (desktop yolov8n) |
|---|---|---|
| Throughput | 23.0 fps | ≥ 25 fps (miss: close, CPU-bound) |
| p95 latency | 27.9 ms | ≤ 40 ms (pass) |
| Mean / p50 latency | 26.0 / 25.6 ms | n/a |
| `person` detections | 150 / 150 frames | n/a |
| Empty-frame false positives | 0 | 0 (pass) |
The model nailed `person` on every frame (it was pointed at me). It also confidently reported a `suitcase` in all 150 frames, and there wasn't one. That's the reality of a tiny model: great recall, noisy precision. Exactly the thing a real test surfaces and a "looks like it works" demo hides. A p95 of 27.9 ms beat the 40 ms SLA comfortably; 23 fps fell just short of 25 purely for lack of a GPU.
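The recall/precision split is easy to put a number on. Here is a label-only sketch (it ignores boxes and IoU, and `label_precision_recall` is a hypothetical helper), assuming each frame produced exactly one `person` and one phantom `suitcase`.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def label_precision_recall(
    frames: Iterable[Tuple[Sequence[str], Sequence[str]]],
) -> Tuple[float, float]:
    """Label-only precision/recall over (predicted_labels, expected_labels) per frame.

    Boxes and IoU are ignored; labels are matched as multisets, so two
    expected persons need two person predictions.
    """
    tp = fp = fn = 0
    for predicted, expected in frames:
        p, e = Counter(predicted), Counter(expected)
        tp += sum((p & e).values())  # predicted and expected
        fp += sum((p - e).values())  # predicted but not expected (phantoms)
        fn += sum((e - p).values())  # expected but missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Assumption: every frame had one person detection plus one phantom suitcase
run = [(["person", "suitcase"], ["person"])] * 150
print(label_precision_recall(run))  # (0.5, 1.0)
```

Perfect recall with a precision of 0.5: a `person`-only assertion passes this run, while a precision check would flag the phantom suitcase.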
mk-qa-master ships SLA starting points so you don't guess:
| Scenario | min_fps | latency SLA | IoU |
|---|---|---|---|
| Desktop yolov8n dev | 25 | 40 ms | 0.5 |
| Jetson Nano | 15 | 70 ms | 0.5 |
| Jetson Orin Nano | 30 | 25 ms | 0.6 |
| Cloud GPU service | 60 | 16 ms | 0.6 |
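If you'd rather keep these presets in code, one hypothetical way to encode the table and check a run against it follows. The names `Sla`, `SLA_PRESETS` and `check` are mine, not mk-qa-master's; the numbers are copied from the table.

```python
from typing import NamedTuple


class Sla(NamedTuple):
    min_fps: float
    latency_ms: float  # p95 budget
    iou: float


# Starting points from the table above; the dict keys are hypothetical names
SLA_PRESETS = {
    "desktop-yolov8n": Sla(25, 40, 0.5),
    "jetson-nano": Sla(15, 70, 0.5),
    "jetson-orin-nano": Sla(30, 25, 0.6),
    "cloud-gpu": Sla(60, 16, 0.6),
}


def check(preset: str, fps: float, p95_ms: float) -> list[str]:
    """Return SLA violations for one run; an empty list means it passes."""
    sla = SLA_PRESETS[preset]
    failures = []
    if fps < sla.min_fps:
        failures.append(f"fps {fps:.1f} < {sla.min_fps}")
    if p95_ms > sla.latency_ms:
        failures.append(f"p95 {p95_ms:.1f}ms > {sla.latency_ms}ms")
    return failures


# The CPU run above, checked against the desktop preset
print(check("desktop-yolov8n", fps=23.0, p95_ms=27.9))  # ['fps 23.0 < 25']
```

Re-checking the same numbers against `jetson-orin-nano` fails on both throughput and latency, which is the "re-tune the thresholds per target" idea in miniature.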
`test_empty_frame_no_false_positives`.

Eating my own dog food turned up sharp edges I'm filing down:

- The readiness probe was "can I `connect()` to the RTSP port?", but I had an Android emulator (qemu) squatting on the default port 8554, so the probe passed while the real stream request failed (`DESCRIBE` 404). v1.2 moves readiness to an actual RTSP `DESCRIBE` success, and the docs now recommend a dedicated port.
- `localhost` is two addresses. If one side ends up on `::` and the consumer resolves `127.0.0.1`, they miss each other. Pinning both sides to `127.0.0.1` removes the ambiguity; that's going into the defaults.
- `generate_html_report` builds per-test cards from the runner's `get_all_test_details()`; the edge runner doesn't override it yet, so an edge run shows summary tiles but no case list. Quick fix, already queued.
- `QA_JETSON_HOST` / `QA_INFERENCE_ENDPOINT` will let the same test point at a real device or a GPU service and just re-tune the thresholds.
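A `DESCRIBE`-based readiness check is small enough to sketch with the standard library. This is my illustration of the idea, not the v1.2 code: it assumes the server answers a bare `DESCRIBE` carrying only `CSeq` and `Accept` headers, and it treats anything but `200` (a refused connection, a timeout, a non-RTSP reply, a 404) as "not ready".

```python
import socket


def describe_request(url: str, cseq: int = 1) -> bytes:
    """Minimal RTSP/1.0 DESCRIBE request (RFC 2326)."""
    return (
        f"DESCRIBE {url} RTSP/1.0\r\n"
        f"CSeq: {cseq}\r\n"
        "Accept: application/sdp\r\n"
        "\r\n"
    ).encode("ascii")


def status_code(response: bytes) -> int:
    """Parse the status code from an 'RTSP/1.0 200 OK' style status line."""
    line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("RTSP/"):
        raise ValueError(f"not an RTSP response: {line!r}")
    return int(parts[1])


def rtsp_ready(host: str, port: int, path: str, timeout: float = 2.0) -> bool:
    """Ready only if DESCRIBE on the exact path returns 200, not just if TCP connects."""
    url = f"rtsp://{host}:{port}/{path.lstrip('/')}"
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(describe_request(url))
            head = sock.recv(4096)  # status line plus headers
        return status_code(head) == 200
    except (OSError, ValueError):  # refused, timed out, or not speaking RTSP
        return False


# Use 127.0.0.1 rather than "localhost" so both sides agree on the address family:
# rtsp_ready("127.0.0.1", 8600, "cam")
```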
```bash
pip install "mk-qa-master[edge]"  # opencv + ultralytics + torch
```
You don't need edge hardware to start testing edge models: a webcam, ffmpeg, mediamtx, and a 6MB YOLOv8n checkpoint get you a real, asserting, repeatable test in an afternoon. When the Jetson arrives, you point the same test at a different endpoint.
If you build CV/edge pipelines, I'd genuinely like to hear how you handle the latency-vs-correctness trade-off in CI β that tension is the whole game.
Jack Kao