Testing Edge AI from an MCP tool: I pointed mk-qa-master at my webcam and YOLO answered

wpnews.pro

By Jack Kao, author of mk-qa-master, an MCP-native QA toolkit.

Most "AI testing" stops at calling an API and asserting the response isn't empty. Edge AI — a model running on a live camera feed — doesn't fit that mold. You can't assert exact bounding-box coordinates (the output is fuzzy by design), and "correct but 200ms too late" is a production failure, not a pass.

When I added the edge runner to mk-qa-master in v1.1, I wanted it to make that hard problem feel like writing any other test. So this post is me dogfooding it: my laptop webcam → an RTSP stream → a YOLOv8 model → assertions on IoU, throughput, and p95 latency — the whole thing orchestrated through MCP tool calls, no Jetson required.

mk-qa-master is an MCP server: your AI client (Claude Code, Cursor, etc.) calls its tools to analyze, generate, and run tests. It already ships runners for pytest-playwright, jest, cypress, go, maestro, schemathesis… and, as of v1.1, edge.

Flip one env var:

// the mk-qa-master MCP server config
"env": { "QA_RUNNER": "edge" }

…and the same generate_test / run_tests tools now speak RTSP + YOLO instead of browsers. Same muscle memory, completely different domain.

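A useful Edge AI test answers several questions at once: are the detections correct, is throughput high enough, and is per-frame latency within budget?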

Correctness is the subtle one. Two runs of the same model on the same frame can disagree by a few pixels, so detection correctness uses IoU (Intersection over Union) against an expected box with a threshold (default 0.5). You assert "the box overlaps enough," never "the box is identical." mk-qa-master ships this as edge.metrics.match_detection.
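If IoU is new to you: it is the area where two boxes overlap divided by the area they cover together, so 1.0 means identical and 0.0 means disjoint. Here is a minimal sketch, assuming boxes are (x1, y1, x2, y2) pixel corners and detections are (label, box) pairs. `iou` and `matches_expected` are hypothetical helpers for illustration, not the code behind edge.metrics.match_detection.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Overlap rectangle; width/height clamp to 0 when the boxes don't touch.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_expected(detections: Iterable[Tuple[str, Box]],
                     label: str, expected: Box, threshold: float = 0.5) -> bool:
    """True if any detection with the right label overlaps the expected box enough."""
    return any(det_label == label and iou(box, expected) >= threshold
               for det_label, box in detections)


# A box that is off by a few pixels still passes at the default 0.5 threshold.
expected = (100, 100, 200, 300)
dets = [("person", (104, 98, 206, 305)), ("suitcase", (400, 400, 500, 500))]
print(round(iou(dets[0][1], expected), 3), matches_expected(dets, "person", expected))
# → 0.876 True
```

The threshold is what absorbs the pixel jitter: a slightly shifted box scores well above 0.5, while a box on the wrong object scores near zero.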

The testing layer wants a stream, not a device handle — so you bridge the camera:

MacBook camera
   │  ffmpeg (avfoundation capture)
   ▼
local clip  (5s — deterministic, replayable, no privacy surprises in CI)
   │  ffmpeg -re -stream_loop -1   (loop forever)
   ▼
mediamtx  →  rtsp://127.0.0.1:8600/cam
   │  OpenCV VideoCapture
   ▼
YOLOv8n (LocalYolo)  →  IoU / fps / latency assertions  →  pytest report

Recording a clip once and looping it over RTSP is what makes this deterministic: the same frames every run, no "why did it fail at 6pm" lighting drama. The edge runner actually does this loop for you when QA_RTSP_SOURCE points at a file; I'll show the manual version because it's clearer:

ffmpeg -f avfoundation -framerate 30 -video_size 1280x720 \
       -i "0" -t 5 -pix_fmt yuv420p edge_demo.mp4

ffmpeg -re -stream_loop -1 -i edge_demo.mp4 \
       -c:v libx264 -preset ultrafast -tune zerolatency \
       -f rtsp rtsp://127.0.0.1:8600/cam
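If you'd rather script the second command than keep a terminal open, here is a minimal Python sketch of the same loop-to-RTSP step. `loop_to_rtsp_cmd` and `start_loop` are hypothetical helpers, not how the edge runner handles QA_RTSP_SOURCE internally. It assumes ffmpeg is on PATH and mediamtx is already listening on the target port (mediamtx defaults to 8554; using 8600 presumably means changing rtspAddress in mediamtx.yml).

```python
import shlex
import subprocess
from typing import List


def loop_to_rtsp_cmd(clip: str, url: str) -> List[str]:
    """Build the ffmpeg argv that replays `clip` forever to an RTSP URL."""
    return [
        "ffmpeg",
        "-re",                   # read input at its native frame rate
        "-stream_loop", "-1",    # loop the input forever
        "-i", clip,
        "-c:v", "libx264", "-preset", "ultrafast", "-tune", "zerolatency",
        "-f", "rtsp", url,       # publish to the RTSP server (mediamtx here)
    ]


def start_loop(clip: str, url: str) -> subprocess.Popen:
    """Start the publisher in the background; call .terminate() when done."""
    return subprocess.Popen(loop_to_rtsp_cmd(clip, url),
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


# Prints the same command as the shell version above, on one line.
print(shlex.join(loop_to_rtsp_cmd("edge_demo.mp4", "rtsp://127.0.0.1:8600/cam")))
```

Passing an argv list (rather than a shell string) keeps the URL and file name from being reinterpreted by a shell, which matters once paths come from environment variables.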

This is the part I'm proud of — the whole session is just tools.

1. Confirm the runner. get_runner_info →

{ "current": "edge", "available": ["cypress","edge","go","jest","maestro","newman","pytest-playwright","schemathesis"] }

2. Probe the stream. analyze_stream(rtsp_url="rtsp://127.0.0.1:8600/cam") →

{
  "url": "rtsp://127.0.0.1:8600/cam",
  "width": 1280, "height": 720, "fps": 30.0,
  "labels": [],
  "candidate_tcs": [
    "overall throughput should be >= the configured min_fps",
    "single-frame p95 latency should be <= the latency SLA",
    "stream reconnects after mid-test interruption without crashing",
    "empty / no-target frames do not generate false-positive detections"
  ]
}

Note that it hands back candidate test cases specific to the edge domain, the same way analyze_url hands back form/CTA test cases for web.

3. Generate the test. generate_test(...) writes a pytest skeleton wired to the edge fixtures (backend, stream, latency). The body stays boring on purpose:

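import time

# MIN_FPS, MAX_FRAME, LABEL, IOU, SLA, annotations and match_detection
# come from the generated module's header, which is not shown here.

def test_throughput(stream, backend, latency):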
    """Sustained-rate check: at least MIN_FPS over a 150-frame window."""
    n, t0 = 0, time.time()
    while n < 150:
        ok, frame = stream.read()
        if not ok:
            break
        latency.add(backend.infer(frame).latency_ms)
        n += 1
    fps = n / max(time.time() - t0, 1e-6)
    assert fps >= MIN_FPS, f"throughput below target: {fps:.1f} < {MIN_FPS} fps"

def test_detect_target(stream, backend, latency):
    """Target label within IoU threshold; p95 latency holds."""
    hit = False
    for idx in range(MAX_FRAME):
        ok, frame = stream.read()
        if not ok:
            break
        res = backend.infer(frame)
        latency.add(res.latency_ms)
        for exp in annotations.get(str(idx), []):
            if exp["label"] == LABEL and match_detection(res.detections, exp, IOU):
                hit = True
    assert hit, f"{LABEL!r} not detected within IoU={IOU}"
    assert latency.p95() <= SLA, f"p95 {latency.p95():.1f}ms > {SLA}ms"

4. Run it. run_tests → the runner starts the RTSP source, loads YOLOv8n once (the fixture is session-scoped, because reloading the model per test would wreck wall-clock time), runs pytest, and snapshots the report.
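The latency fixture in the test body only needs add() and p95(). Its real implementation isn't shown in this post, so below is a minimal stand-in, assuming nearest-rank percentiles (interpolating definitions can differ slightly on small samples). `LatencyTracker` is a hypothetical name.

```python
import math
import statistics
from typing import List


class LatencyTracker:
    """Collects per-frame latencies (ms) and reports summary statistics."""

    def __init__(self) -> None:
        self._samples: List[float] = []

    def add(self, ms: float) -> None:
        self._samples.append(ms)

    def percentile(self, q: float) -> float:
        """Nearest-rank percentile for q in (0, 100]."""
        if not self._samples:
            raise ValueError("no latency samples recorded")
        ordered = sorted(self._samples)
        rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
        return ordered[max(rank, 1) - 1]

    def p50(self) -> float:
        return self.percentile(50)

    def p95(self) -> float:
        return self.percentile(95)

    def mean(self) -> float:
        return statistics.fmean(self._samples)


tracker = LatencyTracker()
for ms in range(1, 101):   # 100 fake samples: 1, 2, ..., 100 ms
    tracker.add(float(ms))
print(tracker.p50(), tracker.p95(), tracker.mean())  # → 50.0 95.0 50.5
```

Asserting on p95 instead of the mean is the point: a handful of slow frames barely moves the mean but shows up immediately in the tail.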

YOLOv8n on CPU (no GPU, just the MacBook), 150 frames of the looped 720p stream:

Metric	Result	Target (desktop yolov8n)
Throughput	23.0 fps	25 fps (close; CPU-bound)
p95 latency	27.9 ms	≤ 40 ms ✅
mean / p50 latency	26.0 / 25.6 ms	n/a
`person` detections	150 / 150 frames	n/a
Empty-frame false positives	0	0 ✅

The model nailed person on every frame (it was pointed at me). It also confidently reported a suitcase in all 150 frames; there wasn't one. That's the reality of a tiny model: great recall, noisy precision. It's exactly the kind of thing a real test surfaces and a "looks like it works" demo hides. p95 of 27.9ms beat the 40ms SLA comfortably; 23fps fell just short of 25 purely for lack of a GPU.
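One way to turn the suitcase observation into an assertion is to flag labels that aren't expected in the scene but show up in too many frames. A minimal sketch, assuming you already have the set of detected labels per frame; `phantom_labels` and the 10% default ratio are illustrative choices, not something mk-qa-master ships.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def phantom_labels(frames: Iterable[Set[str]], expected: Set[str],
                   max_ratio: float = 0.1) -> Dict[str, float]:
    """Return unexpected labels seen in more than `max_ratio` of frames.

    `frames` holds the set of labels detected in each frame; the result maps
    each offending label to the fraction of frames it appeared in.
    """
    counts: Counter = Counter()
    n = 0
    for labels in frames:
        n += 1
        counts.update(labels)  # a set, so each label counts once per frame
    if n == 0:
        return {}
    return {label: c / n for label, c in counts.items()
            if label not in expected and c / n > max_ratio}


# Mirrors the run above: a person and a phantom suitcase in all 150 frames.
run = [{"person", "suitcase"}] * 150
print(phantom_labels(run, expected={"person"}))  # → {'suitcase': 1.0}
```

A ratio rather than a hard zero leaves room for the occasional one-frame flicker while still failing on a label that is present the whole time.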

mk-qa-master ships SLA starting points so you don't guess:

Scenario	min_fps	latency SLA	IoU
Desktop yolov8n dev	25	40 ms	0.5
Jetson Nano	15	70 ms	0.5
Jetson Orin Nano	30	25 ms	0.6
Cloud GPU service	60	16 ms	0.6
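Applying a row of this table comes down to a lookup and two comparisons (the IoU column feeds the detection matcher instead). A minimal sketch that encodes the rows above; the dict keys and `check_sla` are hypothetical names, not mk-qa-master's configuration format.

```python
from typing import Dict, NamedTuple


class Sla(NamedTuple):
    min_fps: float
    latency_ms: float  # p95 budget
    iou: float


# Values copied from the starting-point table; keys are illustrative.
SLA_PRESETS: Dict[str, Sla] = {
    "desktop-yolov8n": Sla(25, 40, 0.5),
    "jetson-nano": Sla(15, 70, 0.5),
    "jetson-orin-nano": Sla(30, 25, 0.6),
    "cloud-gpu": Sla(60, 16, 0.6),
}


def check_sla(fps: float, p95_ms: float, preset: str) -> Dict[str, bool]:
    """Pass/fail for the throughput and latency targets of one preset."""
    sla = SLA_PRESETS[preset]
    return {"fps": fps >= sla.min_fps, "p95": p95_ms <= sla.latency_ms}


# The CPU run above (23.0 fps, p95 27.9 ms) against two presets.
print(check_sla(23.0, 27.9, "desktop-yolov8n"))  # → {'fps': False, 'p95': True}
print(check_sla(23.0, 27.9, "jetson-nano"))      # → {'fps': True, 'p95': True}
```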

The empty-frame false-positive check runs as test_empty_frame_no_false_positives.

Eating my own dog food turned up sharp edges I'm filing down:

- The readiness probe only asked "can I connect() to the RTSP port?", but I had an Android emulator (qemu) squatting on the default port 8554, so the probe passed while the stream itself was unreachable (DESCRIBE 404). v1.2 moves readiness to an actual RTSP DESCRIBE success, and the docs now recommend a dedicated port.
- localhost is two addresses: 127.0.0.1 and the IPv6 loopback ::1. If the publisher and the consumer resolve it differently, they miss each other. Pinning both sides to 127.0.0.1 removes the ambiguity, and that's going into the defaults.
- generate_html_report builds per-test cards from the runner's get_all_test_details(); the edge runner doesn't override it yet, so an edge run shows summary tiles but no case list. Quick fix, already queued.
- Next up: QA_JETSON_HOST / QA_INFERENCE_ENDPOINT will let the same test point at a real device or a GPU service and just re-tune the thresholds.
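The gap between a port check and a DESCRIBE check is easy to see in code. A bare connect() succeeds against any listener, qemu included; a DESCRIBE only returns 200 when an RTSP server actually serves that path. A minimal stdlib sketch, assuming an unauthenticated server; `rtsp_ready` and its helpers are hypothetical, not the v1.2 implementation.

```python
import socket
from urllib.parse import urlsplit


def describe_request(url: str, cseq: int = 1) -> bytes:
    """Minimal RTSP/1.0 DESCRIBE request (RFC 2326) for `url`."""
    return (f"DESCRIBE {url} RTSP/1.0\r\n"
            f"CSeq: {cseq}\r\n"
            "Accept: application/sdp\r\n"
            "\r\n").encode("ascii")


def status_code(reply: bytes) -> int:
    """Parse the status code from 'RTSP/1.0 200 OK'; 0 if malformed."""
    first_line = reply.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split()
    if len(parts) < 2 or not parts[0].startswith("RTSP/") or not parts[1].isdigit():
        return 0
    return int(parts[1])


def rtsp_ready(url: str, timeout: float = 2.0) -> bool:
    """True only if the server answers DESCRIBE for this path with 200."""
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 554  # 554 is the RTSP default
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(describe_request(url))
            reply = b""
            while b"\r\n" not in reply:   # the status line is all we need
                chunk = sock.recv(4096)
                if not chunk:
                    break
                reply += chunk
    except OSError:                       # refused, unreachable or timed out
        return False
    return status_code(reply) == 200
```

A non-RTSP listener fails the status-line parse, and a real RTSP server without the path answers 404, so both of the situations described above read as "not ready". Calling it with rtsp://127.0.0.1:8600/cam rather than localhost also keeps the probe on the same address as the consumer.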

pip install "mk-qa-master[edge]"   # opencv + ultralytics + torch

You don't need edge hardware to start testing edge models — a webcam, ffmpeg, mediamtx, and a 6MB YOLOv8n checkpoint get you a real, asserting, repeatable test in an afternoon. When the Jetson arrives, you point the same test at a different endpoint.

If you build CV/edge pipelines, I'd genuinely like to hear how you handle the latency-vs-correctness trade-off in CI — that tension is the whole game.

— Jack Kao

Originally published on dev.to.
