# Testing Edge AI from an MCP tool: I pointed mk-qa-master at my webcam and YOLO answered

> Source: <https://dev.to/kao273183/testing-edge-ai-from-an-mcp-tool-i-pointed-mk-qa-master-at-my-webcam-and-yolo-answered-4gj3>
> Published: 2026-06-04 01:02:13+00:00

By Jack Kao, author of [mk-qa-master], an MCP-native QA toolkit.

Most "AI testing" stops at calling an API and asserting the response isn't empty. Edge AI — a model running on a live camera feed — doesn't fit that mold. You can't assert exact bounding-box coordinates (the output is fuzzy by design), and "correct but 200ms too late" is a production failure, not a pass.

When I added the **edge runner** to mk-qa-master in v1.1, I wanted it to make that hard problem feel like writing any other test. So this post is me dogfooding it: my laptop webcam → an RTSP stream → a YOLOv8 model → assertions on IoU, throughput, and p95 latency — the whole thing orchestrated through MCP tool calls, no Jetson required.

mk-qa-master is an MCP server: your AI client (Claude Code, Cursor, etc.) calls its tools to analyze, generate, and run tests. It already ships runners for `pytest-playwright`, `jest`, `cypress`, `go`, `maestro`, `schemathesis`… and as of v1.1, `edge`.

Flip one env var:

```
// the mk-qa-master MCP server config
"env": { "QA_RUNNER": "edge" }
```

…and the same `generate_test` / `run_tests` tools now speak **RTSP + YOLO** instead of browsers. Same muscle memory, completely different domain.

A useful Edge AI test answers all of these at once:

Correctness is the subtle one. Two runs of the same model on the same frame can disagree by a few pixels, so detection correctness uses **IoU (Intersection over Union)** against an expected box with a threshold (default 0.5). You assert "the box overlaps enough," never "the box is identical." mk-qa-master ships this as `edge.metrics.match_detection`.
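To make "overlaps enough" concrete, here is a minimal sketch of an IoU check. It is not the source of `edge.metrics.match_detection`; the `(x1, y1, x2, y2)` box format, the dict keys, and the helper names `iou` and `matches` are assumptions for illustration.

```python
from __future__ import annotations

from typing import Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(detections: list[dict], expected: dict, threshold: float = 0.5) -> bool:
    """True if any detection has the expected label and overlaps enough."""
    return any(
        d["label"] == expected["label"] and iou(d["box"], expected["box"]) >= threshold
        for d in detections
    )


expected = {"label": "person", "box": (100, 100, 300, 400)}
shifted = [{"label": "person", "box": (105, 98, 304, 402)}]   # a few pixels off
far_off = [{"label": "person", "box": (400, 100, 600, 400)}]  # no overlap at all
print(matches(shifted, expected), matches(far_off, expected))  # True False
```

The shifted box still scores an IoU of roughly 0.94, so a few pixels of jitter between runs never flips the assertion, while a box in the wrong place fails outright.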

The testing layer wants a *stream*, not a device handle — so you bridge the camera:

```
MacBook camera
   │  ffmpeg (avfoundation capture)
   ▼
local clip  (5s — deterministic, replayable, no privacy surprises in CI)
   │  ffmpeg -re -stream_loop -1   (loop forever)
   ▼
mediamtx  →  rtsp://127.0.0.1:8600/cam
   │  OpenCV VideoCapture
   ▼
YOLOv8n (LocalYolo)  →  IoU / fps / latency assertions  →  pytest report
```

Recording a clip once and looping it over RTSP is what makes this **deterministic**: the same frames every run, no "why did it fail at 6pm" lighting drama. The edge runner actually does this loop for you when `QA_RTSP_SOURCE` points at a file; I'll show the manual version because it's clearer:

```
# capture once
ffmpeg -f avfoundation -framerate 30 -video_size 1280x720 \
       -i "0" -t 5 -pix_fmt yuv420p edge_demo.mp4

# serve on loop via mediamtx (mediamtx.yml: paths: { all_others: })
ffmpeg -re -stream_loop -1 -i edge_demo.mp4 \
       -c:v libx264 -preset ultrafast -tune zerolatency \
       -f rtsp rtsp://127.0.0.1:8600/cam
```

This is the part I'm proud of — the whole session is just tools.

**1. Confirm the runner.** `get_runner_info` →

```
{ "current": "edge", "available": ["cypress","edge","go","jest","maestro","newman","pytest-playwright","schemathesis"] }
```

**2. Probe the stream.** `analyze_stream(rtsp_url="rtsp://127.0.0.1:8600/cam")` →

```
{
  "url": "rtsp://127.0.0.1:8600/cam",
  "width": 1280, "height": 720, "fps": 30.0,
  "labels": [],
  "candidate_tcs": [
    "overall throughput should be >= the configured min_fps",
    "single-frame p95 latency should be <= the latency SLA",
    "stream reconnects after mid-test interruption without crashing",
    "empty / no-target frames do not generate false-positive detections"
  ]
}
```

Note it hands back **candidate test cases** specific to the edge domain, the same way `analyze_url` hands back form/CTA test cases for web.

**3. Generate the test.** `generate_test(...)` writes a pytest skeleton wired to the edge fixtures (`backend`, `stream`, `latency`). The body stays boring on purpose:

```python
import time

# MIN_FPS, MAX_FRAME, LABEL, IOU, SLA, annotations and match_detection are
# expected to come from the rest of the generated skeleton (not shown here).


def test_throughput(stream, backend, latency):
    """Sustained-rate check: at least MIN_FPS over a 150-frame window."""
    n, t0 = 0, time.time()
    while n < 150:
        ok, frame = stream.read()
        if not ok:  # stream ended or dropped; stop counting
            break
        latency.add(backend.infer(frame).latency_ms)
        n += 1
    # Guard against a zero elapsed time if no frames were read.
    fps = n / max(time.time() - t0, 1e-6)
    assert fps >= MIN_FPS, f"throughput below target: {fps:.1f} < {MIN_FPS} fps"


def test_detect_target(stream, backend, latency):
    """Target label within IoU threshold; p95 latency holds."""
    hit = False
    for idx in range(MAX_FRAME):
        ok, frame = stream.read()
        if not ok:
            break
        res = backend.infer(frame)
        latency.add(res.latency_ms)
        # annotations maps a frame index (as a string) to its expected boxes.
        for exp in annotations.get(str(idx), []):
            if exp["label"] == LABEL and match_detection(res.detections, exp, IOU):
                hit = True
    assert hit, f"{LABEL!r} not detected within IoU={IOU}"
    assert latency.p95() <= SLA, f"p95 {latency.p95():.1f}ms > {SLA}ms"
```
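The `latency` fixture in these tests only needs `add()` and `p95()`. Here is a minimal sketch of such a recorder, assuming a nearest-rank percentile; mk-qa-master's own implementation may interpolate differently, and the class name `LatencyRecorder` is hypothetical.

```python
from __future__ import annotations

import math


class LatencyRecorder:
    """Collects per-frame latencies (ms) and reports percentiles."""

    def __init__(self) -> None:
        self.samples: list[float] = []

    def add(self, latency_ms: float) -> None:
        self.samples.append(float(latency_ms))

    def percentile(self, q: float) -> float:
        """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
        if not self.samples:
            raise ValueError("no latency samples recorded")
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(q * len(ordered) / 100))
        return ordered[rank - 1]

    def p95(self) -> float:
        return self.percentile(95)


rec = LatencyRecorder()
for ms in [25.0] * 19 + [90.0]:  # one slow frame out of twenty
    rec.add(ms)
print(rec.p95())  # 25.0
```

With one slow frame in twenty, p95 stays at 25.0 ms. Once more than 5% of frames are slow, it jumps to the slow value, and that tail behavior is what a latency SLA is meant to catch.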

**4. Run it.** `run_tests` → the runner starts the RTSP source, loads YOLOv8n once (session-scoped, since reloading per test would wreck wall-clock time), runs pytest, and snapshots the report.

YOLOv8n on **CPU** (no GPU, just the MacBook), 150 frames of the looped 720p stream:

| Metric | Result | Target (desktop yolov8n) |
|---|---|---|
| Throughput | 23.0 fps | 25 (close, CPU-bound) |
| p95 latency | 27.9 ms | ≤ 40 ms ✅ |
| mean / p50 latency | 26.0 / 25.6 ms | n/a |
| `person` detections | 150 / 150 frames | n/a |
| Empty-frame false positives | 0 | 0 ✅ |

The model nailed `person` on every frame (it was pointed at me). It *also* confidently reported a `suitcase` in all 150 frames; there wasn't one. That's the reality of a tiny model: great recall, noisy precision. Exactly the thing a real test surfaces and a "looks like it works" demo hides. p95 of 27.9ms beat the 40ms SLA comfortably; 23fps fell just short of 25 purely for lack of a GPU.
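One way to turn a phantom label like that `suitcase` into a failing check rather than an anecdote is to count how many frames reported each label and compare against an allow-list. This is a sketch: it assumes each frame's detections are a list of dicts with a `label` key, and `frames_per_label` and `unexpected_labels` are illustrative helpers, not mk-qa-master API.

```python
from collections import Counter


def frames_per_label(frames):
    """Count how many frames contain each label at least once."""
    counts = Counter()
    for detections in frames:
        # A set, so a label detected twice in one frame still counts once.
        counts.update({d["label"] for d in detections})
    return counts


def unexpected_labels(frames, allowed, max_frames=0):
    """Labels outside `allowed` that appear in more than `max_frames` frames."""
    counts = frames_per_label(frames)
    return {lbl: n for lbl, n in counts.items() if lbl not in allowed and n > max_frames}


# Synthetic stand-in for the run described above: both labels in all 150 frames.
frames = [[{"label": "person"}, {"label": "suitcase"}] for _ in range(150)]
print(unexpected_labels(frames, allowed={"person"}))  # {'suitcase': 150}
```

Setting `max_frames` above zero tolerates an occasional flicker while still failing on a label that shows up persistently.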

mk-qa-master ships SLA starting points so you don't guess:

| Scenario | min_fps | latency SLA | IoU |
|---|---|---|---|
| Desktop yolov8n dev | 25 | 40 ms | 0.5 |
| Jetson Nano | 15 | 70 ms | 0.5 |
| Jetson Orin Nano | 30 | 25 ms | 0.6 |
| Cloud GPU service | 60 | 16 ms | 0.6 |
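These starting points are easy to carry around as data. The sketch below copies the values from the table; the `SlaPreset` dataclass and `check_run` helper are hypothetical names, not part of mk-qa-master.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaPreset:
    min_fps: float
    latency_sla_ms: float
    iou: float


# Values copied from the table above.
PRESETS = {
    "Desktop yolov8n dev": SlaPreset(25, 40, 0.5),
    "Jetson Nano": SlaPreset(15, 70, 0.5),
    "Jetson Orin Nano": SlaPreset(30, 25, 0.6),
    "Cloud GPU service": SlaPreset(60, 16, 0.6),
}


def check_run(scenario, fps, p95_ms):
    """Return human-readable SLA violations; an empty list means the run passes."""
    preset = PRESETS[scenario]
    problems = []
    if fps < preset.min_fps:
        problems.append(f"throughput {fps:.1f} fps < {preset.min_fps} fps")
    if p95_ms > preset.latency_sla_ms:
        problems.append(f"p95 {p95_ms:.1f} ms > {preset.latency_sla_ms} ms")
    return problems


# The CPU run reported earlier: 23.0 fps, p95 27.9 ms.
print(check_run("Desktop yolov8n dev", fps=23.0, p95_ms=27.9))
```

Against the desktop preset that run fails only on throughput. The same numbers would pass the Jetson Nano preset, which is why thresholds belong to the scenario rather than to the test body.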

`test_empty_frame_no_false_positives`.

Eating my own dog food turned up sharp edges I'm filing down:

- The readiness probe was just "can I `connect()` to the RTSP port?", but I had an Android emulator (qemu) squatting on the default port 8554, so the probe passed (followed by a `DESCRIBE 404`). v1.2 moves readiness to an actual RTSP `DESCRIBE` success, and the docs now recommend a dedicated port.
- `localhost` is two addresses. If the server ends up on `::` and the consumer resolves `127.0.0.1`, they miss each other. Pinning both sides to `127.0.0.1` removes the ambiguity; that is going into the defaults.
- `generate_html_report` builds per-test cards from the runner's `get_all_test_details()`; the edge runner doesn't override it yet, so an edge run shows summary tiles but no case list. Quick fix, already queued.
- `QA_JETSON_HOST` / `QA_INFERENCE_ENDPOINT` will let the same test point at a real device or a GPU service and just re-tune the thresholds.
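The gap between a TCP `connect()` check and an RTSP `DESCRIBE` check can be sketched with the standard library alone. This is an illustration rather than mk-qa-master's probe: `tcp_ready`, `rtsp_status`, and `rtsp_describe_ready` are hypothetical helpers, and the request is the bare minimum (no authentication, no redirect handling).

```python
import socket


def tcp_ready(host: str, port: int, timeout: float = 2.0) -> bool:
    """Weak check: something accepts TCP connections on this port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def rtsp_status(response: bytes) -> int:
    """Parse the status code from an RTSP status line, or 0 if it is not RTSP."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split(" ", 2)
    if len(parts) >= 2 and parts[0].startswith("RTSP/") and parts[1].isdigit():
        return int(parts[1])
    return 0


def rtsp_describe_ready(host: str, port: int, path: str, timeout: float = 2.0) -> bool:
    """Stronger check: the server answers DESCRIBE for this path with 200."""
    request = (
        f"DESCRIBE rtsp://{host}:{port}/{path} RTSP/1.0\r\n"
        "CSeq: 1\r\n"
        "Accept: application/sdp\r\n"
        "\r\n"
    ).encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            return rtsp_status(sock.recv(4096)) == 200
    except OSError:  # refused, reset, or timed out
        return False
```

For the setup in this post the call would be `rtsp_describe_ready("127.0.0.1", 8600, "cam")`. Any process listening on the port satisfies `tcp_ready`, but only a server that actually serves the path satisfies the `DESCRIBE` check.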

```
pip install "mk-qa-master[edge]"   # opencv + ultralytics + torch
# add the MCP server to your client, set QA_RUNNER=edge,
# point QA_RTSP_SOURCE at a clip or an rtsp:// url, and call run_tests
```

You don't need edge hardware to start testing edge models — a webcam, ffmpeg, mediamtx, and a 6MB YOLOv8n checkpoint get you a real, asserting, repeatable test in an afternoon. When the Jetson arrives, you point the same test at a different endpoint.

If you build CV/edge pipelines, I'd genuinely like to hear how you handle the latency-vs-correctness trade-off in CI — that tension is the whole game.

*— Jack Kao*
