Cover: a real qa-probe run against our own stack, cropped to the summary - internal product detail withheld.
We are building LightShield, a SIEM that is in active demo right now. We built
most of it pair-programming with an AI coding assistant wired in over MCP - it
ran our stack, read the errors, and patched its own code. For a small team that
is a superpower. Until an endpoint fails.
Here is the loop we kept hitting. A route returns a 500, or a 404, or an empty
[]. The assistant looks at the status code and announces the cause with total
confidence. Then it rewrites a handler that was never broken - because a status
code is not a cause, and it had nothing else to go on. So it guessed, and it
guessed wrong, and the diff made things worse.
The thing is, that empty [] had at least six possible causes:
Same symptom, six different fixes. We could bisect to the real one. The AI could
not - it had no ground truth, so it manufactured one.
So we built qa-probe. It analyzes the app, probes the live endpoints, and
classifies each failure with a root cause and a fix hint. It runs in three
decoupled, cached phases:
qa-probe analyze # parse source + OpenAPI -> route graph
qa-probe probe # hit live endpoints (HTTP/SSE/WS), record evidence
qa-probe report # classify root cause -> HTML / Markdown / JSON / AI-context
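
To make "decoupled, cached" concrete, here is a minimal TypeScript sketch of the idea. It is not qa-probe's actual code: the in-memory Map, the `cached` helper, and the `Route` / `ProbeRecord` shapes are all assumptions made up for illustration, and the probe phase fakes a 200 instead of making a real request. The point is only that each phase reads the previous phase's stored output, so re-running the report does not re-parse or re-probe anything.

```typescript
// Hypothetical sketch: an in-memory Map stands in for whatever qa-probe persists.
const cache = new Map<string, unknown>();
let runs = 0; // counts uncached phase executions, for demonstration only

// Return the cached value for `key`, computing and storing it on first use.
function cached<T>(key: string, compute: () => T): T {
  if (cache.has(key)) return cache.get(key) as T;
  runs += 1;
  const value = compute();
  cache.set(key, value);
  return value;
}

interface Route { method: string; path: string }
interface ProbeRecord { route: Route; status: number }

// Phase 1: stand-in for static analysis; here it returns a fixed route list.
const analyze = (): Route[] =>
  cached("analyze", () => [{ method: "GET", path: "/items" }]);

// Phase 2: stand-in for live probing; a real run would issue HTTP requests.
const probe = (): ProbeRecord[] =>
  cached("probe", () => analyze().map((route) => ({ route, status: 200 })));

// Phase 3: reporting only reads cached probe records.
const report = (): string =>
  probe()
    .map((r) => `${r.route.method} ${r.route.path} -> ${r.status}`)
    .join("\n");

console.log(report()); // first call runs analyze and probe once each
console.log(report()); // second call is served entirely from the cache
```

In this toy version, calling `report()` a second time leaves `runs` at 2, which is the property that matters: you can regenerate a report without touching the live stack again.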
It has adapters for FastAPI, Express, Next.js, tRPC, GraphQL, and a generic
fallback, so it discovers your routes instead of you hand-listing them.
Each result carries the evidence (the real request, a bounded response sample,
the timing), a root cause from ~25 categories, and a calibrated confidence -
high, medium, or none. When it cannot tell, it returns none instead of
bluffing. No neural network, no black box - transparent rules plus per-endpoint
stat memory, so you can always read why it landed on a verdict. An AI
consuming this needs to verify the claim, not trust a vibe.
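
Here is roughly what "transparent rules that return none instead of bluffing" can look like, as a small TypeScript sketch. Treat every name in it as an assumption: the `Evidence` and `Verdict` shapes, the two example rules, and the category names `auth_rejected` and `dependency_unreachable` are invented for illustration and are not qa-probe's real schema or rule set. Per-endpoint stat memory is left out to keep it short.

```typescript
// Hypothetical shapes: illustrative only, not qa-probe's real schema.
type Confidence = "high" | "medium" | "none";

interface Evidence {
  method: string;
  url: string;
  status: number;
  bodySample: string; // bounded sample of the response body
  durationMs: number;
}

interface Verdict {
  rootCause: string | null;
  confidence: Confidence;
  reason: string; // which rule fired, in plain words
}

// A rule either commits to a verdict or declines by returning null.
type Rule = (e: Evidence) => Verdict | null;

// Two made-up rules; a real rule set would be larger and more careful.
const rules: Rule[] = [
  (e) =>
    e.status === 401 || e.status === 403
      ? { rootCause: "auth_rejected", confidence: "high", reason: `status ${e.status}` }
      : null,
  (e) =>
    e.status >= 500 && e.bodySample.includes("ECONNREFUSED")
      ? {
          rootCause: "dependency_unreachable",
          confidence: "medium",
          reason: "5xx body mentions ECONNREFUSED",
        }
      : null,
];

// First matching rule wins; if nothing matches, say so instead of guessing.
function classify(e: Evidence): Verdict {
  for (const rule of rules) {
    const verdict = rule(e);
    if (verdict !== null) return verdict;
  }
  return { rootCause: null, confidence: "none", reason: "no rule matched" };
}
```

Because every verdict carries a `reason`, whoever reads it, human or assistant, can check the claim against the evidence. An empty `[]` with a 200 matches neither rule here, so this sketch answers none rather than inventing a cause.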
qa-probe mcp # exposes 8 tools to Claude, Cursor, any MCP client
The assistant stopped reasoning from a status code and started reasoning from
evidence: "empty database, high confidence, here is the response that proves
it." It seeded the DB instead of rewriting the handler. It fixed the right
layer. The guessing basically stopped.
It helped us debug faster. It helped the AI more - because an AI is only as good
as the evidence you hand it, and "the endpoint is failing" is not evidence.
It is early and it is open. The fastest way to help:
One housekeeping note: contributions are sign-off based (DCO) - commit with
git commit -s so the project's licensing stays clean. That is the only hoop.
We built it for ourselves. It worked well enough that we cleaned it up and
released it under Apache-2.0.
npm i -g qa-probe
Built by LS-SIEM LLP. If you run it against your own API, I would genuinely like
to know what it found - that feedback is how the rules get sharper.