# How we stopped our AI assistant from hallucinating bug fixes

> Source: <https://dev.to/vibez06/how-we-stopped-our-ai-assistant-from-hallucinating-bug-fixes-53l5>
> Published: 2026-06-25 06:43:06+00:00

*Cover: a real qa-probe run against our own stack, cropped to the summary - internal product detail withheld.*

We are building LightShield, a SIEM that is in active demo right now. We built
most of it pair-programming with an AI coding assistant wired in over MCP - it
ran our stack, read the errors, and patched its own code. For a small team that
is a superpower. Until an endpoint failed.

Here is the loop we kept hitting. A route returns a 500, or a 404, or an empty
`[]`. The assistant looks at the status code and announces the cause with total
confidence. Then it rewrites a handler that was never broken - because a status
code is not a cause, and it had nothing else to go on. So it guessed, and it
guessed wrong, and the diff made things worse.

The thing is, that empty `[]` had at least six possible causes:

Same symptom, six different fixes. We could bisect to the real one. The AI could
not - it had no ground truth, so it manufactured one.

So we built qa-probe. It analyzes the app, probes the live endpoints, and
classifies each failure with a root cause and a fix hint. Three decoupled,
cached phases:

```bash
qa-probe analyze   # parse source + OpenAPI -> route graph
qa-probe probe     # hit live endpoints (HTTP/SSE/WS), record evidence
qa-probe report    # classify root cause -> HTML / Markdown / JSON / AI-context
# or just: qa-probe run
```

It has adapters for FastAPI, Express, Next.js, tRPC, GraphQL, and a generic
fallback, so it discovers your routes instead of you hand-listing them.
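To make "decoupled, cached phases" concrete, here is a minimal sketch of the idea in TypeScript. It is not qa-probe's code: `cached`, `RouteGraph`, `ProbeRecord`, and the stub phases are hypothetical names, and the cache is a plain in-memory `Map` keyed on the phase name plus its serialized input.

```typescript
// Hypothetical sketch of decoupled, cached phases. Names and shapes are
// illustrative assumptions, not qa-probe's internals.
type Phase<I, O> = (input: I) => O;

// Wrap a phase so repeated calls with the same input reuse the stored output.
function cached<I, O>(
  name: string,
  phase: Phase<I, O>,
  store: Map<string, unknown>,
): Phase<I, O> {
  return (input: I): O => {
    const key = `${name}:${JSON.stringify(input)}`;
    if (store.has(key)) {
      return store.get(key) as O;
    }
    const output = phase(input);
    store.set(key, output);
    return output;
  };
}

interface RouteGraph { routes: string[] }
interface ProbeRecord { route: string; status: number }

const store = new Map<string, unknown>();
let analyzeRuns = 0;

// Stand-ins for the real work: parsing source, then hitting live endpoints.
const analyze = cached("analyze", (sourceFiles: string[]): RouteGraph => {
  analyzeRuns += 1;
  return { routes: sourceFiles.map((f) => `/${f.replace(/\.ts$/, "")}`) };
}, store);

const probe = cached("probe", (graph: RouteGraph): ProbeRecord[] =>
  graph.routes.map((route) => ({ route, status: 200 })), store);

// The last phase only formats what the earlier phases recorded.
const report = (records: ProbeRecord[]): string =>
  records.map((r) => `${r.route} -> ${r.status}`).join("\n");

const first = report(probe(analyze(["users.ts", "alerts.ts"])));
const second = report(probe(analyze(["users.ts", "alerts.ts"])));

console.log(first === second); // true
console.log(analyzeRuns);      // 1: the second run was served from the cache
```

Each phase only sees the previous phase's output, so you can re-run the report without re-probing, or re-probe without re-parsing. A real probe cache would also need an expiry rule, since live endpoints change between runs; this sketch leaves that out.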

Each result carries the evidence (the real request, a bounded response sample,
the timing), a root cause from ~25 categories, and a calibrated confidence -
`high`, `medium`, or `none`. When it cannot tell, it returns `none` instead of
bluffing. No neural network, no black box - transparent rules plus per-endpoint
stat memory, so you can always read *why* it landed on a verdict. An AI
consuming this needs to verify the claim, not trust a vibe.
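As a rough illustration of "transparent rules that return `none` when nothing matches", the sketch below shows one way such a classifier could be shaped. The field names, the two rules, and the category names (`dependency_unreachable`, `empty_result_set`) are assumptions made up for this example; they are not qa-probe's actual schema or rule set, and the per-endpoint stat memory is left out.

```typescript
// Hypothetical shapes: qa-probe's real result schema is not shown in this post.
type Confidence = "high" | "medium" | "none";

interface Evidence {
  method: string;
  url: string;
  status: number;
  bodySample: string; // bounded sample of the response body
  durationMs: number;
}

interface Verdict {
  rootCause: string | null;
  confidence: Confidence;
  why: string; // human-readable reason, so the verdict can be audited
}

interface Rule {
  name: string;
  rootCause: string;
  confidence: Exclude<Confidence, "none">;
  matches: (e: Evidence) => boolean;
}

// Two made-up rules, checked in order. Real rules would be more numerous.
const rules: Rule[] = [
  {
    name: "5xx-body-mentions-connection-refused",
    rootCause: "dependency_unreachable",
    confidence: "high",
    matches: (e) => e.status >= 500 && /ECONNREFUSED/.test(e.bodySample),
  },
  {
    name: "200-with-empty-array",
    rootCause: "empty_result_set",
    confidence: "medium",
    matches: (e) => e.status === 200 && e.bodySample.trim() === "[]",
  },
];

function classify(e: Evidence): Verdict {
  for (const rule of rules) {
    if (rule.matches(e)) {
      return {
        rootCause: rule.rootCause,
        confidence: rule.confidence,
        why: `matched rule "${rule.name}"`,
      };
    }
  }
  // No rule fired: say so explicitly instead of guessing.
  return { rootCause: null, confidence: "none", why: "no rule matched the evidence" };
}

console.log(classify({
  method: "GET", url: "/alerts", status: 200, bodySample: "[]", durationMs: 12,
}));
```

The point of the design is the last `return`: a bare 404 with no other signal falls through every rule and comes back as `none`, which tells the consumer to go and get more evidence.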

```
qa-probe mcp   # exposes 8 tools to Claude, Cursor, any MCP client
```

The assistant stopped reasoning from a status code and started reasoning from
evidence: "empty database, high confidence, here is the response that proves
it." It seeded the DB instead of rewriting the handler. It fixed the right
layer. The guessing basically stopped.
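One way to picture the change in behaviour is as a policy keyed on the confidence level. The sketch below is a hypothetical consumer-side policy, not something qa-probe or any MCP client ships: `Finding`, `NextStep`, and `planNextStep` are invented names, and the three-way split is just one reasonable reading of `high`, `medium`, and `none`.

```typescript
// Hypothetical consumer-side policy; field names are assumptions, not qa-probe's schema.
type TrustLevel = "high" | "medium" | "none";

interface Finding {
  endpoint: string;
  rootCause: string | null;
  confidence: TrustLevel;
  fixHint: string | null;
}

type NextStep =
  | { action: "apply_fix"; endpoint: string; hint: string }
  | { action: "verify_first"; endpoint: string; suspectedCause: string }
  | { action: "collect_more_evidence"; endpoint: string };

function planNextStep(f: Finding): NextStep {
  // Act only when the finding is both confident and actionable.
  if (f.confidence === "high" && f.rootCause !== null && f.fixHint !== null) {
    return { action: "apply_fix", endpoint: f.endpoint, hint: f.fixHint };
  }
  // A medium-confidence cause is a lead to check, not a licence to edit code.
  if (f.confidence === "medium" && f.rootCause !== null) {
    return { action: "verify_first", endpoint: f.endpoint, suspectedCause: f.rootCause };
  }
  // Anything else: do not touch the handler, go and gather more evidence.
  return { action: "collect_more_evidence", endpoint: f.endpoint };
}

const step = planNextStep({
  endpoint: "GET /alerts",
  rootCause: "empty_database",
  confidence: "high",
  fixHint: "seed the database",
});
console.log(step.action); // "apply_fix"
```

The useful branch is the last one: with no confident cause, the default is to leave the code alone, which is the opposite of what the assistant did when all it had was a status code.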

It helped us debug faster. It helped the AI more - because an AI is only as good
as the evidence you hand it, and "the endpoint is failing" is not evidence.

It is early and it is open. The fastest way to help:

One housekeeping note: contributions are sign-off based (DCO) - commit with
`git commit -s` so the project's licensing stays clean. That is the only hoop.

We built it for ourselves. It worked well enough that we cleaned it up and
released it under Apache-2.0.

```
npm i -g qa-probe
```

Built by LS-SIEM LLP. If you run it against your own API, I would genuinely like
to know what it found - that feedback is how the rules get sharper.