How we stopped our AI assistant from hallucinating bug fixes

Source: dev.to · Published: 2026-06-25T06:43Z · Topic: developer-tools

LS-SIEM LLP, the team building the LightShield SIEM, developed qa-probe, an open-source tool that stops AI coding assistants from hallucinating bug fixes by giving them ground-truth evidence. The tool analyzes source code, probes live endpoints, and classifies root causes with calibrated confidence, so an AI assistant can reason from evidence rather than guess from status codes. Released under Apache-2.0, qa-probe supports frameworks such as FastAPI, Express, and Next.js, and integrates over MCP with tools such as Claude and Cursor.

Cover: a real qa-probe run against our own stack, cropped to the summary - internal product detail withheld.

We are building LightShield, a SIEM that is in active demo right now. We built most of it pair-programming with an AI coding assistant wired in over MCP - it ran our stack, read the errors, and patched its own code. For a small team that is a superpower. Until an endpoint failed.

Here is the loop we kept hitting. A route returns a 500, or a 404, or an empty []. The assistant looks at the status code and announces the cause with total confidence. Then it rewrites a handler that was never broken - because a status code is not a cause, and it had nothing else to go on. So it guessed, and it guessed wrong, and the diff made things worse.

The thing is, that empty [] had at least six possible causes:

Same symptom, six different fixes. We could bisect to the real one. The AI could not - it had no ground truth, so it manufactured one.
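
To make the ambiguity concrete, here is a minimal Python sketch. The cause names are hypothetical illustrations, not the article's list of six and not qa-probe's categories; the only point is that a status code and a body map to several hypotheses rather than one.

```python
# Hypothetical illustration: the cause names below are assumptions,
# not the article's list and not qa-probe's real categories.
from typing import List


def candidate_causes(status: int, body: object) -> List[str]:
    """Return every hypothesis consistent with the symptom alone."""
    if status == 200 and body == []:
        return [
            "empty database",
            "filter too strict",
            "wrong tenant or auth scope",
            "pagination offset past the end",
        ]
    if status == 404:
        return ["route not registered", "resource does not exist"]
    if status >= 500:
        return ["unhandled exception", "upstream dependency down"]
    return []


hypotheses = candidate_causes(200, [])
# More than one hypothesis survives, so the symptom alone cannot pick a fix.
print(len(hypotheses) > 1)
```

Without extra evidence, any one of these is a guess, which is the failure mode described above.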

qa-probe analyzes the app, probes the live endpoints, and classifies each failure with a root cause and a fix hint. It runs as three decoupled, cached phases:

qa-probe analyze   # parse source + OpenAPI -> route graph
qa-probe probe     # hit live endpoints (HTTP/SSE/WS), record evidence
qa-probe report    # classify root cause -> HTML / Markdown / JSON / AI-context
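
One way to picture "decoupled, cached phases" is a pipeline in which each phase writes an artifact that the next phase reads, so any phase can be re-run alone. The Python sketch below is an assumption about the general pattern, not qa-probe's internals: the file layout, function names, and data shapes are all hypothetical, and the probe step returns canned data instead of making HTTP calls.

```python
# Sketch only: phase names mirror the CLI, but the file layout, function
# names, and data shapes are assumptions, not qa-probe internals.
import json
import tempfile
from pathlib import Path


def run_phase(cache_dir: Path, name: str, compute):
    """Return the cached artifact for a phase, computing it only if missing."""
    artifact = cache_dir / f"{name}.json"
    if artifact.exists():
        return json.loads(artifact.read_text())
    result = compute()
    artifact.write_text(json.dumps(result))
    return result


def analyze():
    # Stand-in for parsing source + OpenAPI into a route graph.
    return [{"method": "GET", "path": "/items"}]


def probe(routes):
    # Stand-in for real HTTP calls: records canned evidence per route.
    return [{"route": r, "status": 200, "body": []} for r in routes]


def report(evidence):
    # Stand-in for classification: one summary line per probed route.
    return [f'{e["route"]["method"]} {e["route"]["path"]} -> {e["status"]}' for e in evidence]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    routes = run_phase(cache, "analyze", analyze)
    evidence = run_phase(cache, "probe", lambda: probe(routes))
    lines = run_phase(cache, "report", lambda: report(evidence))
    print(lines)
```

Because each phase depends only on the previous artifact, a report can be regenerated without probing again, and a probe can be repeated without re-parsing the source.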

qa-probe has adapters for FastAPI, Express, Next.js, tRPC, GraphQL, and a generic fallback, so it discovers your routes instead of you hand-listing them.
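
As a rough illustration of route discovery, the Python sketch below lists routes from an OpenAPI document, one of the inputs the analyze phase is described as parsing. The helper name routes_from_openapi is hypothetical and is not a qa-probe API; it relies only on the standard OpenAPI layout, where the paths object maps each path to an item keyed by HTTP method.

```python
# Sketch only: `routes_from_openapi` is a hypothetical helper, not a qa-probe API.
from typing import Dict, List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def routes_from_openapi(spec: Dict) -> List[Tuple[str, str]]:
    """List (METHOD, path) pairs from an OpenAPI document's `paths` object."""
    routes = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items also hold non-method keys such as `parameters`.
            if key.lower() in HTTP_METHODS:
                routes.append((key.upper(), path))
    return sorted(routes)


spec = {
    "openapi": "3.0.3",
    "paths": {
        "/items": {"get": {}, "post": {}},
        "/items/{id}": {"parameters": [], "get": {}},
    },
}
print(routes_from_openapi(spec))
```

A framework adapter would presumably do the equivalent from source code when no OpenAPI document is available; that part is not sketched here.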

Each result carries the evidence (the real request, a bounded response sample, the timing), a root cause from ~25 categories, and a calibrated confidence - "high", "medium", or "none". When it cannot tell, it returns "none" instead of bluffing. No neural network, no black box - transparent rules plus per-endpoint stat memory, so you can always read why it landed on a verdict. An AI consuming this can verify the claim rather than trust a vibe.
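
A result of that shape can be sketched as an evidence record plus an ordered list of readable rules. Everything in this Python sketch is an assumption for illustration: the field names, the two rules, and the cause labels are hypothetical and are not qa-probe's schema or its ~25 categories. Confidence calibration and per-endpoint stat memory are left out.

```python
# Sketch only: field names, rules, and cause labels are assumptions,
# not qa-probe's real schema or its ~25 categories.
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Evidence:
    method: str
    url: str
    status: int
    body_sample: Any      # bounded sample of the response, not the full body
    elapsed_ms: float


@dataclass
class Verdict:
    cause: Optional[str]
    confidence: str       # "high", "medium", or "none"
    reason: str           # human-readable trace of the rule that fired


# Each rule: (predicate, cause, confidence, reason)
Rule = Tuple[Callable[[Evidence], bool], str, str, str]

RULES: List[Rule] = [
    (lambda e: e.status == 401, "missing_auth", "high", "status 401 on request"),
    (lambda e: e.status >= 500 and e.elapsed_ms > 5000,
     "upstream_timeout", "medium", "5xx after more than 5 s"),
]


def classify(e: Evidence) -> Verdict:
    """Return the first matching rule's verdict, or an explicit 'none'."""
    for predicate, cause, confidence, reason in RULES:
        if predicate(e):
            return Verdict(cause, confidence, reason)
    return Verdict(None, "none", "no rule matched the evidence")


slow = Evidence("GET", "http://localhost:8000/items", 502, "", 7200.0)
odd = Evidence("GET", "http://localhost:8000/items", 418, "", 12.0)
print(classify(slow))
print(classify(odd))
```

The fallthrough matters as much as the rules: when nothing matches, the verdict says "none" explicitly instead of picking the nearest category.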

qa-probe mcp   # exposes 8 tools to Claude, Cursor, any MCP client

The assistant stopped reasoning from a status code and started reasoning from evidence: "empty database, high confidence, here is the response that proves it." It seeded the DB instead of rewriting the handler. It fixed the right layer. The guessing basically stopped.

It helped us debug faster. It helped the AI more - because an AI is only as good as the evidence you hand it, and "the endpoint is failing" is not evidence.

It is early and it is open. The fastest way to help:

One housekeeping note: contributions are sign-off based (DCO) - commit with git commit -s so the project's licensing stays clean. That is the only hoop.

We built it for ourselves. It worked well enough that we cleaned it up and released it under Apache-2.0.

npm i -g qa-probe

Built by LS-SIEM LLP. If you run it against your own API, I would genuinely like to know what it found - that feedback is how the rules get sharper.


Read original on dev.to → dev.to/vibez06/how-we-stopped-our-ai-assistant-f…
