{"slug": "how-we-stopped-our-ai-assistant-from-hallucinating-bug-fixes", "title": "How we stopped our AI assistant from hallucinating bug fixes", "summary": "LightShield, a SIEM built by LS-SIEM LLP, developed qa-probe, an open-source tool that stops AI coding assistants from hallucinating bug fixes by providing ground-truth evidence. The tool analyzes source code, probes live endpoints, and classifies root causes with calibrated confidence, enabling AI assistants to reason from evidence rather than guessing from status codes. Released under Apache-2.0, qa-probe supports frameworks like FastAPI, Express, and Next.js, and integrates via MCP with tools like Claude and Cursor.", "body_md": "*Cover: a real qa-probe run against our own stack, cropped to the summary - internal product detail withheld.*\n\nWe are building LightShield, a SIEM that is in active demo right now. We built most of it pair-programming with an AI coding assistant wired in over MCP - it ran our stack, read the errors, and patched its own code. For a small team that is a superpower. Until an endpoint failed.\n\nHere is the loop we kept hitting. A route returns a 500, or a 404, or an empty `[]`. The assistant looks at the status code and announces the cause with total confidence. Then it rewrites a handler that was never broken - because a status code is not a cause, and it had nothing else to go on. So it guessed, and it guessed wrong, and the diff made things worse.\n\nThe thing is, that empty `[]` had at least six possible causes:\n\nSame symptom, six different fixes. We could bisect to the real one. The AI could not - it had no ground truth, so it manufactured one.\n\nqa-probe analyzes the app, probes the live endpoints, and classifies each failure with a root cause and a fix hint. 
Three decoupled, cached phases:\n\n```sh\nqa-probe analyze   # parse source + OpenAPI -> route graph\nqa-probe probe     # hit live endpoints (HTTP/SSE/WS), record evidence\nqa-probe report    # classify root cause -> HTML / Markdown / JSON / AI-context\n# or just: qa-probe run\n```\n\nIt has adapters for FastAPI, Express, Next.js, tRPC, GraphQL, and a generic fallback, so it discovers your routes instead of you hand-listing them.\n\nEach result carries the evidence (the real request, a bounded response sample, the timing), a root cause from ~25 categories, and a calibrated confidence - `high`, `medium`, or `none`. When it cannot tell, it returns `none` instead of bluffing. No neural network, no black box - transparent rules plus per-endpoint stat memory, so you can always read *why* it landed on a verdict. An AI consuming this needs to verify the claim, not trust a vibe.\n\n```\nqa-probe mcp   # exposes 8 tools to Claude, Cursor, any MCP client\n```\n\nThe assistant stopped reasoning from a status code and started reasoning from evidence: \"empty database, high confidence, here is the response that proves it.\" It seeded the DB instead of rewriting the handler. It fixed the right layer. The guessing basically stopped.\n\nIt helped us debug faster. It helped the AI more - because an AI is only as good as the evidence you hand it, and \"the endpoint is failing\" is not evidence.\n\nIt is early and it is open. The fastest way to help:\n\nOne housekeeping note: contributions are sign-off based (DCO) - commit with `git commit -s` so the project's licensing stays clean. That is the only hoop.\n\nWe built it for ourselves. It worked well enough that we cleaned it up and released it under Apache-2.0.\n\n```\nnpm i -g qa-probe\n```\n\nBuilt by LS-SIEM LLP. 
If you run it against your own API, I would genuinely like to know what it found - that feedback is how the rules get sharper.", "url": "https://wpnews.pro/news/how-we-stopped-our-ai-assistant-from-hallucinating-bug-fixes", "canonical_source": "https://dev.to/vibez06/how-we-stopped-our-ai-assistant-from-hallucinating-bug-fixes-53l5", "published_at": "2026-06-25 06:43:06+00:00", "updated_at": "2026-06-25 07:13:50.066078+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "large-language-models", "ai-agents"], "entities": ["LS-SIEM LLP", "LightShield", "qa-probe", "MCP", "Claude", "Cursor", "FastAPI", "Express"], "alternates": {"html": "https://wpnews.pro/news/how-we-stopped-our-ai-assistant-from-hallucinating-bug-fixes", "markdown": "https://wpnews.pro/news/how-we-stopped-our-ai-assistant-from-hallucinating-bug-fixes.md", "text": "https://wpnews.pro/news/how-we-stopped-our-ai-assistant-from-hallucinating-bug-fixes.txt", "jsonld": "https://wpnews.pro/news/how-we-stopped-our-ai-assistant-from-hallucinating-bug-fixes.jsonld"}}