I built a docs Q&A engine that returns null instead of hallucinating

An engineer built a documentation Q&A engine that returns `null` instead of hallucinating answers when a query has no match in the corpus. The Knowledge Base API uses BM25 retrieval with POS-aware lemmatization and WordNet synonym expansion, requiring no language models, API keys, or external data transfers. The system also handles identifier-heavy queries by splitting on underscores, hyphens, and CamelCase boundaries, and includes a BK-tree for typo-tolerant matching.

Every "docs chatbot" today routes user questions through OpenAI. For open-source maintainers, privacy-conscious teams, and air-gapped environments, that's either too expensive or unacceptable. So I built one that doesn't. Knowledge Base API https://github.com/teamerisingstars/KB-API is a small FastAPI service that answers questions over a folder of markdown files using BM25 + POS-aware lemmatization + WordNet synonym expansion . No models. No API keys. No data leaving the box. Live demo against FastAPI + Pydantic + Starlette docs https://kb-api-q30f.onrender.com 2,869 sections, 265 files . The single hardest behaviour to enforce was making the API return null instead of inventing an answer when nothing in the corpus is a real fit. curl -X POST https://kb-api-q30f.onrender.com/ask \ -H "Content-Type: application/json" \ -d '{"question":"what is quantum chromodynamics"}' { "answer": null, "section": null, "source": null, "confidence": 0.0, "message": "I don't have enough information to answer that." } Most retrieval systems silently return the least-bad section. The trade-off — sometimes refusing to answer — is the whole point. The default NLTK tokenizer keeps response model , OAuth2PasswordBearer , and Cross-Origin as single opaque tokens. That means a query for "what is response model" never matches because the document body has response model underscored and the lemmatized query doesn't. Solution: split on , - , and CamelCase boundaries before lemmatization, and keep BOTH the full identifier and its pieces in the indexed token stream. php split identifier "OAuth2PasswordBearer" - "OAuth2PasswordBearer", "OAuth2", "Password", "Bearer" split identifier "Cross-Origin" - "Cross-Origin", "Cross", "Origin" Going from 50% to 90% accuracy on identifier-heavy queries was almost entirely this fix. If you expand CORS to cross origin resource sharing at index time, every BM25 IDF calculation breaks — terms appear artificially often, document lengths inflate, scoring degrades. The right move is query-side only : ACRONYMS = { "cors": "cross origin resource sharing", "jwt": "json web token", "api": "application programming interface", "csrf": "cross site request forgery", "xss": "cross site scripting", "orm": "object relational mapping", ... } When the query contains an acronym, append the expansion tokens to the query. The index stays pure. Pure BM25 over docs returns weird results because: reference/foo.md are canonical definitions; tutorials are examplesSo the score gets four passes: raw bm25 score query × HEADING BOOST FACTOR if heading-query overlap ≥ 50% 1.0 if heading EXACTLY matches query subject × FILENAME BOOST FACTOR if filename overlaps query × REFERENCE PATH BOOST if path is under reference/ And below a hard threshold, the result is rejected entirely: if not scores.size or scores.max < CONFIDENCE THRESHOLD: return no match That last line is the difference between "honestly returns null" and "silently returns the least-bad section." A few hours after launching on Reddit, a commenter asked: "what about searching 'cross origin' for CORS, or what about typos like 'rsponse model'?" The first case worked fine — BM25 finds the CORS docs because the body contains "Cross-Origin Resource Sharing" verbatim. But typos? Total miss. "rsponse model" returned a wrong answer at 0.34 confidence — confidently wrong, above the threshold, no warning to the user. That's the worst possible failure mode for a "honest null" product: the no-fabrication promise breaks for typo'd in-corpus queries, which is arguably the more common failure mode than out-of-corpus queries. Fix shipped same day: a BK-tree Burkhard-Keller tree over the indexed vocabulary at index time, with query-time nearest-neighbour lookup using length-tuned edit distance: python def fuzzy candidates tree, token : if len token <= 8: max dist = 1 short words: ambiguous beyond one edit else: max dist = 2 OAuth2PasswordBearer can tolerate more slop return w for w, d in tree.search token, max dist if d 0 When fuzzy correction fires, the confidence is capped at 0.6 and the response includes a "verify the source" message so the caller knows the answer came from a corrected query, not an exact match. Plus a guard against fuzzy-correcting nonsense queries: if 3+ user tokens are unrecognized, return null. "Quantum chromodynamics neutrino flux" against FastAPI docs correctly stays null even though fuzzy lookup could find nearest-neighbour matches for each individual word. | Query | Result | Notes | |---|---|---| what is response model | response model Priority | 1.0 confidence | how do I add CORS | CORS Cross-Origin Resource Sharing | 1.0 confidence | what is OAuth2PasswordBearer | FastAPI's OAuth2PasswordBearer | 1.0 confidence | what is APIRouter | APIRouter class in reference/apirouter.md | 1.0 confidence | what is rsponse model typo | response model Priority | 0.6 confidence + warning | how do I add corss typo | CORS preflight requests | 0.46 confidence + warning | what is quantum chromodynamics | null | honest refusal | answer field is the matching section's body verbatim, not a paraphrase. If you want a summary, use a different tool. null . That's the feature.| Layer | Choice | Why | |---|---|---| | Web | FastAPI + Uvicorn | Async, typed, batteries-included | | Ranking | rank-bm25 | Reference Okapi BM25 implementation | | NLP | NLTK | WordNet, Penn Treebank tagger, stopwords — boring and reliable | | Fuzzy | Custom BK-tree | ~150 lines, no dependency | | Parser | markdown-it-py | Handles fenced code blocks correctly | | File watch | watchdog | Cross-platform file events | Total app code: ~700 lines. Image size: ~250 MB. RAM at runtime: ~40 MB. Indexes 1,800 markdown sections in well under a second. github.com/teamerisingstars/KB-API https://github.com/teamerisingstars/KB-API Live demo: kb-api-q30f.onrender.com https://kb-api-q30f.onrender.com If you've built something similar or have thoughts on the BM25 tuning, the fuzzy correction, or the boost stack, I'd genuinely like to hear what would change. Drop a comment or open an issue.