I built a docs Q&A engine that returns null instead of hallucinating


Nearly every "docs chatbot" today routes user questions through a hosted LLM such as OpenAI. For open-source maintainers, privacy-conscious teams, and air-gapped environments, that's either too expensive or unacceptable. So I built one that doesn't.

Knowledge Base API is a small FastAPI service that answers questions over a folder of markdown files using BM25 + POS-aware lemmatization + WordNet synonym expansion. No models. No API keys. No data leaving the box.

Live demo against FastAPI + Pydantic + Starlette docs (2,869 sections, 265 files).

The single hardest behaviour to enforce was making the API return `null` instead of inventing an answer when nothing in the corpus is a real fit.

curl -X POST https://kb-api-q30f.onrender.com/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"what is quantum chromodynamics"}'
{
  "answer": null,
  "section": null,
  "source": null,
  "confidence": 0.0,
  "message": "I don't have enough information to answer that."
}

Most retrieval systems silently return the least-bad section. The trade-off (sometimes refusing to answer) is the whole point.
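On the caller's side, that contract means checking for `null` before using the answer. A minimal sketch, assuming the response fields shown in the example above; the helper name `render_answer` is hypothetical and not part of the project:

```python
import json


def render_answer(raw_body: str) -> str:
    """Turn an /ask response body into text for the user.

    Assumes the JSON fields from the example response:
    answer, source, confidence, message.
    """
    payload = json.loads(raw_body)
    if payload.get("answer") is None:
        # Refusal: show the service's own message rather than guessing.
        return payload.get("message") or "No answer found."
    return (
        f'{payload["answer"]}\n\n'
        f'(source: {payload["source"]}, confidence: {payload["confidence"]})'
    )
```

Treating `null` as a first-class outcome keeps the refusal visible all the way to the user instead of being papered over by the client.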

The default NLTK tokenizer keeps `response_model`, `OAuth2PasswordBearer`, and `Cross-Origin` as single opaque tokens. That means a query for "what is response_model" never matches because the document body has `response_model` underscored and the lemmatized query doesn't.

Solution: split on `_`, `-`, and CamelCase boundaries before lemmatization, and keep BOTH the full identifier and its pieces in the indexed token stream.

split_identifier("OAuth2PasswordBearer")

split_identifier("Cross-Origin")

Going from 50% to 90% accuracy on identifier-heavy queries was almost entirely this fix.
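One way such a splitter could look is sketched below. The regular expressions and the exact output shape (lowercased full identifier first, then its pieces) are assumptions for illustration, not the project's actual `split_identifier`:

```python
import re

# CamelCase boundaries: lower/digit followed by upper ("2P", "dB"), or the end
# of an acronym run of two or more capitals before a capitalised word
# ("APIRouter" -> "API" + "Router"). "OAuth" is deliberately left whole.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z]{2})(?=[A-Z][a-z])")


def split_identifier(token: str) -> list[str]:
    """Return the lowercased identifier followed by its lowercased pieces."""
    pieces = []
    for chunk in re.split(r"[_\-]+", token):
        pieces.extend(p for p in _CAMEL_BOUNDARY.split(chunk) if p)
    lowered = [p.lower() for p in pieces]
    full = token.lower()
    if lowered == [full]:
        return [full]  # plain word: nothing to add beyond itself
    return [full] + lowered
```

With this sketch, `split_identifier("OAuth2PasswordBearer")` yields `["oauth2passwordbearer", "oauth2", "password", "bearer"]` and `split_identifier("Cross-Origin")` yields `["cross-origin", "cross", "origin"]`, so a query can hit either the whole identifier or any of its parts.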

If you expand `CORS` to `cross origin resource sharing` at index time, every BM25 IDF calculation breaks: terms appear artificially often, document lengths inflate, scoring degrades.

The right move is query-side only:

_ACRONYMS = {
    "cors": "cross origin resource sharing",
    "jwt":  "json web token",
    "api":  "application programming interface",
    "csrf": "cross site request forgery",
    "xss":  "cross site scripting",
    "orm":  "object relational mapping",
}

When the query contains an acronym, append the expansion tokens to the query. The index stays pure.
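A minimal sketch of that query-side step, assuming the query has already been tokenized into a list of strings; the function name `expand_query` is hypothetical and the table is abbreviated:

```python
_ACRONYMS = {
    "cors": "cross origin resource sharing",
    "jwt": "json web token",
}


def expand_query(tokens: list[str]) -> list[str]:
    """Append acronym expansions to the query tokens; the index is untouched."""
    expanded = list(tokens)
    for tok in tokens:
        expansion = _ACRONYMS.get(tok.lower())
        if expansion:
            expanded.extend(expansion.split())
    return expanded
```

Because the original acronym stays in the token list, documents that use `CORS` literally still match, and the expansion only adds extra chances to hit the spelled-out form.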

Pure BM25 over docs returns weird results, in part because files like `reference/foo.md` are canonical definitions while tutorials are examples.

So the score gets four passes:

raw_bm25_score(query)
  × HEADING_BOOST_FACTOR if heading-query overlap ≥ 50%
  1.0 if heading EXACTLY matches query subject
  × FILENAME_BOOST_FACTOR if filename overlaps query
  × REFERENCE_PATH_BOOST if path is under reference/
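The multiplicative passes listed above could be sketched roughly as follows. The constant values, the token-set representation, and the definition of "overlap" (share of query tokens found in the heading) are all assumptions; the exact-heading rule is left out because the source does not say how raw scores map to the 0 to 1 confidence:

```python
HEADING_BOOST_FACTOR = 1.5    # hypothetical value
FILENAME_BOOST_FACTOR = 1.2   # hypothetical value
REFERENCE_PATH_BOOST = 1.3    # hypothetical value


def overlap_ratio(query_tokens: set[str], target_tokens: set[str]) -> float:
    """Fraction of query tokens that also appear in the target."""
    if not query_tokens:
        return 0.0
    return len(query_tokens & target_tokens) / len(query_tokens)


def boosted_score(raw_score: float, query_tokens: set[str],
                  heading_tokens: set[str], filename_tokens: set[str],
                  path: str) -> float:
    """Apply the heading, filename, and reference-path boosts in order."""
    score = raw_score
    if overlap_ratio(query_tokens, heading_tokens) >= 0.5:
        score *= HEADING_BOOST_FACTOR
    if query_tokens & filename_tokens:
        score *= FILENAME_BOOST_FACTOR
    if path.startswith("reference/"):
        score *= REFERENCE_PATH_BOOST
    return score
```

Keeping each boost as an independent multiplier makes it easy to tune or disable one pass without touching the others.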

And below a hard threshold, the result is rejected entirely:

if not scores.size or scores.max() < CONFIDENCE_THRESHOLD:
    return _no_match()

That last line is the difference between "honestly returns null" and "silently returns the least-bad section."
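To make the refusal path concrete, here is a dependency-free sketch that uses a plain list where the snippet above appears to use a NumPy array. The threshold value, the `pick_best` name, and the section dictionary keys (`body`, `heading`, `source`) are hypothetical; only the refusal payload is taken from the example response earlier:

```python
CONFIDENCE_THRESHOLD = 0.3  # hypothetical; the real value is not stated


def _no_match() -> dict:
    """Refusal payload, mirroring the example response earlier in the article."""
    return {
        "answer": None,
        "section": None,
        "source": None,
        "confidence": 0.0,
        "message": "I don't have enough information to answer that.",
    }


def pick_best(scores: list[float], sections: list[dict]) -> dict:
    """Return the top-scoring section, or a refusal if nothing clears the bar."""
    if not scores or max(scores) < CONFIDENCE_THRESHOLD:
        return _no_match()
    best = max(range(len(scores)), key=scores.__getitem__)
    section = sections[best]
    return {
        "answer": section["body"],        # verbatim body, not a paraphrase
        "section": section["heading"],
        "source": section["source"],
        "confidence": round(min(scores[best], 1.0), 2),
        "message": None,
    }
```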

A few hours after launching on Reddit, a commenter asked: "what about searching 'cross origin' for CORS, or what about typos like 'rsponse_model'?"

The first case worked fine: BM25 finds the CORS docs because the body contains "Cross-Origin Resource Sharing" verbatim. But typos? Total miss. "rsponse_model" returned a wrong answer at 0.34 confidence: confidently wrong, above the threshold, no warning to the user.

That's the worst possible failure mode for an "honest null" product: the no-fabrication promise breaks for typo'd in-corpus queries, which are arguably more common than out-of-corpus queries.

Fix shipped same day: a BK-tree (Burkhard-Keller tree) built over the indexed vocabulary at index time, with query-time nearest-neighbour lookup using a length-tuned edit-distance limit:

def fuzzy_candidates(tree, token):
    if len(token) <= 8:
        max_dist = 1   # short words: ambiguous beyond one edit
    else:
        max_dist = 2   # OAuth2PasswordBearer can tolerate more slop
    return [w for w, d in tree.search(token, max_dist) if d > 0]
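For readers who have not met the structure before, a compact BK-tree sketch follows. It is an illustration written for this article, not the project's ~150-line implementation; the class and method names are assumptions chosen so that `search` returns `(word, distance)` pairs like the `tree.search` call above:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


class BKTree:
    """Each node is (word, children), with children keyed by distance to the node."""

    def __init__(self, words=()):
        self.root = None
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word: str, max_dist: int) -> list[tuple[str, int]]:
        """Return (candidate, distance) pairs within max_dist of word."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            candidate, children = stack.pop()
            d = edit_distance(word, candidate)
            if d <= max_dist:
                results.append((candidate, d))
            # Triangle inequality: only subtrees keyed within
            # [d - max_dist, d + max_dist] can contain a match.
            for key, child in children.items():
                if d - max_dist <= key <= d + max_dist:
                    stack.append(child)
        return results
```

The pruning step is what makes the lookup cheaper than comparing the query token against every word in the vocabulary.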

When fuzzy correction fires, the confidence is capped at 0.6 and the response includes a "verify the source" message so the caller knows the answer came from a corrected query, not an exact match.

Plus a guard against fuzzy-correcting nonsense queries: if 3+ user tokens are unrecognized, return null. "Quantum chromodynamics neutrino flux" against FastAPI docs correctly stays null even though fuzzy lookup could find nearest-neighbour matches for each individual word.
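The cap and the guard could be wired together along these lines. This is a sketch under assumptions: the function names are hypothetical, `suggest` stands in for whatever fuzzy lookup is used (for example a BK-tree search), and "user tokens" is taken to mean the query tokens left after normal preprocessing:

```python
FUZZY_CONFIDENCE_CAP = 0.6   # cap stated in the article
MAX_UNKNOWN_TOKENS = 2       # 3 or more unrecognized tokens means refuse


def correct_query(tokens, vocabulary, suggest):
    """Return (corrected_tokens, was_corrected), or (None, False) to refuse.

    `suggest(token)` is a hypothetical callable returning candidate words,
    best candidate first.
    """
    unknown = [t for t in tokens if t not in vocabulary]
    if len(unknown) > MAX_UNKNOWN_TOKENS:
        return None, False  # too much of the query is out of vocabulary
    corrected, changed = [], False
    for t in tokens:
        candidates = [] if t in vocabulary else suggest(t)
        if candidates:
            corrected.append(candidates[0])
            changed = True
        else:
            corrected.append(t)
    return corrected, changed


def cap_confidence(confidence: float, was_corrected: bool) -> float:
    """Limit confidence when the answer came from a corrected query."""
    return min(confidence, FUZZY_CONFIDENCE_CAP) if was_corrected else confidence
```

Checking the unknown-token count before any correction runs is what keeps a fully out-of-corpus query from being "repaired" word by word into something answerable.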

| Query | Result | Notes |
|---|---|---|
| `what is response_model` | `response_model Priority` | 1.0 confidence |
| `how do I add CORS` | `CORS (Cross-Origin Resource Sharing)` | 1.0 confidence |
| `what is OAuth2PasswordBearer` | `FastAPI's OAuth2PasswordBearer` | 1.0 confidence |
| `what is APIRouter` | `APIRouter class` (in reference/apirouter.md) | 1.0 confidence |
| `what is rsponse_model` (typo) | `response_model Priority` | 0.6 confidence + warning |
| `how do I add corss` (typo) | `CORS preflight requests` | 0.46 confidence + warning |
| `what is quantum chromodynamics` | `null` | honest refusal |

The `answer` field is the matching section's body verbatim, not a paraphrase. If you want a summary, use a different tool.

When nothing in the corpus is a real fit, the API returns `null`. That's the feature.

| Layer | Choice | Why |
|---|---|---|
| Web | FastAPI + Uvicorn | Async, typed, batteries-included |
| Ranking | rank-bm25 | Reference Okapi BM25 implementation |
| NLP | NLTK | WordNet, Penn Treebank tagger, stopwords; boring and reliable |
| Fuzzy | Custom BK-tree | ~150 lines, no dependency |
| Parser | markdown-it-py | Handles fenced code blocks correctly |
| File watch | watchdog | Cross-platform file events |

Total app code: ~700 lines. Image size: ~250 MB. RAM at runtime: ~40 MB. Indexes 1,800 markdown sections in well under a second.

github.com/teamerisingstars/KB-API

Live demo: kb-api-q30f.onrender.com

If you've built something similar or have thoughts on the BM25 tuning, the fuzzy correction, or the boost stack, I'd genuinely like to hear what would change. Drop a comment or open an issue.

source & further reading

dev.to: original article
