Every "docs chatbot" today routes user questions through OpenAI. For
open-source maintainers, privacy-conscious teams, and air-gapped
environments, that's either too expensive or unacceptable. So I built
one that doesn't.
Knowledge Base API is a
small FastAPI service that answers questions over a folder of markdown
files using BM25 + POS-aware lemmatization + WordNet synonym expansion. No models. No API keys. No data leaving the box.
Live demo against FastAPI + Pydantic + Starlette docs
(2,869 sections, 265 files).
The single hardest behaviour to enforce was making the API return
null
instead of inventing an answer when nothing in the corpus is
a real fit.
curl -X POST https://kb-api-q30f.onrender.com/ask \
-H "Content-Type: application/json" \
-d '{"question":"what is quantum chromodynamics"}'
{
"answer": null,
"section": null,
"source": null,
"confidence": 0.0,
"message": "I don't have enough information to answer that."
}
Most retrieval systems silently return the least-bad section. The
trade-off — sometimes refusing to answer — is the whole point.
The default NLTK tokenizer keeps response_model
,
OAuth2PasswordBearer
, and Cross-Origin
as single opaque tokens.
That means a query for "what is response_model" never matches because
the document body has response_model
underscored and the lemmatized
query doesn't.
Solution: split on _
, -
, and CamelCase boundaries before
lemmatization, and keep BOTH the full identifier and its pieces in the
indexed token stream.
split_identifier("OAuth2PasswordBearer")
split_identifier("Cross-Origin")
Going from 50% to 90% accuracy on identifier-heavy queries was almost
entirely this fix.
If you expand CORS
to cross origin resource sharing
at index time,
every BM25 IDF calculation breaks — terms appear artificially often,
document lengths inflate, scoring degrades.
The right move is query-side only:
_ACRONYMS = {
"cors": "cross origin resource sharing",
"jwt": "json web token",
"api": "application programming interface",
"csrf": "cross site request forgery",
"xss": "cross site scripting",
"orm": "object relational mapping",
}
When the query contains an acronym, append the expansion tokens to
the query. The index stays pure.
Pure BM25 over docs returns weird results because:
reference/foo.md
are canonical definitions; tutorials are examplesSo the score gets four passes:
raw_bm25_score(query)
× HEADING_BOOST_FACTOR if heading-query overlap ≥ 50%
1.0 if heading EXACTLY matches query subject
× FILENAME_BOOST_FACTOR if filename overlaps query
× REFERENCE_PATH_BOOST if path is under reference/
And below a hard threshold, the result is rejected entirely:
if not scores.size or scores.max() < CONFIDENCE_THRESHOLD:
return _no_match()
That last line is the difference between "honestly returns null"
and "silently returns the least-bad section."
A few hours after launching on Reddit, a commenter asked: "what
about searching 'cross origin' for CORS, or what about typos like
'rsponse_model'?"
The first case worked fine — BM25 finds the CORS docs because the
body contains "Cross-Origin Resource Sharing" verbatim. But typos?
Total miss. "rsponse_model" returned a wrong answer at 0.34
confidence — confidently wrong, above the threshold, no warning to
the user.
That's the worst possible failure mode for a "honest null" product:
the no-fabrication promise breaks for typo'd in-corpus queries,
which is arguably the more common failure mode than out-of-corpus
queries.
Fix shipped same day: a BK-tree (Burkhard-Keller tree) over the
indexed vocabulary at index time, with query-time nearest-neighbour
lookup using length-tuned edit distance:
def fuzzy_candidates(tree, token):
if len(token) <= 8:
max_dist = 1 # short words: ambiguous beyond one edit
else:
max_dist = 2 # OAuth2PasswordBearer can tolerate more slop
return [w for w, d in tree.search(token, max_dist) if d > 0]
When fuzzy correction fires, the confidence is capped at 0.6 and the
response includes a "verify the source" message so the caller knows
the answer came from a corrected query, not an exact match.
Plus a guard against fuzzy-correcting nonsense queries: if 3+ user
tokens are unrecognized, return null. "Quantum chromodynamics
neutrino flux" against FastAPI docs correctly stays null even though
fuzzy lookup could find nearest-neighbour matches for each individual
word.
| Query | Result | Notes |
|---|---|---|
what is response_model |
||
response_model Priority |
||
| 1.0 confidence | ||
how do I add CORS |
||
CORS (Cross-Origin Resource Sharing) |
||
| 1.0 confidence | ||
what is OAuth2PasswordBearer |
||
FastAPI's OAuth2PasswordBearer |
||
| 1.0 confidence | ||
what is APIRouter |
||
APIRouter class (in reference/apirouter.md) |
||
| 1.0 confidence | ||
what is rsponse_model (typo) |
||
response_model Priority |
||
| 0.6 confidence + warning | ||
how do I add corss (typo) |
||
CORS preflight requests |
||
| 0.46 confidence + warning | ||
what is quantum chromodynamics |
||
null |
||
| honest refusal |
answer
field is the matching section's body
verbatim, not a paraphrase. If you want a summary, use a different
tool.null
. That's the feature.| Layer | Choice | Why | |---|---|---| | Web | FastAPI + Uvicorn | Async, typed, batteries-included | | Ranking | rank-bm25 | Reference Okapi BM25 implementation | | NLP | NLTK | WordNet, Penn Treebank tagger, stopwords — boring and reliable | | Fuzzy | Custom BK-tree | ~150 lines, no dependency | | Parser | markdown-it-py | Handles fenced code blocks correctly | | File watch | watchdog | Cross-platform file events |
Total app code: ~700 lines. Image size: ~250 MB. RAM at runtime:
~40 MB. Indexes 1,800 markdown sections in well under a second.
github.com/teamerisingstars/KB-API
Live demo: kb-api-q30f.onrender.com
If you've built something similar or have thoughts on the BM25
tuning, the fuzzy correction, or the boost stack, I'd genuinely like
to hear what would change. Drop a comment or open an issue.