# Why I Stopped Building My Own Document Q&A from Scratch

> Source: <https://dev.to/__c1b9e06dc90a7e0a676b/why-i-stopped-building-my-own-document-qa-from-scratch-19g5>
> Published: 2026-06-13 08:01:09+00:00

Two months ago, I was knee-deep in a project that sounded simple: build a system that could answer questions from our company’s internal documentation. We had hundreds of PDFs, Confluence pages, and READMEs. The goal was to let junior developers ask natural language questions and get accurate answers instantly.

I thought, “How hard can it be? I’ll just fine-tune a small LLM on our documents.”

Spoiler: it was that hard, and then some.

I spent two weeks collecting, cleaning, and chunking our documentation. I wrote a Hugging Face training script, rented a GPU, and fine-tuned a 7B-parameter model. The result? A model that could recite our API docs verbatim but couldn’t answer a question like “Why does our auth flow fail for expired tokens?” without hallucinating.

Fine-tuning taught the model patterns in the text, but it didn’t give it the ability to *retrieve* specific facts. Plus, every time a document changed, I’d have to retrain. It was unsustainable.

Next, I tried Elasticsearch with a BM25 scorer. I’d split documents into chunks and search for keywords from the user’s question. The problem: natural language questions don’t map well to keywords. “How do I reset my password?” would match chunks about “reset” and “password”, but miss the critical steps for multi-factor auth. Recall was terrible.
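To see why recall suffered, here is a simplified, pure-Python sketch of Okapi BM25 scoring. It is not Elasticsearch’s actual code. `k1=1.2` and `b=0.75` are Elasticsearch’s default BM25 parameters, and the three sample chunks are made up. Because BM25 only credits exact term matches, the chunk with the MFA step scores zero for a password-reset question, even though it holds a step the user needs.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics; no stemming, no synonyms
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every doc against the query with (simplified) Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many docs contain each term
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue  # no exact match means no credit at all
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


chunks = [
    "To reset your password, open Settings and choose Reset password.",
    "If MFA is enabled, confirm the code from your authenticator app first.",
    "The deployment pipeline runs on every merge to main.",
]
scores = bm25_scores("How do I reset my password?", chunks)
print([round(s, 3) for s in scores])  # the MFA and deploy chunks both score 0.0
```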

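After reading about RAG (retrieval-augmented generation), I realized the solution wasn’t to train the model on my data; it was to give the model a way to *look up* the right data at query time. The core idea: split the documents into chunks, embed each chunk into a vector store once, and at query time retrieve the chunks most similar to the question and pass them to the LLM as context.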
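The look-up half of that idea fits in a few lines. This is a toy under loud assumptions: `embed` is a hypothetical stand-in that counts words instead of calling an embedding model, and `TinyVectorIndex` is an illustrative in-memory class, not ChromaDB’s API. What it does show is the shape of the design: chunks are embedded once, queries are matched by cosine similarity, and a changed document only needs its chunks re-embedded, with no retraining.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # HYPOTHETICAL stand-in for a real embedding model: sparse word counts.
    # Real embeddings are dense vectors that can also match paraphrases.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorIndex:
    """Embed each chunk once at ingest time; look chunks up at query time."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[str, Counter]] = {}

    def upsert(self, chunk_id: str, text: str) -> None:
        # A changed document only needs its chunks re-embedded; nothing is retrained
        self._entries[chunk_id] = (text, embed(text))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        scored = [(cosine(q, vec), cid) for cid, (_, vec) in self._entries.items()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [cid for _, cid in scored[:k]]


index = TinyVectorIndex()
index.upsert("auth-1", "Expired tokens fail the auth flow because the refresh endpoint rejects them.")
index.upsert("deploy-1", "The deployment pipeline runs on every merge to main.")
index.upsert("vpn-1", "On the VPN, password resets must go through the internal portal.")
print(index.search("Why does the auth flow fail for expired tokens?", k=1))  # ['auth-1']

# The VPN doc changed: re-embed that one chunk and query again
index.upsert("vpn-1", "Password resets on the VPN now use the self-service page.")
print(index.search("password reset VPN", k=1))  # ['vpn-1']
```

A real embedding model is what lets a paraphrased question match a chunk worded differently; the word-count stand-in cannot do that, so it only illustrates the plumbing.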

I’ll walk you through a working prototype using Python, OpenAI embeddings, and ChromaDB.

```
pip install chromadb openai tiktoken "langchain<1.0" langchain-community
```

For this example, I’ll use a small text file. In practice, you’d use the LangChain document loader that matches each source type (PDFs, Confluence pages, READMEs).

``` python
from langchain_community.document_loaders import TextLoader
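from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = TextLoader("my_docs.txt")
documents = loader.load()

# Try paragraph breaks first, then line breaks, then sentence-ending punctuation
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # max characters per chunk
    chunk_overlap=50,   # characters shared between adjacent chunks
    separators=["\n\n", "\n", ".", "!"],
)
chunks = text_splitter.split_documents(documents)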
print(f"Created {len(chunks)} chunks")
```

The 50-character overlap means text near a chunk boundary appears in both adjacent chunks, so a sentence cut at the boundary is less likely to lose its context.
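To make the overlap concrete, here is a simplified fixed-window chunker. `chunk_text` is a hypothetical helper, not LangChain’s implementation; `RecursiveCharacterTextSplitter` additionally tries to cut at the separators you pass it. Each window starts `chunk_size - chunk_overlap` characters after the previous one, so adjacent chunks share their boundary text.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


alphabet = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(alphabet, chunk_size=10, chunk_overlap=3)
print(pieces)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each piece begins with the last three characters of the one before it, which is exactly the context a boundary split would otherwise drop.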

``` python
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Reads the API key from the OPENAI_API_KEY environment variable
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Persist the database so we don't re-embed every time.
# chromadb >= 0.4 writes to persist_directory automatically.
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chroma_db",
)
```

Then wire the vector store to a chat model as a retriever:

``` python
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# "stuff" puts all k retrieved chunks into a single prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
)

question = "How do I reset my password if I'm on a VPN?"
# invoke() returns a dict; the generated answer is under "result"
answer = qa_chain.invoke({"query": question})
print(answer["result"])
```

And that’s it. A working Q&A system in under 50 lines of code.
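With `chain_type="stuff"`, every retrieved chunk goes into one prompt alongside the question. Below is a minimal sketch of that assembly step. The hypothetical `stuff_prompt` helper and its template wording are illustrative, not LangChain’s default prompt, and the character budget is a crude stand-in for counting tokens (for example with tiktoken).

```python
def stuff_prompt(question: str, ranked_chunks: list[str], max_context_chars: int = 8000) -> str:
    """Concatenate retrieved chunks (best first) into one prompt, under a character budget."""
    kept: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        # Stop before the chunk text would exceed the budget; lower-ranked chunks are dropped
        if used + len(chunk) > max_context_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    context = "\n\n".join(kept)
    return (
        "Use only the context below to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = stuff_prompt(
    "How do I reset my password if I'm on a VPN?",
    ["Chunk about VPN password resets.", "Chunk about MFA.", "Chunk about deploys."],
    max_context_chars=60,
)
print(prompt)
```

If you raise `k` or `chunk_size`, the stuffed prompt grows with it; dropping the lowest-ranked chunks first keeps the most relevant context when the budget runs out.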

RAG isn’t magic; it has its own pain points. You can swap in open-source embedding models such as `all-MiniLM-L6-v2` from Sentence Transformers, but they’re less accurate.

If I were starting over, I’d do two things differently.

First, I’d start with a managed service that handles the embedding and retrieval infrastructure. For example, a platform like Interwest Info AI ([https://ai.interwestinfo.com/](https://ai.interwestinfo.com/)) abstracts away the vector DB and chunking strategies: you just upload documents and get an API. That would have saved me two weeks of fiddling with ChromaDB quirks and scaling issues.

Second, I’d invest more time in evaluating retrieval quality *before* building the rest of the RAG pipeline. Create a small test set of 20 questions, manually label which chunks should be retrieved for each one, and check how many of those chunks the retriever actually returns. That tells you whether your chunking strategy and embedding model are up to par.
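Here is a minimal sketch of that check, assuming each test question is hand-labeled with the IDs of the chunks that should come back. `evaluate_retrieval` and the chunk IDs are hypothetical, and the retriever is faked with a dictionary so the example runs on its own. It reports the hit rate (questions where at least one expected chunk appears in the top k) and mean recall@k.

```python
from typing import Callable


def evaluate_retrieval(
    test_set: list[tuple[str, set[str]]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 4,
) -> dict:
    """Hit rate and mean recall@k over hand-labeled (question, expected chunk ids) pairs."""
    hits = 0
    recall_total = 0.0
    misses = []
    for question, expected in test_set:
        found = set(retrieve(question, k)[:k]) & expected
        if found:
            hits += 1
        else:
            misses.append(question)  # worth inspecting by hand
        recall_total += len(found) / len(expected)
    n = len(test_set)
    return {"hit_rate": hits / n, "mean_recall": recall_total / n, "misses": misses}


# Hypothetical retriever output, best first, standing in for a real vector store
fake_results = {
    "How do I rotate API keys?": ["keys-2", "keys-1", "auth-3", "faq-9"],
    "Why do expired tokens fail?": ["deploy-1", "faq-2", "faq-3", "vpn-4"],
}
labeled = [
    ("How do I rotate API keys?", {"keys-1", "keys-5"}),
    ("Why do expired tokens fail?", {"auth-7"}),
]
report = evaluate_retrieval(labeled, lambda q, k: fake_results[q][:k], k=4)
print(report)
# {'hit_rate': 0.5, 'mean_recall': 0.25, 'misses': ['Why do expired tokens fail?']}
```

To run it against the ChromaDB store, you could pass something like `lambda q, k: [d.metadata["chunk_id"] for d in vectordb.similarity_search(q, k=k)]`, assuming you added your own `chunk_id` to each chunk’s metadata before indexing. The `misses` list is a good starting point for deciding whether chunking or the embedding model needs work.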

Building a document Q&A system from scratch taught me more about the trade-offs in retrieval than any blog post ever could. But now I’m curious: **What’s your go-to approach for building a knowledge base chatbot?** Are you DIY with LangChain, or do you use a SaaS platform? Let’s discuss in the comments.
