Why I Stopped Building My Own Document Q&A from Scratch


Source: dev.to · Published: 2026-06-13T08:01Z · Topic: large-language-models

A developer abandoned building a custom document Q&A system after struggling with fine-tuning and keyword search, then built a working prototype using Retrieval-Augmented Generation (RAG) with OpenAI embeddings and ChromaDB in under 50 lines of code. The engineer recommends starting with a managed service like Interwest Info AI and evaluating retrieval quality early to avoid common pitfalls.

Two months ago, I was knee-deep in a project that sounded simple: build a system that could answer questions from our company’s internal documentation. We had hundreds of PDFs, Confluence pages, and READMEs. The goal was to let junior developers ask natural language questions and get accurate answers instantly.

I thought, “How hard can it be? I’ll just fine-tune a small LLM on our documents.”

Spoiler: it was that hard, and then some.

I spent two weeks collecting, cleaning, and chunking our documentation. I wrote a Hugging Face training script, rented a GPU, and fine-tuned a 7B parameter model. The result? A model that could recite our API docs verbatim but couldn’t answer a question like “Why does our auth flow fail for expired tokens?” without hallucinating.

Fine-tuning taught the model patterns in the text, but it didn’t give it the ability to retrieve specific facts. Plus, every time a document changed, I’d have to retrain. It was unsustainable.

Next, I tried Elasticsearch with a BM25 scorer. I’d split documents into chunks and search for keywords from the user’s question. The problem: natural language questions don’t map well to keywords. “How do I reset my password?” would match chunks about “reset” and “password”, but miss the critical steps for multi-factor auth. Recall was terrible.
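To make the keyword mismatch concrete, here is a minimal BM25 sketch in plain Python. It is not Elasticsearch's implementation: the two example chunks, the naive tokenizer, and the parameters `k1=1.5` and `b=0.75` are illustrative assumptions. The second chunk describes the multi-factor step but shares no words with the question, so it scores zero.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (assumption): lowercase, strip surrounding punctuation.
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the chunk contribute nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Made-up chunks for illustration only.
chunks = [
    "To reset your password, open Settings and choose Reset password.",
    "Accounts with MFA must confirm a one-time code before credentials can be changed.",
]
scores = bm25_scores("How do I reset my password?", chunks)
print(scores)  # first score is positive, second is 0.0
```

A purely lexical scorer has no way to know that "credentials can be changed" is about password resets, which is the gap that embedding-based retrieval tries to close.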

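After reading about Retrieval-Augmented Generation (RAG), I realized the solution wasn’t to train the model on my data; it was to give the model a way to look up the right data at query time. The core idea: split the documents into chunks, embed each chunk and store the vectors, then at query time retrieve the chunks most similar to the question and pass them to the LLM as context.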
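The lookup-at-query-time step can be sketched without any external service. This is a toy, not the prototype itself: the three-dimensional vectors are hand-written stand-ins for real embeddings, and `top_k` and `build_prompt` are hypothetical helper names chosen for this example.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    # index is a list of (chunk_text, vector) pairs; return the k most similar texts.
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    # Paste the retrieved chunks into the prompt ahead of the question.
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Made-up vectors standing in for real embeddings (assumption).
index = [
    ("Password resets for MFA accounts require a one-time code.", [0.9, 0.1, 0.0]),
    ("The deploy pipeline runs on every merge to main.", [0.0, 0.2, 0.9]),
    ("Expired tokens return HTTP 401 from the auth service.", [0.6, 0.7, 0.1]),
]
question = "How do I reset my password?"
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the question

retrieved = top_k(query_vec, index, k=2)
prompt = build_prompt(question, retrieved)
print(prompt)
```

Updating the knowledge base in this design means changing entries in `index`, with no retraining involved.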

I’ll walk you through a working prototype using Python, OpenAI embeddings, and ChromaDB.

pip install chromadb openai tiktoken langchain langchain-community

For this example, I’ll use a small text file. In practice, you’d use a document loader from LangChain.

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader("my_docs.txt")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    # Fall back to spaces, then single characters, so no chunk exceeds chunk_size.
    separators=["\n\n", "\n", ".", "!", "?", " ", ""],
)
chunks = text_splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")

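The overlap makes it less likely that a sentence or step is cut in half at a chunk boundary, because the tail of each chunk is repeated at the head of the next. It does not guarantee that no context is lost: anything longer than the overlap can still be split.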
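A simplified picture of what overlap does, assuming fixed-size character windows. RecursiveCharacterTextSplitter works differently (it prefers to split on separators), so treat `chunk_with_overlap` as a hypothetical helper for illustration, with a small size and overlap chosen to keep the output readable.

```python
def chunk_with_overlap(text, chunk_size, overlap):
    # Each window starts (chunk_size - overlap) characters after the previous one,
    # so the last `overlap` characters of a chunk reappear at the start of the next.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

text = "Open Settings. Click Reset password. Enter the code from your authenticator app."
chunks = chunk_with_overlap(text, chunk_size=40, overlap=10)
for chunk in chunks:
    print(repr(chunk))
```

Each printed chunk begins with the final 10 characters of the one before it, which is the same idea as `chunk_overlap=50` at a smaller scale.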

from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chroma_db"
)
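# Chroma 0.4+ writes to persist_directory automatically; persist() only matters on older versions.
vectordb.persist()

Finally, connect the retriever to a chat model and ask a question:

from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOpenAI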

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever(search_kwargs={"k": 4})
)

question = "How do I reset my password if I'm on a VPN?"
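# RetrievalQA takes the question under "query" and returns the answer under "result".
response = qa_chain.invoke({"query": question})
print(response["result"])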

And that’s it. A working Q&A system in under 50 lines of code.

RAG isn’t magic; it has its own pain points. One is the choice of embedding model: open-source models such as all-MiniLM-L6-v2 from Sentence Transformers are an option, but they’re less accurate.

If I were starting over, I’d do two things differently. First, I’d start with a managed service that handles the embedding and retrieval infrastructure. For example, a platform like Interwest Info AI (https://ai.interwestinfo.com/) abstracts away the vector DB and chunking strategies: you just upload documents and get an API. That would have saved me two weeks of fiddling with ChromaDB quirks and scaling issues.

Second, I’d invest more time in evaluating retrieval quality before building the RAG pipeline. Create a small test set of 20 questions and manually verify which chunks should be retrieved. That tells you if your chunking and embedding model are up to par.
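One way to score such a test set is recall@k: for each question, the fraction of hand-labeled relevant chunks that show up in the top k results. The sketch below is an assumption about how that could look, not the author's tooling. The chunk IDs, the two sample questions, and the canned retriever are all made up so the example runs without a vector store.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    # Fraction of the labeled relevant chunks found among the top-k retrieved chunks.
    if not relevant_ids:
        raise ValueError("each test question needs at least one relevant chunk")
    top = set(retrieved_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)

def evaluate(test_set, retrieve, k=4):
    # test_set: list of (question, relevant_chunk_ids)
    # retrieve: function mapping a question to a ranked list of chunk ids
    scores = [recall_at_k(retrieve(q), relevant, k) for q, relevant in test_set]
    return sum(scores) / len(scores)

# Hypothetical labels: which chunks SHOULD come back for each question.
test_set = [
    ("How do I reset my password on VPN?", ["auth-12", "vpn-03"]),
    ("Why does auth fail for expired tokens?", ["auth-07"]),
]
# Canned results standing in for a real retriever.
canned_results = {
    "How do I reset my password on VPN?": ["auth-12", "auth-01", "faq-02", "auth-09"],
    "Why does auth fail for expired tokens?": ["auth-07", "auth-12", "faq-05", "vpn-03"],
}

mean_recall = evaluate(test_set, canned_results.__getitem__, k=4)
print(f"mean recall@4 = {mean_recall:.2f}")  # mean recall@4 = 0.75
```

With a real index, `retrieve` could wrap the vector store's similarity search and map each returned document to an ID stored in its metadata; that mapping is an assumption, since it depends on how the chunks were labeled at ingestion time. Rerunning the same test set after changing the chunk size or the embedding model shows whether the change helped.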

Building a document Q&A system from scratch taught me more about the trade-offs in retrieval than any blog post ever could. But now I’m curious: What’s your go-to approach for building a knowledge base chatbot? Are you DIY with LangChain, or do you use a SaaS platform? Let’s discuss in the comments.
