{"slug": "why-i-stopped-building-my-own-document-q-a-from-scratch", "title": "Why I Stopped Building My Own Document Q&A from Scratch", "summary": "A developer abandoned building a custom document Q&A system after struggling with fine-tuning and keyword search, then built a working prototype using Retrieval-Augmented Generation (RAG) with OpenAI embeddings and ChromaDB in under 50 lines of code. The engineer recommends starting with a managed service like Interwest Info AI and evaluating retrieval quality early to avoid common pitfalls.", "body_md": "Two months ago, I was knee-deep in a project that sounded simple: build a system that could answer questions from our company’s internal documentation. We had hundreds of PDFs, Confluence pages, and READMEs. The goal was to let junior developers ask natural language questions and get accurate answers instantly.\n\nI thought, “How hard can it be? I’ll just fine-tune a small LLM on our documents.”\n\nSpoiler: it was that hard, and then some.\n\nI spent two weeks collecting, cleaning, and chunking our documentation. I wrote a Hugging Face training script, rented a GPU, and fine-tuned a 7B parameter model. The result? A model that could recite our API docs verbatim but couldn’t answer a question like “Why does our auth flow fail for expired tokens?” without hallucinating.\n\nFine-tuning taught the model patterns in the text, but it didn’t give it the ability to *retrieve* specific facts. Plus, every time a document changed, I’d have to retrain. It was unsustainable.\n\nNext, I tried Elasticsearch with a BM25 scorer. I’d split documents into chunks and search for keywords from the user’s question. The problem: natural language questions don’t map well to keywords. “How do I reset my password?” would match chunks about “reset” and “password”, but miss the critical steps for multi-factor auth. 
Recall was terrible.\n\nAfter reading about RAG, I realized the solution wasn’t to train the model on my data; it was to give the model a way to *look up* the right data at query time. The core idea:\n\nI’ll walk you through a working prototype using Python, OpenAI embeddings, and ChromaDB.\n\n```\npip install chromadb openai tiktoken langchain langchain-community\n```\n\nFor this example, I’ll use a small text file. In practice, you’d swap in the LangChain document loader that matches your source format.\n\n``` python\nfrom langchain_community.document_loaders import TextLoader\nfrom langchain.text_splitter import RecursiveCharacterTextSplitter\n\nloader = TextLoader(\"my_docs.txt\")\ndocuments = loader.load()\n\ntext_splitter = RecursiveCharacterTextSplitter(\n    chunk_size=500,\n    chunk_overlap=50,\n    separators=[\"\\n\\n\", \"\\n\", \".\", \"!\"],\n)\nchunks = text_splitter.split_documents(documents)\nprint(f\"Created {len(chunks)} chunks\")\n```\n\nThe overlap reduces the chance that context is lost at chunk boundaries.\n\n``` python\nfrom langchain_community.embeddings import OpenAIEmbeddings\nfrom langchain_community.vectorstores import Chroma\n\nembedding_model = OpenAIEmbeddings(model=\"text-embedding-ada-002\")\n\n# Persist the database so we don't re-embed every time\nvectordb = Chroma.from_documents(\n    documents=chunks,\n    embedding=embedding_model,\n    persist_directory=\"./chroma_db\"\n)\nvectordb.persist()\n```\n\n``` python\nfrom langchain.chains import RetrievalQA\nfrom langchain_community.chat_models import ChatOpenAI\n\nllm = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)\nqa_chain = RetrievalQA.from_chain_type(\n    llm=llm,\n    chain_type=\"stuff\",\n    retriever=vectordb.as_retriever(search_kwargs={\"k\": 4})\n)\n\nquestion = \"How do I reset my password if I'm on a VPN?\"\nanswer = qa_chain.invoke(question)\n# invoke() returns a dict; the generated answer is under \"result\"\nprint(answer[\"result\"])\n```\n\nAnd that’s it. 
A working Q&A system in under 50 lines of code.\n\nRAG isn’t magic, though. It has its own pain points:\n\n`all-MiniLM-L6-v2` from Sentence Transformers, but they’re less accurate.\n\nIf I were starting over, I’d do two things differently. First, I’d start with a managed service that handles the embedding and retrieval infrastructure. For example, a platform like Interwest Info AI ([https://ai.interwestinfo.com/](https://ai.interwestinfo.com/)) abstracts away the vector DB and chunking strategies: you just upload documents and get an API. That would have saved me two weeks of fiddling with ChromaDB quirks and scaling issues.\n\nSecond, I’d invest more time in evaluating retrieval quality *before* building the RAG pipeline. Create a small test set of 20 questions, manually label which chunks should be retrieved for each one, and check whether the retriever actually returns them. That tells you whether your chunking and embedding model are up to par.\n\nBuilding a document Q&A system from scratch taught me more about the trade-offs in retrieval than any blog post ever could. But now I’m curious: **What’s your go-to approach for building a knowledge base chatbot?** Do you go DIY with LangChain, or do you use a SaaS platform? 
Let’s discuss in the comments.", "url": "https://wpnews.pro/news/why-i-stopped-building-my-own-document-q-a-from-scratch", "canonical_source": "https://dev.to/__c1b9e06dc90a7e0a676b/why-i-stopped-building-my-own-document-qa-from-scratch-19g5", "published_at": "2026-06-13 08:01:09+00:00", "updated_at": "2026-06-13 08:17:41.742413+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-tools", "developer-tools", "generative-ai"], "entities": ["OpenAI", "ChromaDB", "LangChain", "Hugging Face", "Elasticsearch", "Interwest Info AI", "Sentence Transformers", "GPT-4o-mini"], "alternates": {"html": "https://wpnews.pro/news/why-i-stopped-building-my-own-document-q-a-from-scratch", "markdown": "https://wpnews.pro/news/why-i-stopped-building-my-own-document-q-a-from-scratch.md", "text": "https://wpnews.pro/news/why-i-stopped-building-my-own-document-q-a-from-scratch.txt", "jsonld": "https://wpnews.pro/news/why-i-stopped-building-my-own-document-q-a-from-scratch.jsonld"}}