{"slug": "i-built-a-q-a-bot-for-my-docs-and-almost-gave-up-here-s-what-worked", "title": "I Built a Q&A Bot for My Docs and Almost Gave Up (Here's What Worked)", "summary": "A developer built a Retrieval-Augmented Generation (RAG) pipeline for a documentation Q&A bot after multiple failed attempts, including token limits, high costs, and hallucination issues with direct LLM approaches. The final solution separates retrieval from generation, using a fast embedding model to find relevant document chunks before feeding them to an LLM for answers. The developer implemented the system in 20 lines of Python code using LangChain, Chroma for vector storage, and HuggingFace embeddings.", "body_md": "A few months ago, I decided to build a Q&A bot for my project’s documentation. You know the dream: users type a question, and the bot answers instantly from the docs. No more digging through pages. No more stale FAQs.\n\nI thought it would be straightforward. Slap an LLM on top of a text file and call it a day. Oh, how wrong I was.\n\nI had a bunch of Markdown files – about 50 pages of setup guides, API references, and troubleshooting. I wanted the bot to answer questions like “How do I configure authentication?” or “What’s the maximum payload size?”\n\nMy first attempt: dump the entire documentation into a single prompt and ask GPT-4 to answer. It worked… for the first two questions. Then I hit the token limit. Then I realized I was spending $0.50 per query. Then I noticed the model hallucinating answers from unrelated sections.\n\nI needed a smarter approach. But every tutorial I found either oversimplified (“just use LangChain!”) or assumed I had a PhD in information retrieval.\n\nI spent a weekend preparing a dataset of question-answer pairs from my docs. Fine-tuned a small LLaMA model. The result? It memorized exact phrases but couldn’t generalize to rephrased questions. Also, updating the docs meant retraining. 
Hard pass.\n\nI embedded all the doc chunks, stored them in Pinecone, and returned the top-3 chunks as the answer. Users got a wall of text. No summarization. No conversation. It felt like Google without the ranking.\n\nI tried to dynamically select relevant chunks and inject them into a prompt. But I kept running into context window issues. Plus, the model would sometimes ignore the provided context and make stuff up.\n\nAfter three weeks of trial and error, I settled on a Retrieval-Augmented Generation (RAG) pipeline. The key insight: **separate retrieval from generation**. Use a fast, cheap retriever to find relevant chunks, then feed only those chunks to an LLM for the final answer.\n\nHere’s the architecture:\n\nI tried several LLM providers for the generation step: OpenAI, Anthropic, and a smaller self-hosted model. Eventually I settled on a paid API because the quality difference was huge for my use case. (I used [Interwest’s AI](https://ai.interwestinfo.com/) as one of the providers during testing – it worked fine, but any compatible API would do.)\n\nHere’s the Python script I ended up with. It uses `langchain` for orchestration, but you could swap out components.\n\n```python\nimport os\nfrom langchain.document_loaders import DirectoryLoader\nfrom langchain.text_splitter import RecursiveCharacterTextSplitter\nfrom langchain.embeddings import HuggingFaceEmbeddings\nfrom langchain.vectorstores import Chroma\nfrom langchain.llms import OpenAI  # or any other LLM\nfrom langchain.chains import RetrievalQA\n\n# 1. Load documents\nloader = DirectoryLoader(\"./docs/\", glob=\"**/*.md\")\ndocs = loader.load()\n\n# 2. Split into chunks\ntext_splitter = RecursiveCharacterTextSplitter(\n    chunk_size=500,\n    chunk_overlap=50\n)\nchunks = text_splitter.split_documents(docs)\n\n# 3. 
Create embeddings and vector store\nembeddings = HuggingFaceEmbeddings(model_name=\"all-MiniLM-L6-v2\")\nvectordb = Chroma.from_documents(chunks, embeddings, persist_directory=\"./chroma_db\")\nvectordb.persist()\n\n# 4. Set up the QA chain\nllm = OpenAI(temperature=0, model=\"gpt-3.5-turbo\")  # or use Interwest AI API\nqa_chain = RetrievalQA.from_chain_type(\n    llm=llm,\n    chain_type=\"stuff\",\n    retriever=vectordb.as_retriever(search_kwargs={\"k\": 3}),\n    return_source_documents=True\n)\n\n# 5. Ask a question\nquery = \"How do I reset my password?\"\nresult = qa_chain({\"query\": query})\nprint(result[\"result\"])\n```\n\nThat’s it. 20 lines of real code that actually works.\n\n`all-MiniLM-L6-v2` is fast and free. But for domain-specific docs (e.g., medical, legal), you might need a fine-tuned embedding model.\n\nI’d start with a simple retrieval-only system (just return the top chunks) and add the LLM only after validating that the retrieval works. I wasted time tuning the generation when my retrieval was bad.\n\nAlso, I’d add logging from day one. I had no idea which queries failed until users complained. A simple CSV log of queries, retrieved chunks, and answers would have saved me hours.\n\nBuilding a Q&A bot for your own docs is one of those projects that sounds trivial but hides a dozen gotchas. The RAG approach worked for me, but I’m sure there are better ways. What’s your setup look like? Do you use a managed service, or roll your own? 
I’d love to hear what broke for you.", "url": "https://wpnews.pro/news/i-built-a-q-a-bot-for-my-docs-and-almost-gave-up-here-s-what-worked", "canonical_source": "https://dev.to/__c1b9e06dc90a7e0a676b/i-built-a-qa-bot-for-my-docs-and-almost-gave-up-heres-what-worked-1kgj", "published_at": "2026-05-30 02:01:00+00:00", "updated_at": "2026-05-30 02:11:26.893940+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-tools", "ai-products", "ai-infrastructure"], "entities": ["GPT-4", "LangChain", "LLaMA", "Pinecone", "Google"], "alternates": {"html": "https://wpnews.pro/news/i-built-a-q-a-bot-for-my-docs-and-almost-gave-up-here-s-what-worked", "markdown": "https://wpnews.pro/news/i-built-a-q-a-bot-for-my-docs-and-almost-gave-up-here-s-what-worked.md", "text": "https://wpnews.pro/news/i-built-a-q-a-bot-for-my-docs-and-almost-gave-up-here-s-what-worked.txt", "jsonld": "https://wpnews.pro/news/i-built-a-q-a-bot-for-my-docs-and-almost-gave-up-here-s-what-worked.jsonld"}}