Local RAG: Chat With Your Documents (Open Source, Private)

Here is a factual summary of the article:

The article explains how to set up a local, private Retrieval-Augmented Generation (RAG) system, which allows users to upload documents like PDFs and code files and then ask a local LLM questions about them without any data leaving their machine. It outlines three implementation options: using the built-in RAG feature in Open WebUI, installing the more feature-rich AnythingLLM application, or building a custom RAG pipeline with Python and LangChain. The guide emphasizes that RAG functions like an "open-book exam" for the LLM, enabling it to retrieve and cite specific information from user-provided documents for more accurate answers.

Local RAG: Chat With Your Documents Upload PDFs, code, research papers, or entire books — then ask your local LLM questions about them. No data ever leaves your machine. What Is RAG? Plain English RAG Retrieval-Augmented Generation means your LLM can look up information from your own documents before answering. Think of it like this: - Normal LLM: Has a great memory, but only knows what it learned during training - RAG: The LLM gets a "cheat sheet" — your documents — that it can read before answering 💡 Analogy:Without RAG, the LLM is like a student taking a closed-book exam. With RAG, they get an open-book exam — and you get to write the book. Real-World Uses | Use Case | What You Upload | What You Can Ask | |---|---|---| | Research | PDF papers, articles | "What were the key findings in this study?" | | Studying | Textbooks, lecture notes | "Explain chapter 7 in simpler terms" | | Work | Company docs, reports | "What's our Q3 strategy?" | | Legal | Contracts, agreements | "What are the termination clauses?" | | Coding | Codebase, documentation | "How does the auth module work?" | | Personal | Journals, notes, books | "What did I write about in March?" | Option A: Built-in RAG in Open WebUI Simplest If you already have Open WebUI installed https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/open-webui-setup.md , RAG is built-in. How to Use It - Open in your browser http://localhost:3000 http://localhost:3000 - Click the paperclip icon next to the chat input - Upload a PDF, .txt, .docx, or .md file - Wait for the "embedding" process to finish usually 10-30 seconds - Ask questions about the document That's it. No configuration needed. Pro Tips - Multiple documents: You can upload several files at once. Open WebUI indexes them all. - Model choice: Use qwen3.6:27b or deepseek-r1:14b for best RAG quality — they have larger context windows. - Document size: Open WebUI handles documents up to hundreds of pages. For very large documents, consider chunking them. Option B: AnythingLLM More Powerful AnythingLLM https://anythingllm.com is a dedicated RAG application with more features than Open WebUI's built-in system. Installation With Docker Recommended : docker run -d \ -p 3001:3001 \ -v anythingllm:/app/server/storage \ -e STORAGE DIR=/app/server/storage \ --name anythingllm \ --restart always \ ghcr.io/anythingllm/anything-llm:latest Then open http://localhost:3001 . Without Docker: Download from anythingllm.com https://anythingllm.com and run the installer for your OS. Configuration - Open AnythingLLM at http://localhost:3001 - Create an admin account local only — no data leaves your machine Go to Settings → LLM Provider - Select Ollama from the dropdown - Choose your model e.g., qwen2.5:7b or deepseek-r1:14b Click Save Now set up embeddings: Go to Settings → Embedding Provider Select Ollama - Choose an embedding model AnythingLLM will download a small embedding model — about 500 MB Click Save Uploading Documents - Click "New Workspace" and give it a name e.g., "Research Papers" - Click the upload icon or drag and drop files - Supported formats: PDF, DOCX, TXT, MD, CSV, JSON, code files - Click "Save and Embed" - Wait for indexing progress shows in the UI Chatting With Your Documents Once embedded, just type your question: "What are the three main conclusions from these papers?" AnythingLLM searches your documents for relevant passages and feeds them to the LLM along with your question. The result is accurate, sourced answers — not guesses. 🔥 Pro tip:AnythingLLM shows you which document each answer came from. Hover over the citation to see the exact source passage. Option C: Manual RAG with LangChain For Developers For maximum control, build RAG with Python and LangChain. This is particularly useful if you want to automate document processing. Setup pip install langchain langchain-ollama chromadb Basic RAG Script python from langchain ollama import ChatOllama, OllamaEmbeddings from langchain community.document loaders import DirectoryLoader, TextLoader from langchain.text splitter import RecursiveCharacterTextSplitter from langchain community.vectorstores import Chroma from langchain.chains import RetrievalQA 1. Load your documents loader = DirectoryLoader "./my-docs/", glob=" / .txt", loader cls=TextLoader documents = loader.load print f"Loaded {len documents } documents" 2. Split into chunks splitter = RecursiveCharacterTextSplitter chunk size=1000, chunk overlap=200 chunks = splitter.split documents documents print f"Split into {len chunks } chunks" 3. Create embeddings and vector store embeddings = OllamaEmbeddings model="qwen2.5:7b" vectorstore = Chroma.from documents documents=chunks, embedding=embeddings, persist directory="./chroma db" 4. Create RAG chain llm = ChatOllama model="qwen2.5:7b", temperature=0.3 qa chain = RetrievalQA.from chain type llm=llm, chain type="stuff", retriever=vectorstore.as retriever search kwargs={"k": 3} 5. Ask questions while True: question = input "\nAsk a question or 'quit' : " if question.lower == 'quit': break answer = qa chain.invoke question print f"\nAnswer: {answer 'result' }" Run It Put your documents in a folder called "my-docs/" mkdir -p my-docs Copy your PDFs/txts there Run the script python rag.py Choosing the Right RAG Setup | Factor | Open WebUI RAG | AnythingLLM | LangChain | |---|---|---|---| Setup time | 1 click | 5 minutes | 30 minutes | Features | Basic | Advanced | Full control | Document types | PDF, TXT, MD | PDF, DOCX, TXT, MD, CSV, code | Anything with a loader | Multi-document | ✅ | ✅ | ✅ | Citations | ❌ | ✅ | ✅ manual | Customization | Low | Medium | High | Best for | Quick personal use | Serious knowledge work | Automation & production | My recommendation: - Start with Open WebUI's built-in RAG fastest - Move to AnythingLLM when you need citations and multiple workspaces - Use LangChain when you need to automate document processing Best Practices for Better RAG Results 1. Use the Right Model RAG works best with models that have large context windows : | Model | Context | Why It's Good for RAG | |---|---|---| | Qwen 3.6:27B | 262K | Can process entire chapters at once | | Qwen 2.5:14B | 128K | Excellent balance of quality and speed | | DeepSeek-R1:14B | 128K | Best for reasoning about documents | | DeepSeek-R1:32B | 128K | Best overall RAG quality | 2. Write Good Questions | ❌ Bad Question | ✅ Good Question | |---|---| | "Tell me about it" | "Summarize the methodology used in section 3" | | "What's in this?" | "What are the three main arguments presented in chapter 2?" | | "Is this useful?" | "What evidence does the author provide for their claim on page 15?" | 3. Optimize Chunk Size The chunk size determines how much text the LLM sees at once: | Chunk Size | Best For | |---|---| | 500 chars | Short lookup questions "What is X?" | | 1000 chars | General Q&A 🟢 Default | | 2000 chars | Summarization tasks | | 4000+ chars | Long-context analysis Qwen 3.6 recommended | Common Pitfalls | Problem | Cause | Fix | |---|---|---| | "I don't know" to document questions | Embedding not matching | Re-save documents in workspace | | Wrong answers despite having docs | Chunk size too small | Increase chunk size to 2000+ | | Very slow document processing | Large files on CPU | Be patient — first embed takes longest | | "Model not responding" | Context overflow | Use a model with larger context Qwen 3.6 | | Can't upload PDFs | PDF is scanned/image-based | Use OCR first tools like marker-pdf | Next Steps - Set up Open WebUI first it includes RAG out of the box → Open WebUI Guide https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/open-webui-setup.md - Try it with Chinese models → Qwen 3.6 is excellent for RAG due to its 262K context - Combine RAG with Function Calling → Chapter 06: Function Calling https://github.com/Lingdas1/local-llm-guide/tree/main/06-function-calling/ - Deploy in production → Chapter 05: Production https://github.com/Lingdas1/local-llm-guide/tree/main/05-production/ Part of the Local LLM Guide — the definitive resource for running AI on your own hardware.