{"slug": "local-rag-chat-with-your-documents-open-source-private", "title": "Local RAG: Chat With Your Documents (Open Source, Private)", "summary": "The article explains how to set up a local, private Retrieval-Augmented Generation (RAG) system, which allows users to upload documents like PDFs and code files and then ask a local LLM questions about them without any data leaving their machine. It outlines three implementation options: using the built-in RAG feature in Open WebUI, installing the more feature-rich AnythingLLM application, or building a custom RAG pipeline with Python and LangChain. The guide emphasizes that RAG functions like an \"open-book exam\" for the LLM, enabling it to retrieve and cite specific information from user-provided documents for more accurate answers.", "body_md": "# Local RAG: Chat With Your Documents\n\nUpload PDFs, code, research papers, or entire books, then ask your local LLM questions about them. No data ever leaves your machine.\n\n## What Is RAG? (Plain English)\n\n**RAG** (Retrieval-Augmented Generation) means your LLM can look up information from your own documents before answering.\n\nThink of it like this:\n\n- **Normal LLM:** Has a great memory, but only knows what it learned during training\n- **RAG:** The LLM gets a \"cheat sheet\" (your documents) that it can read before answering\n\n💡 **Analogy:** Without RAG, the LLM is like a student taking a closed-book exam. 
With RAG, they get an open-book exam, and you get to write the book.\n\n### Real-World Uses\n\n| Use Case | What You Upload | What You Can Ask |\n|---|---|---|\n| Research | PDF papers, articles | \"What were the key findings in this study?\" |\n| Studying | Textbooks, lecture notes | \"Explain chapter 7 in simpler terms\" |\n| Work | Company docs, reports | \"What's our Q3 strategy?\" |\n| Legal | Contracts, agreements | \"What are the termination clauses?\" |\n| Coding | Codebase, documentation | \"How does the auth module work?\" |\n| Personal | Journals, notes, books | \"What did I write about in March?\" |\n\n## Option A: Built-in RAG in Open WebUI (Simplest)\n\nIf you already have [Open WebUI installed](https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/open-webui-setup.md), RAG is built in.\n\n### How to Use It\n\n- Open [http://localhost:3000](http://localhost:3000) in your browser\n- Click the **paperclip icon** next to the chat input\n- Upload a PDF, .txt, .docx, or .md file\n- Wait for the \"embedding\" process to finish (usually 10-30 seconds)\n- Ask questions about the document\n\n**That's it.** No configuration needed.\n\n### Pro Tips\n\n- **Multiple documents:** You can upload several files at once. Open WebUI indexes them all.\n- **Model choice:** Use `qwen3.6:27b` or `deepseek-r1:14b` for best RAG quality; they have larger context windows.\n- **Document size:** Open WebUI handles documents up to hundreds of pages. 
For very large documents, consider splitting them into smaller files before uploading.\n\n## Option B: AnythingLLM (More Powerful)\n\n[AnythingLLM](https://anythingllm.com) is a dedicated RAG application with more features than Open WebUI's built-in system.\n\n### Installation\n\n**With Docker (Recommended):**\n\n```\ndocker run -d \\\n  -p 3001:3001 \\\n  -v anythingllm:/app/server/storage \\\n  -e STORAGE_DIR=/app/server/storage \\\n  --name anythingllm \\\n  --restart always \\\n  mintplexlabs/anythingllm:latest\n```\n\nThen open **http://localhost:3001**.\n\n**Without Docker:**\n\nDownload from [anythingllm.com](https://anythingllm.com) and run the installer for your OS.\n\n### Configuration\n\n- **Open AnythingLLM** at `http://localhost:3001`\n- **Create an admin account** (local only; no data leaves your machine)\n- **Go to Settings → LLM Provider**\n- **Select Ollama** from the dropdown\n- **Choose your model** (e.g., `qwen2.5:7b` or `deepseek-r1:14b`)\n- **Click Save**\n\nNow set up embeddings:\n\n- **Go to Settings → Embedding Provider**\n- **Select Ollama**\n- **Choose an embedding model** (AnythingLLM will download a small embedding model, about 500 MB)\n- **Click Save**\n\n### Uploading Documents\n\n- Click **\"New Workspace\"** and give it a name (e.g., \"Research Papers\")\n- Click the **upload icon** (or drag and drop files)\n- Supported formats: PDF, DOCX, TXT, MD, CSV, JSON, code files\n- Click **\"Save and Embed\"**\n- Wait for indexing (progress shows in the UI)\n\n### Chatting With Your Documents\n\nOnce embedded, just type your question:\n\n```\n\"What are the three main conclusions from these papers?\"\n```\n\nAnythingLLM searches your documents for relevant passages and feeds them to the LLM along with your question. The result is **accurate, sourced answers**, not guesses.\n\n🔥 **Pro tip:** AnythingLLM shows you which document each answer came from. 
Hover over the citation to see the exact source passage.\n\n## Option C: Manual RAG with LangChain (For Developers)\n\nFor maximum control, build RAG with Python and LangChain. This is particularly useful if you want to automate document processing.\n\n### Setup\n\n```\n# The script below uses the LangChain 0.x import paths (langchain.chains, langchain.text_splitter)\npip install \"langchain<1\" langchain-community langchain-ollama chromadb\n\n# Pull a dedicated embedding model for step 3\nollama pull nomic-embed-text\n```\n\n### Basic RAG Script\n\n```python\nfrom langchain_ollama import ChatOllama, OllamaEmbeddings\nfrom langchain_community.document_loaders import DirectoryLoader, TextLoader\nfrom langchain.text_splitter import RecursiveCharacterTextSplitter\nfrom langchain_community.vectorstores import Chroma\nfrom langchain.chains import RetrievalQA\n\n# 1. Load your documents (this glob only matches .txt files)\nloader = DirectoryLoader(\"./my-docs/\", glob=\"**/*.txt\", loader_cls=TextLoader)\ndocuments = loader.load()\nprint(f\"Loaded {len(documents)} documents\")\n\n# 2. Split into overlapping chunks so passages are not cut off mid-thought\nsplitter = RecursiveCharacterTextSplitter(\n    chunk_size=1000,\n    chunk_overlap=200\n)\nchunks = splitter.split_documents(documents)\nprint(f\"Split into {len(chunks)} chunks\")\n\n# 3. Create embeddings and vector store\n# Uses a dedicated embedding model instead of a chat model;\n# assumes `ollama pull nomic-embed-text` has been run.\nembeddings = OllamaEmbeddings(model=\"nomic-embed-text\")\nvectorstore = Chroma.from_documents(\n    documents=chunks,\n    embedding=embeddings,\n    persist_directory=\"./chroma_db\"\n)\n\n# 4. Create RAG chain (\"stuff\" puts the top-k chunks directly into the prompt)\nllm = ChatOllama(model=\"qwen2.5:7b\", temperature=0.3)\nqa_chain = RetrievalQA.from_chain_type(\n    llm=llm,\n    chain_type=\"stuff\",\n    retriever=vectorstore.as_retriever(search_kwargs={\"k\": 3})\n)\n\n# 5. 
Ask questions\nwhile True:\n    question = input(\"\\nAsk a question (or 'quit'): \")\n    if question.strip().lower() == 'quit':\n        break\n    # RetrievalQA expects its input under the \"query\" key\n    answer = qa_chain.invoke({\"query\": question})\n    print(f\"\\nAnswer: {answer['result']}\")\n```\n\n### Run It\n\n```\n# Put your documents in a folder called \"my-docs/\"\nmkdir -p my-docs\n# Copy your .txt files there (the script's glob only matches *.txt)\n\n# Save the script above as rag.py, then run it\npython rag.py\n```\n\n## Choosing the Right RAG Setup\n\n| Factor | Open WebUI RAG | AnythingLLM | LangChain |\n|---|---|---|---|\n| Setup time | 1 click | 5 minutes | 30 minutes |\n| Features | Basic | Advanced | Full control |\n| Document types | PDF, DOCX, TXT, MD | PDF, DOCX, TXT, MD, CSV, code | Anything with a loader |\n| Multi-document | ✅ | ✅ | ✅ |\n| Citations | ❌ | ✅ | ✅ (manual) |\n| Customization | Low | Medium | High |\n| Best for | Quick personal use | Serious knowledge work | Automation & production |\n\n**My recommendation:**\n\n- **Start** with Open WebUI's built-in RAG (fastest)\n- **Move to** AnythingLLM when you need citations and multiple workspaces\n- **Use** LangChain when you need to automate document processing\n\n## Best Practices for Better RAG Results\n\n### 1. Use the Right Model\n\nRAG works best with models that have **large context windows**:\n\n| Model | Context | Why It's Good for RAG |\n|---|---|---|\n| Qwen 3.6:27B | 262K | Can process entire chapters at once |\n| Qwen 2.5:14B | 128K | Excellent balance of quality and speed |\n| DeepSeek-R1:14B | 128K | Best for reasoning about documents |\n| DeepSeek-R1:32B | 128K | Best overall RAG quality |\n\n### 2. Write Good Questions\n\n| ❌ Bad Question | ✅ Good Question |\n|---|---|\n| \"Tell me about it\" | \"Summarize the methodology used in section 3\" |\n| \"What's in this?\" | \"What are the three main arguments presented in chapter 2?\" |\n| \"Is this useful?\" | \"What evidence does the author provide for their claim on page 15?\" |\n\n### 3. 
Optimize Chunk Size\n\nThe chunk size sets how much text each retrieved passage carries, and therefore how much context the LLM sees per match:\n\n| Chunk Size | Best For |\n|---|---|\n| 500 chars | Short lookup questions (\"What is X?\") |\n| 1000 chars | General Q&A 🟢 Default |\n| 2000 chars | Summarization tasks |\n| 4000+ chars | Long-context analysis (Qwen 3.6 recommended) |\n\n## Common Pitfalls\n\n| Problem | Cause | Fix |\n|---|---|---|\n| \"I don't know\" to document questions | Documents not embedded, or no matching chunks retrieved | Re-save (re-embed) the documents in the workspace |\n| Wrong answers despite having docs | Chunk size too small | Increase chunk_size to 2000+ |\n| Very slow document processing | Large files on CPU | Be patient: the first embed takes longest |\n| \"Model not responding\" | Context overflow | Use a model with larger context (Qwen 3.6) |\n| Can't upload PDFs | PDF is scanned/image-based | Use OCR first (tools like marker-pdf) |\n\n## Next Steps\n\n- **Set up Open WebUI first** (it includes RAG out of the box) → [Open WebUI Guide](https://github.com/Lingdas1/local-llm-guide/blob/main/04-advanced-usage/open-webui-setup.md)\n- **Try it with Chinese models** → Qwen 3.6 is excellent for RAG due to its 262K context\n- **Combine RAG with Function Calling** → [Chapter 06: Function Calling](https://github.com/Lingdas1/local-llm-guide/tree/main/06-function-calling/)\n- **Deploy in production** → [Chapter 05: Production](https://github.com/Lingdas1/local-llm-guide/tree/main/05-production/)\n\nPart of the Local LLM Guide, the definitive resource for running AI on your own hardware.", "url": "https://wpnews.pro/news/local-rag-chat-with-your-documents-open-source-private", "canonical_source": "https://dev.to/lingdas1/local-rag-chat-with-your-documents-open-source-private-390o", "published_at": "2026-05-23 18:49:41+00:00", "updated_at": "2026-05-23 19:03:45.262632+00:00", "lang": "en", "topics": ["large-language-models", "open-source", "developer-tools", "artificial-intelligence", "products"], "entities": ["Open WebUI", "AnythingLLM", "qwen3.6:27b", 
"deepseek-r1:14b"], "alternates": {"html": "https://wpnews.pro/news/local-rag-chat-with-your-documents-open-source-private", "markdown": "https://wpnews.pro/news/local-rag-chat-with-your-documents-open-source-private.md", "text": "https://wpnews.pro/news/local-rag-chat-with-your-documents-open-source-private.txt", "jsonld": "https://wpnews.pro/news/local-rag-chat-with-your-documents-open-source-private.jsonld"}}