{"slug": "rag-explained-retrieve-then-answer-the-prompt-that-kills-hallucinations", "title": "RAG Explained: Retrieve, Then Answer (the Prompt That Kills Hallucinations)", "summary": "A developer explains that RAG (Retrieval-Augmented Generation) reduces LLM hallucinations by fetching relevant document chunks at query time and instructing the model to answer using only that context. The technique involves embedding the question, performing a vector search to retrieve the top chunks, and constructing a prompt that forces the model to rely solely on the provided context. The developer shares a simple template and notes that the key is including the phrase 'ONLY the context' to prevent the model from blending in its own memory.", "body_md": "An LLM only knows what it saw in training. It doesn't know your company wiki, last week's news, or the PDF you just uploaded. Ask it anyway and it either refuses or — worse — confidently makes something up.\n\n**RAG** (Retrieval-Augmented Generation) fixes that, and it's far simpler than the name suggests. This is Day 5 of my PromptFromZero series.\n\nFetch the relevant facts at question time, and hand them to the model to read.\n\nYou're not asking the model to *remember*. You're giving it the page to *read*.\n\nEmbed the question, find the closest document chunks (vector search), grab the top few:\n\n``` js\nconst hits = await search(question, { k: 3 }); // the 3 most relevant chunks\n```\n\n(The retrieval half is its own topic — embeddings + a vector database. I built exactly that in TechFromZero Day 45 with Postgres + pgvector.)\n\nThis template is 80% of RAG quality:\n\n``` js\nconst prompt = `Answer using ONLY the context below.\nIf the answer isn't there, say \"I don't know.\"\n\nContext:\n${hits.map(h => \"- \" + h.text).join(\"\\n\")}\n\nQuestion: ${question}`;\n```\n\nThe words **\"ONLY the context\"** matter. Without them, the model blends its own (possibly wrong) memory back in. 
With them, it sticks to the source you gave it.\n\nSend that prompt to the LLM. Done. The answer is now grounded in *your* documents.\n\nHallucinations mostly happen when the context *doesn't* contain the answer but the model answers anyway. The two instructions in the template turn a guesser into a librarian:\n\n- Answer using ONLY the context.\n- If the answer isn't there, say \"I don't know.\"\n\nThat's it. Retrieve → Augment → Generate. Pair this prompt half with a vector store (pgvector, Pinecone, Chroma...) and you've built \"chat with your docs.\"\n\n📎 Try the interactive RAG playground and watch retrieval + the prompt + the answer: [https://dev48v.infy.uk/prompt/day5-rag-basic.html](https://dev48v.infy.uk/prompt/day5-rag-basic.html)\n\nDay 5 of PromptFromZero. One prompting technique a day, explained for beginners.", "url": "https://wpnews.pro/news/rag-explained-retrieve-then-answer-the-prompt-that-kills-hallucinations", "canonical_source": "https://dev.to/dev48v/rag-explained-retrieve-then-answer-the-prompt-that-kills-hallucinations-2fdj", "published_at": "2026-06-13 22:54:13+00:00", "updated_at": "2026-06-13 23:30:56.556988+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "natural-language-processing", "ai-tools", "developer-tools"], "entities": ["RAG", "LLM", "pgvector", "Pinecone", "Chroma", "Postgres", "PromptFromZero", "TechFromZero"], "alternates": {"html": "https://wpnews.pro/news/rag-explained-retrieve-then-answer-the-prompt-that-kills-hallucinations", "markdown": "https://wpnews.pro/news/rag-explained-retrieve-then-answer-the-prompt-that-kills-hallucinations.md", "text": "https://wpnews.pro/news/rag-explained-retrieve-then-answer-the-prompt-that-kills-hallucinations.txt", "jsonld": "https://wpnews.pro/news/rag-explained-retrieve-then-answer-the-prompt-that-kills-hallucinations.jsonld"}}