{"slug": "turning-a-generic-llm-into-a-ruby-expert-what-rag-fixed-and-what-it-didnt", "title": "Turning a Generic LLM into a Ruby Expert: What RAG Fixed and What It Didn’t", "summary": "A developer building a Retrieval-Augmented Generation (RAG) pipeline for Ruby documentation found that while the system improved answer accuracy, it did not eliminate hallucinations. The model still produced incorrect answers even when correct documentation was retrieved, revealing that RAG solves knowledge access but not reasoning problems. Failures fell into four categories: retrieval failure, chunking failure, context compression failure, and reasoning failure.", "body_md": "June 4, 2026\n\n*A practical look at hallucinations, retrieval, and why having the right documentation is not the same as understanding it.*\n\nOver the past few months, I’ve been experimenting with a simple question:\n\nCan a generic LLM become a Ruby expert simply by giving it access to Ruby documentation?\n\nThe answer is both **yes** and **no**.\n\nLike many developers exploring AI tooling, I built a Retrieval-Augmented Generation (RAG) pipeline using a local vector database and indexed Ruby documentation. 
The goal was straightforward: reduce hallucinations and improve technical accuracy when answering questions about Ruby libraries and APIs.\n\nThe results were fascinating.\n\nThe model improved dramatically.\n\nBut it didn’t stop hallucinating.\n\nInstead, the hallucinations evolved.\n\n### The Assumption Most Developers Make\n\nWhen first learning about RAG, many developers assume a workflow like this:\n\n```\nQuestion\n    ↓\nRetrieve relevant documentation\n    ↓\nProvide context to LLM\n    ↓\nCorrect answer\n```\n\nIt feels logical.\n\nIf the correct information is present, the model should produce the correct answer.\n\nUnfortunately, reality is more complicated.\n\nA better representation is:\n\n```\nQuestion\n    ↓\nRetrieve relevant documentation\n    ↓\nProvide context to LLM\n    ↓\nModel interprets context\n    ↓\nAnswer\n```\n\nThat extra step changes everything.\n\nThe model is still generating tokens probabilistically.\n\nDocumentation helps.\n\nDocumentation does not guarantee understanding.\n\n### My Ruby-LibGD Experiment\n\nWhile testing a knowledge base built around Ruby-LibGD documentation, I noticed something unexpected.\n\nMany answers improved immediately after indexing the project documentation.\n\nQuestions that previously generated completely fabricated API calls suddenly became accurate.\n\nHowever, a smaller category of failures remained.\n\nIn those cases:\n\n```\nQuestion\n    ↓\nCorrect document retrieved\n    ↓\nCorrect chunk retrieved\n    ↓\nContext provided\n    ↓\nIncorrect answer\n```\n\nAt first, I assumed the retrieval system was failing.\n\nIt wasn’t.\n\nThe retrieval pipeline was doing exactly what it was supposed to do.\n\nThe problem was happening after retrieval.\n\n### Retrieval Is Not Understanding\n\nThis is perhaps the most important lesson I learned.\n\nRAG solves a knowledge-access problem.\n\nIt does not solve a reasoning problem.\n\nImagine asking:\n\nWhat does this parameter represent?\n\nThe relevant documentation is retrieved.\n\nThe parameter 
description is present.\n\nThe model still has to interpret the text correctly.\n\nSometimes it succeeds.\n\nSometimes it combines that information with prior training data.\n\nSometimes it fills in gaps that don’t actually exist.\n\nThe retrieval system did its job.\n\nThe language model didn’t necessarily do its job.\n\n### Four Types of RAG Failures\n\nAfter enough testing, most failures seemed to fall into four categories.\n\n### 1. Retrieval Failure\n\nThe simplest case.\n\nThe correct document was never retrieved.\n\n```\nQuestion\n    ↓\nWrong document\n    ↓\nWrong answer\n```\n\nThis is what most people think of when discussing RAG quality.\n\nBetter embeddings, hybrid search, metadata filtering, and reranking can often improve this.\n\n### 2. Chunking Failure\n\nThe correct information exists.\n\nBut it is split across multiple chunks.\n\n```\nChunk A\n----------------\nPart of the explanation\n\nChunk B\n----------------\nRemaining explanation\n```\n\nThe answer requires both chunks.\n\nThe retriever only finds one.\n\nThe model then tries to complete the missing information.\n\nSometimes correctly.\n\nSometimes not.\n\n### 3. Context Compression Failure\n\nThis one surprised me.\n\nMany developers assume:\n\nMore context = better answers.\n\nNot always.\n\nIf you retrieve twenty partially relevant chunks, the important information can become diluted.\n\nThe answer may be buried inside a large amount of surrounding text.\n\nThe model sees everything.\n\nBut attention is not infinite.\n\n### 4. 
Reasoning Failure\n\nThe most interesting category.\n\nEverything works.\n\nThe correct document is found.\n\nThe correct chunk is found.\n\nThe relevant context is present.\n\nThe answer is still wrong.\n\n```\nCorrect retrieval\n    +\nCorrect context\n    +\nIncorrect interpretation\n```\n\nThis is where many “RAG solved hallucinations” narratives start to break down.\n\n### The Hidden Conflict: Context vs Memory\n\nLarge language models have two knowledge sources.\n\n### Parametric Memory\n\nKnowledge learned during training.\n\n### Retrieved Context\n\nKnowledge provided at runtime.\n\nIdeally, retrieved context wins.\n\nIn practice, that doesn’t always happen.\n\nSuppose the model learned something years ago from training data.\n\nNow you provide newer documentation that says something slightly different.\n\nThe model must reconcile two competing sources of truth.\n\nSometimes it chooses correctly.\n\nSometimes it doesn’t.\n\nThis is one reason why hallucinations can survive even when documentation is available.\n\n### What RAG Actually Fixes\n\nAfter extensive testing, I no longer view RAG as a hallucination-removal system.\n\nI view it as a hallucination-reduction system.\n\nWithout RAG:\n\n```\nHallucination:\n100% invented\n```\n\nWith RAG:\n\n```\nHallucination:\nPartially grounded in real documentation\n```\n\nThat’s still a significant improvement.\n\nThe model becomes far more useful.\n\nThe error rate drops.\n\nAccuracy improves.\n\nBut the system remains probabilistic.\n\n### The Question We Should Be Asking\n\nMost discussions focus on:\n\nHow do we improve retrieval?\n\nThat’s important.\n\nBut I increasingly think the more interesting question is:\n\nHow do we detect when a model ignored correct evidence?\n\nThat leads to a completely different set of techniques:\n\n- Citation requirements\n- Answer verification\n- Self-evaluation\n- Groundedness checks\n- RAG evaluation frameworks\n- Secondary validation passes\n\nAt that point, you’re no longer building a 
chatbot.\n\nYou’re building a knowledge system.\n\n### Final Thoughts\n\nRAG is one of the most important techniques in modern AI engineering.\n\nIt dramatically improves accuracy.\n\nIt reduces hallucinations.\n\nIt allows models to work with private and domain-specific knowledge.\n\nBut one misconception continues to appear in blog posts, tutorials, and conference talks:\n\n“If the right context is present, the model will answer correctly.”\n\nMy experiments suggest otherwise.\n\nThe right context is necessary.\n\nIt is not sufficient.\n\nA language model can read the correct documentation and still produce the wrong answer.\n\nUnderstanding that distinction changed how I evaluate AI systems.\n\nAnd perhaps more importantly, it changed the question I ask when something goes wrong.\n\nInstead of asking:\n\nWhy didn’t the model find the answer?\n\nI now ask:\n\nWhy didn’t the model use the answer it already had?\n\n**Have you encountered similar behavior in your RAG systems? I’d love to hear your experiences, especially from teams building AI-powered tools in Ruby.**", "url": "https://wpnews.pro/news/turning-a-generic-llm-into-a-ruby-expert-what-rag-fixed-and-what-it-didnt", "canonical_source": "https://rubystacknews.com/2026/06/04/turning-a-generic-llm-into-a-ruby-expert-what-rag-fixed-and-what-it-didnt/", "published_at": "2026-06-05 02:35:35+00:00", "updated_at": "2026-06-18 11:55:31.688587+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-tools", "developer-tools"], "entities": ["Ruby", "Ruby-LibGD", "RAG", "LLM"], "alternates": {"html": "https://wpnews.pro/news/turning-a-generic-llm-into-a-ruby-expert-what-rag-fixed-and-what-it-didnt", "markdown": "https://wpnews.pro/news/turning-a-generic-llm-into-a-ruby-expert-what-rag-fixed-and-what-it-didnt.md", "text": "https://wpnews.pro/news/turning-a-generic-llm-into-a-ruby-expert-what-rag-fixed-and-what-it-didnt.txt", "jsonld": 
"https://wpnews.pro/news/turning-a-generic-llm-into-a-ruby-expert-what-rag-fixed-and-what-it-didnt.jsonld"}}