{"slug": "your-rag-stack-is-solving-the-2023-problem", "title": "Your RAG Stack Is Solving the 2023 Problem", "summary": "A developer argues that production RAG systems must move beyond basic top-k retrieval to handle routing, memory, evidence checks, structured retrieval, and security. They outline a taxonomy of 20 advanced RAG types and emphasize that mature systems treat retrieval as a sequence of choices rather than a single pipeline.", "body_md": "**Top-k retrieval was the beginning. Production systems now need routing, memory, evidence checks, structured retrieval, and security around the retrieval layer.**\n\nMost RAG tutorials still start with the same pipeline:\n\n```\ndocuments → chunks → embeddings → vector database → top-k retrieval → LLM answer\n```\n\nThis was the right starting point. It made LLMs useful with private data, reduced some hallucinations, and gave developers a simple way to connect models to documents.\n\nWith real applications everything is a bit more complicated.\n\nThe answer may be scattered across twelve pages. The relevant source may be a table, a PDF diagram, a spreadsheet, a ticket thread, or a database row. The user’s question may require several retrieval steps. A semantically similar chunk may be relevant without actually proving the answer. The corpus may contain untrusted text that tries to steer the model.\n\nAt that point, the question is no longer “Do we have RAG?”\n\nThe better question is:\n\n**What kind of retrieval problem are we solving?**\n\nI wrote a longer taxonomy at Turing Post, [20 Advanced RAG Types to Know in 2026](https://www.turingpost.com/p/ragtypes). This post is the short developer version: the part you should think through before building another “upload docs and chat” system.\n\nThe classic RAG pipeline works well in a well-structured world.\n\nIt assumes the answer lives in one or a few text chunks. It assumes semantic similarity is close enough to evidence. It assumes one retrieval pass is enough. 
It assumes the retrieved context is safe to pass into the model. It assumes the user’s question is clear and the corpus is stable.\n\nSometimes those assumptions hold. A support FAQ, a small documentation set, or a clean internal knowledge base can work well with basic vector search plus reranking.\n\nBut many systems break these assumptions quickly.\n\nA legal assistant may need to connect definitions across a contract. A research assistant may need to understand a full paper, not one paragraph. A finance assistant may need structured numbers from tables. A customer support agent may need to check policy, account status, previous tickets, and current product behavior. A coding assistant may need repo structure, conventions, issue history, and a changing local state. The list goes on.\n\nIn all these cases, “retrieve top-k chunks” becomes a habit, not an architecture.\n\nA more mature RAG system treats retrieval as a sequence of choices.\n\n```\nuser query\n    ↓\nshould we retrieve?\n    ↓\nwhat source should we use?\n    ↓\nwhat retrieval method fits this source?\n    ↓\nis the evidence enough?\n    ↓\nshould we retrieve again?\n    ↓\nanswer, refuse, ask, or escalate\n```\n\nSeems small when seen like this? Maybe, but in production, it changes almost everything.\n\nThe system now has to decide whether the user’s question needs retrieval at all. It has to choose between sources: documentation, database, logs, tickets, long-term memory, web search, or internal APIs. It has to decide whether the answer requires one pass, multiple passes, a graph traversal, a table lookup, or a verification step.\n\nThis is why “RAG” has become a family of patterns.\n\nWhen the answer is in one place, basic RAG works great.\n\nReal documents do not always behave like that. Contracts define terms in one section and apply them later. Research papers introduce assumptions early and results much later. 
Internal strategy docs contain scattered decisions, caveats, and exceptions.\n\nWhen the answer depends on document-level structure, the system needs more than similarity search. It may need long-context retrieval, hierarchical chunking, section-aware retrieval, summary layers, or memory over previous retrieval steps.\n\nA simple symptom: the model keeps giving plausible partial answers because every retrieved chunk is locally relevant, while the actual answer requires a broader reading.\n\nA lot of RAG systems retrieve every time because the pipeline says so.\n\nThat creates noise. Some questions can be answered from the model’s general knowledge. Some require precise internal data. Some require asking the user a clarifying question before retrieval. Some require several retrieval rounds because the first answer reveals what must be checked next.\n\nThis is where adaptive and agentic RAG patterns will shine.\n\nThe system does not treat retrieval as an automatic reflex. It treats retrieval as an action. The model, router, or controller decides when to retrieve, where to retrieve from, and whether the result is good enough to continue.\n\nFor developers, this usually means the retrieval layer needs policy, and not just plumbing.\n\n```\nif query asks about internal policy:\n    retrieve from policy docs\n\nif query asks about account-specific status:\n    call account API\n\nif query is ambiguous:\n    ask a clarifying question\n\nif first retrieval has weak evidence:\n    retrieve again or escalate\n```\n\nThis sounds obvious. But again, many production failures come from skipping exactly this step.\n\nVector search is good at finding text that resembles the query. That is useful, but resemblance is not proof.\n\nA retrieved paragraph can be on the same topic and still fail to support the answer. It can mention the right entity while saying nothing about the user’s actual question. It can be outdated, or contradict another source. 
Or it can be a summary of a policy instead of the policy itself.\n\nThis is where failure creeps into many RAG systems: the answer looks grounded because there are citations, but the citations do not actually carry the claim.\n\nVerification-oriented RAG adds another step. After retrieval, the system checks whether the evidence is sufficient, whether it contradicts other evidence, and whether the answer should be narrower.\n\nA useful mental model:\n\n```\nretrieval asks: “What looks relevant?”\nverification asks: “What can we safely say from this?”\n```\n\nThe second question is where many serious applications begin.\n\nThe easy version of RAG assumes documents become chunks of text.\n\nBut modern corpora are messier. They contain tables, charts, screenshots, slides, diagrams, invoices, code, transcripts, forms, logs, and database records. Flattening all of that into plain text can destroy the structure that made the source useful.\n\nA spreadsheet is not just a sequence of words. A table has rows, columns, headers, units, and relationships. A diagram can encode a process. 
A codebase has imports, call graphs, tests, comments, issues, and conventions.\n\nWhen the source has structure, retrieval should preserve as much of that structure as possible.\n\nThat may mean multimodal retrieval, table-aware retrieval, graph-based retrieval, SQL generation, code search, metadata filters, or hybrid search that combines lexical, semantic, and structural signals.\n\nThis is less about building a fancier system and more about acknowledging that knowledge doesn't always come packaged as paragraphs.\n\nRAG systems often treat retrieved context as trusted context.\n\nThat is a dangerous shortcut.\n\nIf the corpus includes user-generated content, external web pages, third-party documentation, support tickets, shared docs, or any source that can be edited by someone outside your control, retrieval becomes a security boundary.\n\nThe model does not naturally know which retrieved text is evidence and which retrieved text is an instruction. Developers have to make that boundary explicit.\n\nAt minimum, retrieval-aware systems need:\n\nSecurity has to be woven into the retrieval process.\n\nBefore building another RAG stack, ask this:\n\n**What job is retrieval doing in this system?**\n\nThat one question is more useful than picking a vector database too early.\n\nMaybe retrieval is there to fetch facts. Maybe it is there to maintain memory. Maybe it is there to support reasoning across documents. Maybe it is there to verify claims. Maybe it is there to search structured data. 
Maybe it is there to ground an agent before it takes action.\n\nEach job leads to a different architecture.\n\nFor a simple documentation bot, the old pipeline may be enough:\n\n```\nquery → vector search → rerank → answer with citations\n```\n\nFor a customer support assistant, you may need:\n\n```\nquery → classify intent → retrieve policy → check account state → draft response → verify against policy → human review\n```\n\nFor a research assistant, you may need:\n\n```\nquery → decompose question → retrieve sources → compare evidence → identify gaps → retrieve again → synthesize with citations\n```\n\nFor an enterprise agent, you may need:\n\n```\nquery → permission check → source routing → retrieval → tool call → evidence check → action proposal → approval gate\n```\n\nThese are all called RAG in casual conversation. They are not the same system.\n\nWhen a RAG system starts failing, do not immediately tune embeddings or change chunk size. Those can help, but they are often local fixes for a deeper design issue.\n\nStart with these questions:\n\n**Where should retrieval happen?**\n\nBefore the model answers, during a multi-step process, after the model forms a plan, or only when confidence is low?\n\n**What source should the system trust?**\n\nInternal documentation, user files, live APIs, external web pages, databases, logs, tickets, or memory?\n\n**What shape is the source?**\n\nPlain text, long documents, tables, code, images, transcripts, structured records, or mixed media?\n\n**What does “good evidence” mean?**\n\nA relevant paragraph, an exact quote, a database value, a policy match, a calculation, or agreement across several sources?\n\n**What should happen when evidence is weak?**\n\nRetrieve again, ask the user, answer narrowly, refuse, or escalate?\n\n**Can retrieved content change behavior?**\n\nIf yes, you need stronger boundaries between evidence and instructions.\n\nThis is where RAG becomes real engineering.\n\nThe early RAG story was simple: 
give the model external knowledge.\n\nThe production story is more complicated: give the system the right context, from the right source, at the right time, with the right permissions, and with enough evidence to justify the answer.\n\nThat is why the field is spreading into many patterns: long-document RAG, agentic RAG, adaptive RAG, corrective RAG, self-reflective RAG, graph RAG, multimodal RAG, structured RAG, federated RAG, secure RAG, and more.\n\nSome of these names will survive. Some will be renamed. Some will be absorbed into ordinary application architecture. The naming is less important than the underlying shift.\n\nRAG is becoming the context layer for AI systems.\n\nAnd context is no longer just “some chunks in the prompt.”\n\nFor the full map, I put together a deeper taxonomy at Turing Post: [20 Advanced RAG Types to Know in 2026](https://www.turingpost.com/p/ragtypes). It groups the patterns by the problems they solve, from long-document memory and adaptive retrieval to verification, multimodal sources, graph reasoning, federated retrieval, and retrieval-layer security.\n\nRead that before you rebuild your pipeline for the fifth time. Your vector database has suffered enough.", "url": "https://wpnews.pro/news/your-rag-stack-is-solving-the-2023-problem", "canonical_source": "https://dev.to/kseniase/your-rag-stack-is-solving-the-2023-problem-53m8", "published_at": "2026-06-16 18:06:08+00:00", "updated_at": "2026-06-16 18:18:07.941882+00:00", "lang": "en", "topics": ["large-language-models", "ai-products", "ai-infrastructure", "developer-tools", "ai-agents"], "entities": ["Turing Post"], "alternates": {"html": "https://wpnews.pro/news/your-rag-stack-is-solving-the-2023-problem", "markdown": "https://wpnews.pro/news/your-rag-stack-is-solving-the-2023-problem.md", "text": "https://wpnews.pro/news/your-rag-stack-is-solving-the-2023-problem.txt", "jsonld": "https://wpnews.pro/news/your-rag-stack-is-solving-the-2023-problem.jsonld"}}