Choosing the Right RAG Strategy

A Complete Decision Guide to Chunking, Agentic RAG, and GraphRAG

Poor performance in Retrieval-Augmented Generation (RAG) systems is typically caused by inadequate document chunking or a mismatched retrieval architecture, not by the embedding model or the LLM. This guide describes chunking as the process of dividing large documents into meaningful, self-contained units (like LEGO pieces) that preserve context and relationships, which is critical for effective retrieval and accurate answers. It covers the main chunking strategies and advanced architectures such as Agentic RAG and GraphRAG, and provides a decision framework to help you select the best combination for your use case.

Introduction

Here is a scenario many RAG builders know well: you wire up a pipeline, load your documents, ask a question, and the answer is wrong, vague, or confidently hallucinated. The information was right there in your knowledge base. So what went wrong?

In most cases the problem is not your embedding model. It is not your LLM. It is how you cut up your documents before storing them (the underappreciated craft called chunking) and whether the retrieval architecture you chose actually matches the complexity of your queries.

This blog walks you through every major chunking strategy, explains how retrieval and augmentation work on top of those chunks, covers two advanced architectures (Agentic RAG and GraphRAG), and, most importantly, gives you a complete decision framework so you can walk away knowing which combination fits your use case.

🐘 The Elephant & The LEGO Pieces

Your document is an elephant: a 200+ page legal contract, a dense research paper, a massive product manual, or years of enterprise knowledge. It is large, complex, interconnected, and full of valuable information.
A Large Language Model cannot effectively consume the entire elephant at once because of:

- Context window limitations
- Retrieval precision constraints
- Latency considerations
- Token cost optimization
- Context dilution and retrieval noise

So the elephant must be divided into smaller pieces. But this is where most RAG systems fail. If you cut the elephant randomly, you destroy meaning. Sentences lose context. Ideas become fragmented. Relationships disappear. Retrieval quality collapses.

Good chunking is not about making text smaller. It is about preserving meaning while making retrieval efficient. That is why chunking is better understood as turning the elephant into LEGO pieces.

LEGO pieces are:

- Modular: each piece can stand on its own
- Structured: pieces connect cleanly to related pieces
- Consistent: standardized enough for reliable retrieval
- Meaningful: each piece preserves semantic value
- Composable: you assemble only the pieces needed for the task

Good chunking works the same way. A well-designed chunk should preserve structure, semantics, relationships, and surrounding context while remaining small enough for efficient retrieval and generation.

The real goal of chunking in RAG systems is not simply to split documents or make them smaller. The actual goals are:

- Preserve semantic meaning
- Improve retrieval precision
- Reduce hallucinations
- Optimize context windows
- Improve grounding quality
- Balance latency and cost

In practice, better chunks lead to better retrieval, better prompts, and better answers. The goal is to retrieve:

- the right piece,
- with the right context,
- from the right section,
- at the right time.

That is the foundation of effective Retrieval-Augmented Generation (RAG).

The RAG Pipeline: End to End

Every RAG system, regardless of complexity, follows the same four-stage flow. Understanding each stage makes chunking and architecture decisions obvious rather than arbitrary.
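
To make the four-stage flow concrete before the stage-by-stage descriptions, here is a minimal, dependency-free sketch. Everything in it is an illustrative assumption rather than a production design: the function names (`chunk_document`, `embed`, `retrieve`, `build_prompt`) are hypothetical, the "embedding" is a simple bag of words standing in for a real embedding model, the index is a Python list rather than a vector database, and the final LLM call is omitted.

```python
import math
import re
from collections import Counter


def chunk_document(text: str) -> list[str]:
    """Stage 2a: split on blank lines (a stand-in for a real chunking strategy)."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def embed(text: str) -> Counter:
    """Stage 2b: toy 'embedding' as a bag of lowercase words (not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Stage 3: return the k chunks whose vectors are closest to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Stage 4: assemble the retrieved chunks into a grounded prompt."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Stage 1: the raw document (three short paragraphs for illustration).
document = (
    "Refunds are issued within 14 days of a return request.\n\n"
    "Shipping is free for orders above 50 euros.\n\n"
    "Support is available by email on weekdays."
)

chunks = chunk_document(document)
index = [(chunk, embed(chunk)) for chunk in chunks]

question = "How many days do refunds take?"
top_chunks = retrieve(question, index, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Swapping `chunk_document` for a different strategy changes what `retrieve` can find without touching the other stages, which is why the chunking decision carries so much weight.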

Stage 1: Document

Your raw source material: PDFs, Word files, web pages, transcripts, database exports. It is too large to pass directly to an LLM and needs to be broken into chunks before it can be indexed or searched.

Stage 2: Chunking and Embedding

Documents are cut into units, and each unit is converted into a vector embedding, a numerical representation of its meaning. These embeddings are stored in a vector database and form your searchable index. Your chunking strategy here determines everything that follows.

Stage 3: Retrieval

When a user asks a question, the query is also embedded. The vector database returns the chunks whose embeddings are closest in meaning to the query. These are your retrieved LEGO pieces.

Stage 4: Augmentation and Generation

The retrieved chunks, along with surrounding parent context, are assembled into a prompt and sent to the LLM. The model generates a grounded answer from the material it receives, so the answer can only be as accurate as that material.

Core insight: The quality of your answer is bounded by retrieval quality, which is bounded by chunk quality. Better chunks → better retrieval → better answers. Every architectural decision downstream is built on this foundation.

1. Fixed-Size Chunking

The simplest and most widely used strategy. Documents are split into equal-sized blocks by token count, character count, or word count, without regard for meaning, sentence boundaries, or document structure.

LangChain Methods

- `CharacterTextSplitter`: splits on a single separator (default `\n\n`), then enforces chunk size by character count.
- `TokenTextSplitter`: splits by token count using a tokenizer (e.g. tiktoken for OpenAI models); more accurate for LLM context budgets than character-based splitting.
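
For comparison with the LangChain splitters named above, the underlying mechanics of fixed-size chunking with overlap can be sketched in a few lines of plain Python. This is a simplified illustration, not how `CharacterTextSplitter` is implemented (that class first splits on a separator and then merges pieces); the function name `fixed_size_chunks` is hypothetical, and sizes are measured in characters.

```python
def fixed_size_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Slice text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = fixed_size_chunks(sample, chunk_size=10, overlap=2)
print(pieces)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

Each window starts `chunk_size - overlap` characters after the previous one, so the last two characters of one chunk reappear at the start of the next. The cuts ignore word and sentence boundaries entirely, which is the weakness this strategy trades for simplicity.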
```python
from langchain.text_splitter import CharacterTextSplitter, TokenTextSplitter

# Character-based
char_splitter = CharacterTextSplitter(
    chunk_size=1000,     # max characters per chunk
    chunk_overlap=200,   # characters repeated at chunk boundaries
    separator="\n\n",
)

# Token-based
token_splitter = TokenTextSplitter(
    chunk_size=512,      # max tokens per chunk
    chunk_overlap=50,    # tokens repeated at chunk boundaries
)
```

Overlap guidance: A 10–20% overlap is typical. For `chunk_size=1000`, set `chunk_overlap` between 100 and 200. Overlap reduces the risk of a relevant answer being split across two chunks, at the cost of minor redundancy.

Strengths: Simple to implement, fast, predictable, easy to scale.

Weaknesses: Frequently breaks sentences midway, degrading semantic continuity and retrieval quality on complex documents.

Best for: Logs, telemetry, JSON, CSV, and other uniform structured content.

2. Recursive Chunking

Rather than splitting blindly, recursive chunking respects natural document structure. It works down a priority list of separators (`\n\n`, then `\n`, then `.` / `!` / `?`, then spaces), only moving to a finer separator when a chunk still exceeds the size limit. This is the recommended default strategy in LangChain for most document types.

LangChain Methods

- `RecursiveCharacterTextSplitter`: the primary implementation; tries each separator in the list before falling back to the next.
- `RecursiveCharacterTextSplitter.from_language()`: pre-configured separator lists for specific languages (Python, JS, Markdown, HTML, etc.).

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter, Language

# General prose
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separators=["\n\n", "\n", ".", "!", "?", " ", ""],
)

# Language-aware (e.g. Python source code)
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=100,
)
```

Overlap guidance: 10–15% overlap works well for most prose.
For code, keep overlap low (50–100 tokens) to avoid duplicating function signatures across chunks.

Strengths: Better semantic retention than fixed-size chunking; good general-purpose strategy; improves retrieval coherence.

Weaknesses: Structure-aware rather than meaning-aware; performance depends on document formatting quality.

Best for: Documentation, PDFs, articles, knowledge bases, and web pages.

3. Semantic Chunking

Instead of asking "how large should the chunk be?", semantic chunking asks "which sentences belong together?". Sentences are converted into vector embeddings, similarity is measured between adjacent sentences, and chunk boundaries are drawn where similarity drops below a threshold, which indicates a topic transition.

LangChain Methods

- `SemanticChunker` (from `langchain_experimental`): supports several breakpoint detection strategies, including percentile, standard deviation, and interquartile.

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

splitter = SemanticChunker(
    embeddings=OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile",  # or "standard_deviation", "interquartile"
    breakpoint_threshold_amount=95,          # top 5% of similarity drops become boundaries
)
```

Overlap guidance: Semantic chunking does not use a fixed `chunk_overlap`. Boundaries are drawn on meaning, so overlapping would undermine the approach. If continuity is needed at boundaries, consider appending the last sentence of the previous chunk manually.

Strengths: High retrieval relevance; strong semantic continuity; well suited to precision-sensitive systems.

Weaknesses: Computationally expensive; requires an embedding model at chunking time; similarity thresholds need tuning per dataset.

Best for: Enterprise knowledge systems, research platforms, policy documents, and AI assistants requiring contextual precision.

4. Hierarchical Chunking

Hierarchical chunking creates two levels of chunks: large parent chunks for context, and smaller child chunks for precision.
Retrieval targets the child level to find relevant passages, then expands to the parent level to return surrounding context. This directly addresses the core RAG trade-off: small chunks improve precision, large chunks preserve context.

LangChain Methods

- `ParentDocumentRetriever`: stores parent chunks in a document store and child chunks in a vector store, then links them at retrieval time.

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Assumption: any LangChain embeddings object can be used here;
# OpenAIEmbeddings is chosen only to match the semantic chunking example.
embeddings = OpenAIEmbeddings()

parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)  # large context chunks
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)    # precise retrieval chunks

retriever = ParentDocumentRetriever(
    vectorstore=Chroma(embedding_function=embeddings),  # child chunks are embedded here
    docstore=InMemoryStore(),                           # parent chunks are stored here
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
```

Overlap guidance: Apply overlap only on the child splitter (typically 10–15%). Parent chunks are retrieved wholesale for context, so overlap there adds noise rather than value.

Strengths: Strong retrieval precision without sacrificing context; effective for long documents.

Weaknesses: More complex to index and retrieve; requires additional storage and orchestration.

Best for: Legal documents, technical manuals, books, enterprise documentation, and compliance systems.

5. Structure and Metadata Aware Chunking

This strategy uses the document's own structure (titles, headers, sections, tables, and page layout) as natural chunk boundaries rather than treating the document as plain text. It is especially important for enterprise PDFs and structured reports, where layout carries semantic meaning that arbitrary splits would destroy.

LangChain Methods

- `MarkdownHeaderTextSplitter`: splits on Markdown heading levels and attaches header text as metadata to each chunk.
- `HTMLHeaderTextSplitter`: same pattern for HTML documents, splitting on '