[ARTICLE · art-38404] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=· neutral

RAG vs Agentic AI: A Developer's Decision Tree (With Code Examples for Both)

A developer provides a decision tree and code examples to distinguish between RAG (Retrieval-Augmented Generation) and agentic AI architectures. RAG is recommended for answering questions from documents, while agents are suited for taking actions across multiple systems. The post includes working Python code for both approaches using LangChain and Anthropic's Claude.

5 min read · published Jun 24, 2026

Two different problems wearing similar clothes. Here's how to tell them apart in thirty seconds, with working code for both.

I see this confusion in almost every project kickoff: "We need RAG" when the actual requirement is agentic, or "we need an agent" when RAG would be simpler, cheaper, and faster to ship.

Let's fix that with a decision tree you can actually use, plus working code for each path.

Does your system need to ANSWER QUESTIONS from documents?
├── YES, and that's the whole job → RAG
└── YES, but it also needs to TAKE ACTIONS across systems
    └── → Agent that uses RAG as a tool

Does your system need to TAKE ACTIONS across multiple systems?
├── YES, with no document retrieval needed → Plain Agent
└── YES, and it needs grounded knowledge from documents
    └── → Agent that uses RAG as a tool

The test question that resolves most confusion: "Does this system need to decide what to do, or does it need to find and synthesise information?" Finding and synthesising → RAG. Deciding and acting → agent.
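
The tree can also be written as a small function, which is handy for a kickoff checklist. This is a minimal sketch: the function name, the two boolean parameters, and the fallback string for the "neither" case are assumptions added for illustration, since the tree itself does not cover that case.

```python
def choose_architecture(answers_from_documents: bool, takes_actions: bool) -> str:
    """Map the two questions in the decision tree to an architecture."""
    if answers_from_documents and takes_actions:
        return "Agent that uses RAG as a tool"
    if takes_actions:
        return "Plain agent"
    if answers_from_documents:
        return "RAG"
    # Neither retrieval nor actions: the tree above does not cover this case.
    return "Not covered by this tree"


# A policy Q&A bot only answers questions from documents.
print(choose_architecture(answers_from_documents=True, takes_actions=False))
# A refund assistant reads policy and then acts on an order system.
print(choose_architecture(answers_from_documents=True, takes_actions=True))
```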

RAG is the right architecture when the job is grounding LLM responses in a specific document set: answering questions, summarising content, finding relevant passages.

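# Import paths assume a recent LangChain install with the langchain-community,
# langchain-text-splitters and langchain-anthropic packages available.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_anthropic import ChatAnthropic

# Load the source documents ("policies.txt" is a placeholder path)
documents = TextLoader("policies.txt").load()

# Split into overlapping chunks so passages stay small enough to retrieve precisely
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=100
)
chunks = splitter.split_documents(documents)

# Embed the chunks and index them in a vector store
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
vectorstore = Chroma.from_documents(chunks, embeddings)

# "stuff" places the top-k retrieved chunks directly into the prompt
llm = ChatAnthropic(model="claude-sonnet-4-5")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True
)

result = qa_chain.invoke({"query": "What is our refund policy for enterprise customers?"})
print(result["result"])
print(result["source_documents"])  # Always show sources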

This is the whole job: retrieve relevant chunks, ground the LLM's answer in them, return a response with citations. No planning loop, no tool orchestration, no multi-step decision-making. If your use case stops here, building agent infrastructure on top of this is unnecessary complexity.
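
To make the "retrieve, then ground" step concrete without any external services, here is a dependency-free sketch. It swaps embeddings for simple word-overlap cosine similarity, which is a stand-in and not what the pipeline above uses. The function names, the sample chunks, and the prompt wording are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks in the prompt, numbered so the answer can cite them."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )


# Made-up sample data, not real policy text.
chunks = [
    "Enterprise customers may request a refund within 60 days of purchase.",
    "Our office is closed on public holidays.",
    "Refund requests are reviewed by the billing team.",
]
query = "What is the refund policy for enterprise customers?"
top = retrieve(query, chunks, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

The prompt produced here is what would be sent to the LLM in a single generation call; there is no loop and no decision about what to do next.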

An agent is right when the job is taking actions (checking systems, executing operations, making decisions that span multiple steps) and there's no document knowledge base involved.

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "check_inventory",
        "description": "Check current stock level for a SKU",
        "input_schema": {
            "type": "object",
            "properties": {"sku": {"type": "string"}},
            "required": ["sku"]
        }
    },
    {
        "name": "create_purchase_order",
        "description": "Create a PO with a supplier",
        "input_schema": {
            "type": "object",
            "properties": {
                "supplier_id": {"type": "string"},
                "sku": {"type": "string"},
                "quantity": {"type": "integer"}
            },
            "required": ["supplier_id", "sku", "quantity"]
        }
    }
]

def run_inventory_agent(goal: str) -> str:
    messages = [{"role": "user", "content": goal}]

    # Cap the loop so a confused agent can't run forever
    for _ in range(6):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1500,
            tools=tools,
            messages=messages
        )

        # Anything other than a tool request (end_turn, max_tokens, ...) ends the loop
        if response.stop_reason != "tool_use":
            return next((b.text for b in response.content if b.type == "text"), "")

        messages.append({"role": "assistant", "content": response.content})
        tool_results = []

        for block in response.content:
            if block.type == "tool_use":
                # execute_inventory_tool is assumed to be defined elsewhere;
                # it should run the named tool and return a string
                result = execute_inventory_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })

        # Tool results go back to the model as the next user turn
        messages.append({"role": "user", "content": tool_results})

    return "Reached max iterations."

run_inventory_agent(
    "Check stock for SKU-4471. If below 50 units, "
    "create a PO with our primary supplier for 200 units."
)

No documents involved. The agent checks inventory, reasons about the threshold, and conditionally creates a purchase order. This is pure action orchestration.
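
The agent loop calls `execute_inventory_tool`, which the listing does not define. One possible shape for it is sketched below. The in-memory `STOCK` and `PURCHASE_ORDERS` stores, the stock figure, and the PO id format are hypothetical stand-ins for real inventory and purchasing systems.

```python
import json

# Hypothetical in-memory stand-ins for real inventory and purchasing systems.
STOCK = {"SKU-4471": 32}
PURCHASE_ORDERS: list[dict] = []


def execute_inventory_tool(name: str, tool_input: dict) -> str:
    """Dispatch a tool call by name and return a JSON string for the tool_result block."""
    if name == "check_inventory":
        sku = tool_input["sku"]
        if sku not in STOCK:
            return json.dumps({"error": f"unknown SKU {sku}"})
        return json.dumps({"sku": sku, "units": STOCK[sku]})

    if name == "create_purchase_order":
        po = {"po_id": f"PO-{len(PURCHASE_ORDERS) + 1:04d}", **tool_input}
        PURCHASE_ORDERS.append(po)
        return json.dumps(po)

    # Report unknown tools back to the model instead of raising,
    # so the agent loop can continue and the model can correct itself.
    return json.dumps({"error": f"unknown tool {name}"})


print(execute_inventory_tool("check_inventory", {"sku": "SKU-4471"}))
```

Returning errors as data rather than raising is a design choice: the model sees the failure in the next turn and can try a different call.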

This is where most real enterprise systems actually land: an agent that needs to take actions, and one of the things it needs to do along the way is look something up in a document knowledge base.

import json

import anthropic

client = anthropic.Anthropic()

def rag_lookup(query: str) -> str:
    """RAG retrieval wrapped as a tool the agent can call."""
    result = qa_chain.invoke({"query": query})  # the RAG chain built in the RAG example above
    return json.dumps({
        "answer": result["result"],
        "sources": [doc.metadata.get("source") for doc in result["source_documents"]]
    })

tools = [
    {
        "name": "search_policy_documents",
        "description": "Search company policy documents for relevant information",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    },
    {
        "name": "issue_refund",
        "description": "Process a refund for a customer order",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "amount": {"type": "number"}
            },
            "required": ["order_id", "amount"]
        }
    }
]

def execute_tool(name: str, input_data: dict) -> str:
    if name == "search_policy_documents":
        return rag_lookup(input_data["query"])
    elif name == "issue_refund":
        # process_refund is assumed to be defined elsewhere and to return a string
        return process_refund(input_data["order_id"], input_data["amount"])
    # Tell the model about unknown tools instead of silently returning None
    return f"Unknown tool: {name}"

def run_refund_agent(customer_request: str) -> str:
    messages = [{"role": "user", "content": customer_request}]

    for _ in range(6):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1500,
            tools=tools,
            messages=messages
        )

        # Anything other than a tool request (end_turn, max_tokens, ...) ends the loop
        if response.stop_reason != "tool_use":
            return next((b.text for b in response.content if b.type == "text"), "")

        messages.append({"role": "assistant", "content": response.content})
        tool_results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": execute_tool(block.name, block.input)}
            for block in response.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})

    return "Reached max iterations."

run_refund_agent(
    "Customer wants a refund on order #8821 for $340. "
    "Check our refund policy first to see if this qualifies."
)

The agent decides to call search_policy_documents to check eligibility before deciding whether to call issue_refund. The RAG system is doing exactly what it's good at, grounded retrieval, but it's a tool in service of the agent's broader decision-making, not the entire system.

RAG-only systems are cheaper to build and run. Single retrieval call, single generation call, predictable latency, easier to evaluate (you can measure retrieval precision and answer accuracy independently).
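
Measuring retrieval on its own can be as simple as precision and recall over the top-k results. The sketch below assumes you have a small labelled set mapping each query to the ids of the chunks that should be retrieved. The function names and the document ids are illustrative, not from any library.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved ids that are actually relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


# Illustrative ids: what the retriever returned vs. what a human labelled as relevant.
retrieved = ["doc-3", "doc-7", "doc-1", "doc-9"]
relevant = {"doc-3", "doc-1", "doc-5"}

print(precision_at_k(retrieved, relevant, k=4))
print(recall_at_k(retrieved, relevant, k=4))
```

Answer accuracy is then scored separately against the generated text, so a bad answer can be traced to either the retriever or the generator.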

Agentic systems are more expensive and harder to debug. Multiple LLM calls per task, unpredictable latency (it depends on how many iterations the agent takes), harder to evaluate because failure can happen at the planning stage or the execution stage. They're also the only option when the task genuinely requires multi-step action across systems.
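
Part of the cost comes from the loop itself: each iteration re-sends the whole message history, which grows with every assistant turn and tool result. A rough back-of-the-envelope model is sketched below. The function name and the token figures are illustrative assumptions, not measurements.

```python
def agent_input_tokens(base_tokens: int, tokens_added_per_turn: int, iterations: int) -> int:
    """Total input tokens when the full message history is re-sent on every call.

    Call i (0-based) sends the original prompt plus everything appended so far.
    """
    return sum(base_tokens + i * tokens_added_per_turn for i in range(iterations))


# Illustrative numbers only: a 500-token prompt, ~300 tokens added per iteration.
single_call = agent_input_tokens(500, 300, iterations=1)
four_iterations = agent_input_tokens(500, 300, iterations=4)
print(single_call, four_iterations)
```

Under these assumptions, four iterations cost well over four times the input tokens of one call, because the growth is cumulative rather than linear per call.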

The mistake we see most often: teams building agentic infrastructure for what's fundamentally a question-answering problem, paying the complexity cost for capability they don't need.

The full RAG vs agentic AI comparison covers the cost modelling, latency benchmarks, and evaluation methodology differences in more depth.

Once you've picked your architecture, the next question is build vs buy: do you build this RAG pipeline or agent loop yourself, or do you use a managed platform? The answer depends on your timeline, your team's capacity, and how differentiated your specific use case actually is. We wrote a framework with cost models, time estimates, and decision criteria for exactly this question. It's worth reading before you commit engineering time to either path.

Published by Dextra Labs | AI Consulting & Enterprise Agent Development
