{"slug": "building-a-langgraph-rag-agent-from-scratch-with-a-live-ui-that-shows-every-step", "title": "Building a LangGraph RAG Agent from Scratch — with a Live UI That Shows Every Step", "summary": "A developer built a progressive learning project that teaches LangChain and LangGraph by walking through six steps, from a raw LLM call to a full ReAct agent with retrieval-augmented generation (RAG). The final agent answers questions about rate limiting algorithms and streams its execution over SSE to a React UI that visualizes every node in the agent loop in real time. The project includes a FastAPI backend and a React frontend, with each step introducing a new concept such as prompt templates, tool binding, LangGraph state machines, and FAISS-based retrieval.", "body_md": "I built a learning project that teaches LangChain and LangGraph step by step — starting from a raw LLM call and ending with a full ReAct agent backed by RAG, streamed over SSE to a React UI that **visualises every node in the agent loop in real time**.\n\nThis post walks through the whole thing: what each concept does, how it connects to the next, and how the live pipeline view works.\n\n```\nfrontend/   ← React + Vite chat UI (live agent loop visualisation)\nbackend/    ← FastAPI server wrapping the RAG agent\nstep*.py    ← 6 progressive learning files\n```\n\nThe agent answers questions about rate limiting algorithms. 
That's just the domain — the real goal is to understand **how LangChain and LangGraph fit together**.\n\n| File | Concept introduced |\n|---|---|\n| `step1_llm_basics.py` | Chat models, messages, `.invoke()`, statelessness |\n| `step2_prompts_and_chains.py` | Prompt templates, LCEL `\\|` |\n| `step3_tools.py` | `@tool` decorator, `bind_tools()`, manual tool loop |\n| `step4_langgraph_intro.py` | `StateGraph`, nodes, edges, conditional routing |\n| `step5_full_agent.py` | Full ReAct loop with `ToolNode` |\n| `step6_rag_agent.py` | RAG — FAISS, HuggingFace embeddings, retriever tool |\n\nThe simplest possible thing: call a model and read the reply.\n\n``` python\nfrom langchain_groq import ChatGroq\nfrom langchain_core.messages import SystemMessage, HumanMessage\n\nllm = ChatGroq(model=\"llama-3.3-70b-versatile\")\n\nmessages = [\n    SystemMessage(content=\"You are a rate limiting expert.\"),\n    HumanMessage(content=\"What is token bucket?\"),\n]\n\nresponse = llm.invoke(messages)\nprint(response.content)\n```\n\n**Key insight:** The LLM is stateless. Every call is independent. You manage the conversation history yourself by passing the full message list each time.\n\nLangChain Expression Language (LCEL) lets you compose components with the `|` pipe operator — the same way Unix pipes work.\n\n``` python\nfrom langchain_core.prompts import ChatPromptTemplate\n\nprompt = ChatPromptTemplate.from_messages([\n    (\"system\", \"You are a rate limiting expert.\"),\n    (\"human\", \"{question}\"),\n])\n\n# Chain: prompt → LLM\nchain = prompt | llm\n\n# Invoke\nresponse = chain.invoke({\"question\": \"Compare token bucket and leaky bucket\"})\n\n# Stream tokens as they arrive\nfor chunk in chain.stream({\"question\": \"What is sliding window log?\"}):\n    print(chunk.content, end=\"\", flush=True)\n```\n\n**Key insight:** LCEL chains are lazy. `.stream()` and `.batch()` are first-class — no extra code needed.\n\nTools let the LLM take actions. 
The `@tool` decorator turns a Python function into something the model can call.\n\n``` python\nfrom langchain_core.tools import tool\nfrom langchain_groq import ChatGroq\n\n@tool\ndef get_algorithm_info(algorithm: str) -> str:\n    \"\"\"Return a brief description of a rate limiting algorithm.\"\"\"\n    descriptions = {\n        \"token_bucket\":    \"Tokens refill at a fixed rate up to a capacity cap. Allows bursts.\",\n        \"fixed_window\":    \"Counts requests in fixed time windows. Simple but has boundary spikes.\",\n        \"sliding_window\":  \"Precise per-request log. High memory, no boundary spikes.\",\n        \"leaky_bucket\":    \"Queue drains at a constant rate. Smooths traffic, no bursts allowed.\",\n    }\n    return descriptions.get(algorithm, \"Unknown algorithm.\")\n\n# Bind tools to the model — it now knows what tools exist and their signatures\nllm_with_tools = ChatGroq(model=\"meta-llama/llama-4-scout-17b-16e-instruct\").bind_tools(\n    [get_algorithm_info]\n)\n\nresponse = llm_with_tools.invoke(\"Tell me about token bucket\")\n# response.tool_calls → [{\"name\": \"get_algorithm_info\", \"args\": {\"algorithm\": \"token_bucket\"}}]\n```\n\n**Key insight:** `bind_tools()` sends the tool schemas to the model. The model returns a structured `tool_calls` list — it does not execute the tools itself. *You* run them and send the results back.\n\nLangGraph models the agent as a **state machine**. 
You define a state schema, the nodes that update it, and the edges (fixed or conditional) that connect them:\n\n``` python\nfrom langgraph.graph import StateGraph, END\nfrom langgraph.graph.message import add_messages\nfrom typing import Annotated\nfrom typing_extensions import TypedDict\n\nclass State(TypedDict):\n    messages: Annotated[list, add_messages]  # reducer: new messages are appended, not overwritten\n\ndef node_a(state: State):\n    return {\"messages\": [\"Hello from node A\"]}\n\ndef node_b(state: State):\n    return {\"messages\": [\"Hello from node B\"]}\n\ndef route(state: State):\n    return \"b\" if len(state[\"messages\"]) < 3 else END\n\ngraph = StateGraph(State)\ngraph.add_node(\"a\", node_a)\ngraph.add_node(\"b\", node_b)\ngraph.set_entry_point(\"a\")\ngraph.add_conditional_edges(\"a\", route, {\"b\": \"b\", END: END})\ngraph.add_edge(\"b\", \"a\")\n\napp = graph.compile()\n```\n\n**Key insight:** `add_messages` is a **reducer**. When a node returns `{\"messages\": [new_msg]}`, LangGraph appends it to the list instead of replacing it. This is how the conversation history accumulates automatically.\n\nThe ReAct pattern (Reason + Act) is: LLM decides what to do → tools execute it → LLM sees the result → repeat.\n\nLangGraph's `ToolNode` handles the execution side automatically.\n\n``` python\nfrom langgraph.graph import StateGraph, END\nfrom langgraph.graph.message import add_messages\nfrom langgraph.prebuilt import ToolNode\nfrom langchain_groq import ChatGroq\nfrom langchain_core.messages import HumanMessage\nfrom typing import Annotated\nfrom typing_extensions import TypedDict\n\n# The last two are further @tool functions that are not shown in this post\ntools = [get_algorithm_info, recommend_algorithm, calculate_token_bucket]\nllm   = ChatGroq(model=\"meta-llama/llama-4-scout-17b-16e-instruct\").bind_tools(tools)\n\nclass State(TypedDict):\n    messages: Annotated[list, add_messages]\n\ndef llm_node(state: State):\n    return {\"messages\": [llm.invoke(state[\"messages\"])]}\n\ndef tools_condition(state: State):\n    return \"tools\" if state[\"messages\"][-1].tool_calls else END\n\ngraph = 
StateGraph(State)\ngraph.add_node(\"llm\",   llm_node)\ngraph.add_node(\"tools\", ToolNode(tools))\ngraph.set_entry_point(\"llm\")\ngraph.add_conditional_edges(\"llm\", tools_condition)\ngraph.add_edge(\"tools\", \"llm\")  # always loop back after tool execution\n\nagent = graph.compile()\n\nresult = agent.invoke({\"messages\": [HumanMessage(content=\"What algorithm for bursty traffic?\")]})\nprint(result[\"messages\"][-1].content)\n```\n\n**The loop:**\n\n```\nSTART → [llm] → has tool_calls? → YES → [tools] → back to [llm]\n                                → NO  → END\n```\n\nRetrieval-Augmented Generation (RAG) gives the agent long-form knowledge from documents. We embed documents into a FAISS vector store and expose it as a tool.\n\n``` python\nfrom langchain_huggingface import HuggingFaceEmbeddings\nfrom langchain_community.vectorstores import FAISS\nfrom langchain_text_splitters import RecursiveCharacterTextSplitter\nfrom langchain_core.tools import tool\n\n# Index documents once at startup\nembeddings = HuggingFaceEmbeddings(model_name=\"all-MiniLM-L6-v2\")\n\ndocs = load_knowledge_base()           # returns list of Document objects\nsplitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)\nchunks = splitter.split_documents(docs)\n\nvectorstore = FAISS.from_documents(chunks, embeddings)\nretriever   = vectorstore.as_retriever(search_kwargs={\"k\": 3})\n\n# Expose retrieval as a tool\n@tool\ndef search_knowledge_base(query: str) -> str:\n    \"\"\"Search the rate limiting knowledge base for relevant information.\"\"\"\n    docs = retriever.invoke(query)\n    return \"\\n---\\n\".join(d.page_content for d in docs)\n```\n\n**Key insight:** RAG is just a tool from the agent's perspective. The LLM decides *when* to call it based on the question. The retriever converts the query to an embedding, finds the nearest chunks in FAISS, and returns them as context.\n\nThe backend wraps the agent in a FastAPI server. 
The interesting part is the streaming endpoint, which uses `agent.astream_events()`, a granular async generator that fires events for every internal state change in the graph.\n\n``` python\nimport json\n\nfrom fastapi import FastAPI\nfrom fastapi.responses import StreamingResponse\nfrom langchain_core.messages import HumanMessage\nfrom pydantic import BaseModel\n\n# app, ChatRequest and sse() are not shown in the original snippet;\n# these are minimal assumed definitions. `agent` is the compiled graph.\napp = FastAPI()\n\nclass ChatRequest(BaseModel):\n    message: str\n\ndef sse(payload: dict) -> str:\n    # Format one Server-Sent Events frame\n    return f\"data: {json.dumps(payload)}\\n\\n\"\n\n@app.post(\"/chat/stream\")\nasync def chat_stream(request: ChatRequest):\n    async def generate():\n        llm_call_count = 0\n        graph_started  = False\n\n        async for event in agent.astream_events(\n            {\"messages\": [HumanMessage(content=request.message)]},\n            version=\"v2\",\n        ):\n            kind = event[\"event\"]\n            node = event.get(\"metadata\", {}).get(\"langgraph_node\", \"\")\n\n            # LLM node starting\n            if kind == \"on_chat_model_start\" and node == \"llm\":\n                if not graph_started:\n                    graph_started = True\n                    yield sse({\"type\": \"pipeline\", \"phase\": \"graph_start\"})\n                llm_call_count += 1\n                yield sse({\"type\": \"pipeline\", \"phase\": \"llm_start\", \"call\": llm_call_count})\n\n            # LLM done — emit routing decision\n            elif kind == \"on_chat_model_end\" and node == \"llm\":\n                output     = event[\"data\"].get(\"output\")\n                tool_calls = getattr(output, \"tool_calls\", []) if output else []\n                yield sse({\n                    \"type\":       \"pipeline\",\n                    \"phase\":      \"llm_end\",\n                    \"decision\":   \"tools\" if tool_calls else \"answer\",\n                    \"tool_names\": [tc[\"name\"] for tc in tool_calls],\n                })\n\n            # Tool executing\n            elif kind == \"on_tool_start\":\n                yield sse({\"type\": \"pipeline\", \"phase\": \"tool_start\",\n                           \"tool\": event[\"name\"], \"args\": event[\"data\"].get(\"input\", {})})\n\n            
# Tool done\n            elif kind == \"on_tool_end\":\n                out     = event[\"data\"].get(\"output\", \"\")\n                content = out.content if hasattr(out, \"content\") else str(out)\n                yield sse({\"type\": \"pipeline\", \"phase\": \"tool_end\",\n                           \"tool\": event[\"name\"], \"preview\": content[:120]})\n\n            # Individual LLM output tokens (final answer only)\n            elif kind == \"on_chat_model_stream\" and node == \"llm\":\n                chunk = event[\"data\"][\"chunk\"]\n                if chunk.content and not getattr(chunk, \"tool_call_chunks\", []):\n                    yield sse({\"type\": \"token\", \"content\": chunk.content})\n\n        yield sse({\"type\": \"pipeline\", \"phase\": \"graph_end\"})\n        yield \"data: [DONE]\\n\\n\"\n\n    return StreamingResponse(generate(), media_type=\"text/event-stream\")\n```\n\n**Why `astream_events` instead of `astream`?**\n\n`astream()` gives you one event per *node* that completes — coarse-grained. `astream_events(version=\"v2\")` fires for every internal lifecycle hook: model start/stream/end, tool start/end, chain start/end. This is what lets us show individual tokens and the routing decision in real time.\n\nEvery assistant response shows a collapsible **Agent Loop** panel. 
Each node card appears and updates live as the corresponding event arrives from the SSE stream.\n\n```\n🚀 StateGraph Initialized          [langgraph]\n   StateGraph.compile() · add_messages reducer\n   ↓\n🧠 LLM Node — Call #1  ⟳           [langchain]   ← spinning while active\n   ChatGroq(llama-4-scout) · bind_tools(4)\n   AIMessage has tool_calls → selected: search_knowledge_base\n   ↓\n◆  Conditional Edge → tools node   [langgraph]\n   add_conditional_edges · tools_condition(state)\n   has tool_calls → route to tools\n   ↓\n🔍 ToolNode: search_knowledge_base ⟳ [langchain]\n   FAISS vector search · HuggingFace embeddings\n   query: HTTP headers rate limiting\n   → Retrieved 3 relevant chunk(s)\n   ↓\n🧠 LLM Node — Call #2  ✓           [langchain]\n   LLM sees ToolMessage in state\n   no tool_calls → generating final answer\n   ↓\n◆  Conditional Edge → END          [langgraph]\n   no tool_calls → route to END\n   ↓\n🏁 Graph END                       [langgraph]\n   messages[-1].content → response\n```\n\nNodes are **colour-coded**.\n\nBadges identify which framework is responsible: `langgraph` (purple) vs `langchain` (orange).\n\nTokens from the LLM arrive in bursts over SSE. Rather than applying them immediately, a character queue drains at a fixed pace (18 ms per character) so the text types out at a readable speed:\n\n``` js\nconst CHAR_DELAY = 18  // ms per character\n\n// When a token event arrives, push each character into the queue\nif (ev.type === 'token') {\n  tokenQueue.current.push(...ev.content.split(''))\n  startTicker(assistantId)\n}\n\n// Ticker drains one char at a time\nconst startTicker = (id) => {\n  // Avoid stacking intervals on every token; this guard assumes the ticker\n  // is cleared and reset to null elsewhere when the stream ends\n  if (tickerRef.current) return\n  tickerRef.current = setInterval(() => {\n    if (!tokenQueue.current.length) return\n    const ch = tokenQueue.current.shift()\n    setMessages(prev => prev.map(m =>\n      m.id === id ? 
{ ...m, content: (m.content || '') + ch } : m\n    ))\n  }, CHAR_DELAY)\n}\n```\n\n| Layer | Technology |\n|---|---|\n| LLM | Groq — `llama-4-scout-17b` (tool calling), `llama-3.3-70b` (text) |\n| Agent framework | LangGraph — `StateGraph`, `ToolNode`, `add_conditional_edges` |\n| RAG | LangChain + HuggingFace `all-MiniLM-L6-v2` embeddings + FAISS |\n| Streaming | `astream_events(version=\"v2\")` → Server-Sent Events |\n| Backend | FastAPI + uvicorn |\n| Frontend | React 18 + Vite + react-markdown |\n\n```\n# Python deps (uses uv to avoid system Python issues)\nuv venv .venv --python 3.12\nuv pip install -r requirements.txt\n\n# Frontend deps\ncd frontend && npm install && cd ..\n\n# Terminal 1 — backend\ncd backend\nGROQ_API_KEY=your_key uvicorn main:app --port 8000 --reload\n\n# Terminal 2 — frontend\ncd frontend && npm run dev\n```\n\nOpen **http://localhost:5173**. The first run downloads the embedding model (~90 MB) and caches it.\n\n**LangChain** gives you the building blocks: models, prompt templates, tools, LCEL chains, vector stores.\n\n**LangGraph** gives you the control flow: a state machine where you decide the loop, the branching, and when to stop.\n\nThe two fit together naturally — LangGraph nodes call LangChain components, and LangChain tools feed results back into LangGraph state via `add_messages`.\n\nThe most clarifying thing was building the UI that shows the loop. When you watch the graph execute in real time — LLM node lights up, routing decision fires, ToolNode spins, LLM node fires again — the ReAct pattern stops being abstract and becomes something you can see.\n\n*The full source is on GitHub. 
The step*.py files are designed to be read in order — each one is self-contained and introduces exactly one new concept.*", "url": "https://wpnews.pro/news/building-a-langgraph-rag-agent-from-scratch-with-a-live-ui-that-shows-every-step", "canonical_source": "https://dev.to/ameya_joshi_68fa01c3a1a16/building-a-langgraph-rag-agent-from-scratch-with-a-live-ui-that-shows-every-step-4nle", "published_at": "2026-06-06 20:06:02+00:00", "updated_at": "2026-06-06 20:11:18.590502+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "ai-agents", "ai-tools", "natural-language-processing"], "entities": ["LangGraph", "LangChain", "React", "Vite", "FastAPI", "FAISS", "HuggingFace", "ChatGroq"], "alternates": {"html": "https://wpnews.pro/news/building-a-langgraph-rag-agent-from-scratch-with-a-live-ui-that-shows-every-step", "markdown": "https://wpnews.pro/news/building-a-langgraph-rag-agent-from-scratch-with-a-live-ui-that-shows-every-step.md", "text": "https://wpnews.pro/news/building-a-langgraph-rag-agent-from-scratch-with-a-live-ui-that-shows-every-step.txt", "jsonld": "https://wpnews.pro/news/building-a-langgraph-rag-agent-from-scratch-with-a-live-ui-that-shows-every-step.jsonld"}}