{"slug": "agent-series-2-react-the-most-important-agent-reasoning-paradigm", "title": "Agent Series (2): ReAct — The Most Important Agent Reasoning Paradigm", "summary": "The article explains the ReAct (Reasoning and Acting) agent paradigm, introduced by researchers from Princeton and Google in 2022, which improves upon Chain-of-Thought (CoT) reasoning by allowing a language model to alternate between reasoning and taking real-world actions. Unlike CoT, which reasons solely within its training data and can produce confident but incorrect answers, ReAct uses a \"Thought → Action → Observation\" loop that feeds real-world feedback back into the model, enabling dynamic, runtime-planned execution. The article includes a demonstration where a ReAct agent searches for the areas of Beijing and Shanghai, then calculates the difference, showcasing its ability to verify information and reason based on actual results.", "body_md": "## You Think Your Agent Is \"Thinking.\" It's Actually Just Predicting Tokens.\n\nHere's a scenario that happens more often than you'd think.\n\nYou ask an Agent to write a competitive analysis report. It confidently outputs three professional-looking pages — complete with data, conclusions, and strategic recommendations.\n\nThere's just one problem: every number comes from its training data, which may be a year old. It didn't search. It didn't verify. It just generated text that sounds authoritative.\n\n**That's not thinking. That's fluent hallucination.**\n\nChain-of-Thought (CoT) has the same fundamental problem. CoT prompting tells the model to \"reason step by step\" before answering, and it genuinely does improve accuracy on many tasks. But the model is still reasoning entirely within language space. 
It can generate a very coherent chain of thought that leads to a completely wrong answer, because its only information source is training data.\n\nReAct was built to solve this.\n\n## ReAct: Reasoning + Acting, Interleaved\n\nIn 2022, researchers from Princeton and Google published [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629).\n\nThe core idea is simple: **let the model alternate between reasoning and acting, rather than reasoning first and then acting, or acting without reasoning.**\n\nThe concrete form is a three-part loop:\n\n```\nThought  →  Action  →  Observation\n   ↑                         │\n   └─────────────────────────┘\n```\n\n- **Thought**: What the model is \"thinking\": its current analysis, what to do next, and why\n- **Action**: The actual tool call and its parameters\n- **Observation**: The real result returned by the tool\n\nThe critical mechanism: **the Observation is fed back into the model as new context**, allowing it to reason based on actual results. This creates the \"think → act → observe → think again\" loop.\n\nThis one loop addresses CoT's core flaw: **the model is no longer reasoning in isolation. 
It can interact with the real world and update its reasoning based on real feedback.**\n\n## A Concrete Example: Watching an Agent \"Think\"\n\nI built a complete ReAct Agent demo using LangGraph + GLM-4-Flash with two tools: `calculator` (a safe math evaluator) and `web_search` (Bing search).\n\nCode: [agent-01-react-agent/react_agent.py](https://github.com/chendongqi/llm-in-action/tree/main/agent-01-react-agent)\n\nHere's an actual execution trace from Demo 3: search for the areas of Beijing and Shanghai, then calculate the difference.\n\n```\n════════════════════════════════════════════════════════════\n  Demo 3 ▸ Multi-Round Search (Same Tool, Multiple Calls)\n════════════════════════════════════════════════════════════\n\n[User Question]\n  First search for Beijing's area, then Shanghai's area,\n  then calculate how much larger Beijing is in km².\n────────────────────────────────────────────────────────────\n\n[Step 1] THOUGHT → ACTION\n  Action  : web_search(query='北京面积 平方公里')\n\n  Observation : • Beijing area: Total area 16,410.54 km²...\n────────────────────────────────────────────────────────────\n\n[Step 2] THOUGHT → ACTION\n  Action  : web_search(query='上海面积 平方公里')\n\n  Observation : • Shanghai area: Land area approximately 6,340.5 km²...\n────────────────────────────────────────────────────────────\n\n[Step 3] THOUGHT → ACTION\n  Action  : calculator(expression='16410.54 - 6340.5')\n\n  Observation : 10070.04\n────────────────────────────────────────────────────────────\n\n[Final Answer]\n  Beijing's area is approximately 16,410.54 km², Shanghai's is\n  approximately 6,340.5 km². 
Beijing is about 10,070.04 km² larger.\n════════════════════════════════════════════════════════════\n```\n\nNotice what happened here:\n\n- The Agent **decided on its own** to search Beijing first, then Shanghai, then calculate, with no hardcoded execution order\n- Each search result (Observation) was read by the model and used to determine the next step\n- The final calculation used real numbers extracted from real searches\n\nThis is ReAct's value: **the execution path is planned dynamically at runtime, not hardcoded by the developer in advance.**\n\n## ReAct vs. Chain-of-Thought: A Direct Comparison\n\n| Aspect | Chain-of-Thought | ReAct |\n|---|---|---|\n| Information source | Training data only | Training data + tool results |\n| Execution path | Reasoning in language space | Think → real action → observe results |\n| Can access real-time data | ✗ | ✓ (via tools) |\n| Can execute computation/code | ✗ | ✓ (via tools) |\n| Reasoning verifiable | Hard to verify | Each Observation is a real result |\n| Risk of side effects | Low (no actions) | High (requires safety boundaries) |\n\nOne sentence summary: **CoT makes the model think clearly. ReAct makes it think while doing.**\n\n## Building a ReAct Agent with LangGraph\n\nHere's the core implementation. The code uses LangGraph's `create_react_agent`, one of the cleanest ReAct implementations available.\n\n### 1. 
Safe Calculator Tool\n\n``` python\nimport ast\nimport operator\nfrom typing import Any\nfrom langchain_core.tools import tool\n\n# Whitelist of AST operator nodes mapped to their arithmetic functions\n_SAFE_OPS: dict[type, Any] = {\n    ast.Add:  operator.add,\n    ast.Sub:  operator.sub,\n    ast.Mult: operator.mul,\n    ast.Div:  operator.truediv,\n    ast.Pow:  operator.pow,\n    ast.Mod:  operator.mod,\n    ast.USub: operator.neg,\n}\n\ndef _eval_ast(node: ast.AST) -> float:\n    # Numeric literals only; reject strings, None, and booleans\n    if isinstance(node, ast.Constant):\n        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):\n            raise ValueError(f\"Unsupported constant: {node.value!r}\")\n        return float(node.value)\n    if isinstance(node, ast.BinOp):\n        op_fn = _SAFE_OPS.get(type(node.op))\n        if op_fn is None:\n            raise ValueError(f\"Unsupported operator: {type(node.op).__name__}\")\n        return op_fn(_eval_ast(node.left), _eval_ast(node.right))\n    if isinstance(node, ast.UnaryOp):\n        op_fn = _SAFE_OPS.get(type(node.op))\n        if op_fn is None:\n            raise ValueError(f\"Unsupported operator: {type(node.op).__name__}\")\n        return op_fn(_eval_ast(node.operand))\n    raise ValueError(f\"Unsupported AST node: {type(node).__name__}\")\n\n@tool\ndef calculator(expression: str) -> str:\n    \"\"\"Evaluate a math expression. Supports + - * / ** % and parentheses.\"\"\"\n    try:\n        tree = ast.parse(expression.strip(), mode=\"eval\")\n        result = _eval_ast(tree.body)\n        if result == int(result):\n            return str(int(result))\n        # 10 significant digits, so 10070.04 is not truncated to 10070\n        return f\"{result:.10g}\"\n    except (ValueError, TypeError, SyntaxError, ZeroDivisionError, OverflowError) as e:\n        return f\"Calculation error: {e}\"\n```\n\n**Why not just use eval()?**\n\nA call such as `eval(\"__import__('os').system('rm -rf /')\")` would run a destructive shell command on your machine. Tools are the Agent's \"hands.\" Once an attacker manipulates the LLM through prompt injection, `eval()` becomes a direct path to your system.\n\nAST parsing only allows math operation nodes; everything else is rejected. This is a foundational principle of safe tool design.\n\n### 2. 
Web Search Tool\n\n``` python\nimport requests\nfrom bs4 import BeautifulSoup\nfrom urllib.parse import quote\n\n_BING_HEADERS = {\n    \"User-Agent\": (\n        \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 \"\n        \"(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\"\n    ),\n    \"Accept-Language\": \"en-US,en;q=0.9\",\n}\n\n@tool\ndef web_search(query: str) -> str:\n    \"\"\"Search the web and return the 3 most relevant snippets.\"\"\"\n    try:\n        url = f\"https://www.bing.com/search?q={quote(query)}&setlang=zh-CN\"\n        resp = requests.get(url, headers=_BING_HEADERS, timeout=10)\n        resp.raise_for_status()\n\n        soup = BeautifulSoup(resp.text, \"html.parser\")\n        snippets = []\n        for li in soup.find_all(\"li\", class_=\"b_algo\")[:4]:\n            h2 = li.find(\"h2\")\n            title = h2.get_text(strip=True) if h2 else \"\"\n            p = li.find(\"p\")\n            body = p.get_text(strip=True) if p else \"\"\n            if title or body:\n                snippets.append(f\"• {title}: {body}\"[:200])\n\n        return \"\\n\".join(snippets[:3]) if snippets else \"No results found.\"\n    except requests.RequestException as e:\n        return f\"Search failed: {e}\"\n```\n\n### 3. Building the Agent\n\n``` python\nimport os\nfrom dotenv import load_dotenv\nfrom langchain_openai import ChatOpenAI\n# LangGraph V1.0 moved create_react_agent to chat_agent_executor submodule\nfrom langgraph.prebuilt.chat_agent_executor import create_react_agent\n\nload_dotenv()\n\nllm = ChatOpenAI(\n    base_url=\"https://open.bigmodel.cn/api/paas/v4\",\n    api_key=os.getenv(\"LLM_API_KEY\"),\n    model=\"glm-4-flash\",\n    temperature=0,\n)\n\nagent = create_react_agent(\n    model=llm,\n    tools=[calculator, web_search],\n)\n\nresult = agent.invoke(\n    {\"messages\": [(\"user\", \"How much larger is Beijing than Shanghai in km²? 
Search and calculate.\")]},\n    config={\"recursion_limit\": 20},\n)\nprint(result[\"messages\"][-1].content)\n```\n\nThree core steps: define tools → bind LLM → run. LangGraph handles all the message routing, tool call dispatch, result injection, and loop control under the hood.\n\n**The import path for create_react_agent**\n\nThis demo imports `create_react_agent` from `langgraph.prebuilt.chat_agent_executor`, because in the environment used here, importing it from `langgraph.prebuilt` triggered a `LangGraphDeprecatedSinceV10` warning. Note that LangGraph 1.0's own deprecation notice points to `create_agent` in `langchain.agents` as the longer-term replacement, so the warning may still appear depending on your installed version; check the release notes for the version you run.\n\n``` python\n# ✅ Used in this demo\nfrom langgraph.prebuilt.chat_agent_executor import create_react_agent\n\n# ⚠️ Triggers deprecation warning\nfrom langgraph.prebuilt import create_react_agent\n```\n\n## How the Message Flow Actually Works\n\nTo truly understand ReAct, you need to see the underlying message sequence. Here's what the LLM receives at the start of each cycle:\n\n```\nContext passed to LLM at round N:\n┌─────────────────────────────────────────────────────┐\n│ [System]  You are an assistant with these tools:    │\n│           calculator, web_search                    │\n│                                                     │\n│ [Human]   Question: How much larger is Beijing?     │\n│                                                     │\n│ [AI]      (tool call) web_search(\"Beijing area\")   │  ← Round 1 Action\n│ [Tool]    Beijing area: 16,410 km²                 │  ← Round 1 Observation\n│                                                     │\n│ [AI]      (tool call) web_search(\"Shanghai area\")  │  ← Round 2 Action\n│ [Tool]    Shanghai area: 6,340 km²                 │  ← Round 2 Observation\n│                                                     │\n│ ← LLM decides what to do next here →               │\n└─────────────────────────────────────────────────────┘\n```\n\nEach cycle, the entire history is passed to the LLM. 
The model \"sees\" all previous thoughts and observations, then decides:\n\n- Continue calling tools (more information needed)\n- Stop and deliver a final answer (enough information gathered)\n\nThis is why it's called a loop: **the model itself is the loop's termination condition.** It decides when to stop.\n\n## When Things Go Wrong: Failure Modes and Guards\n\nThe same \"decide when to stop\" design that makes ReAct powerful also introduces a risk: **if the model misjudges, the loop may never terminate on its own.**\n\nCommon runaway scenarios:\n\n**Scenario 1: Tool keeps failing, model keeps retrying**\n\n```\nAction: web_search(\"vague ambiguous query\")\nObservation: No results found\nThought: Let me try different keywords\nAction: web_search(\"different keywords\")\nObservation: No results found\nThought: Maybe one more variation...\n(infinite loop)\n```\n\n**Scenario 2: Model misunderstands the task and pursues the wrong direction**\n\n```\nThought: I need the exact value of X\nAction: calculator(\"...\")\nObservation: Approximate result\nThought: Not precise enough, I need more decimal places\nAction: calculator(\"...\")\n(infinite pursuit of \"precision\")\n```\n\n**Scenario 3: Tools form a circular dependency**\n\n```\nThought: I need to know A before I can look up B\nAction: search(A)\nObservation: Requires knowing B first\nThought: I need to know B before I can look up A\n(circular dependency)\n```\n\nLangGraph's `recursion_limit` parameter is the hard safety net:\n\n``` python\nresult = agent.invoke(\n    {\"messages\": [(\"user\", question)]},\n    config={\"recursion_limit\": 5},  # Force-stop after 5 graph steps\n)\n```\n\nLangGraph counts graph super-steps, so one tool round typically consumes two of them (the model node, then the tool node). When the count exceeds the limit, LangGraph raises `GraphRecursionError`:\n\n```\n[recursion_limit triggered]\n  Exception type: GraphRecursionError\n  Message: Recursion limit of 5 reached without hitting a stop condition...\n\n→ Conclusion: Always set a reasonable recursion_limit in production (15~25 recommended)\n→ Too low: legitimate tasks 
get cut off; Too high: runaway Agent burns massive tokens\n```\n\n**How to set recursion_limit**\n\n- Simple tasks (single tool call): 5–8 steps is enough\n- Medium tasks (multi-tool, multi-step): 10–15 steps\n- Complex research tasks: 20–25 steps\n- Tasks requiring 30+ steps should reconsider architecture — you may need multi-Agent collaboration (covered in a later article)\n\nThe rule of thumb: **set it to roughly 2× the number of steps a successful execution needs.** Room to breathe, but a real ceiling.\n\n## Five Demo Scenarios: From Simple to Complex\n\nThe complete code includes 5 progressive demos covering the main ReAct usage patterns:\n\n**Demo 1: Pure Calculation (single tool, single step)**\n\n```\nQuestion: Calculate (1024 * 768) + (1920 * 1080)\nSteps: calculator('(1024 * 768) + (1920 * 1080)') → 2860032\n```\n\nValidates the basic tool-calling pipeline.\n\n**Demo 2: Search + Calculate (multi-tool, multi-step)**\n\n```\nQuestion: What year were Python and JavaScript first released? Calculate the difference.\nSteps: web_search(\"Python release year\") → web_search(\"JavaScript release year\") → calculator\n```\n\nShows the Agent autonomously orchestrating different tools in the right order.\n\n**Demo 3: Multi-round Search (same tool, multiple calls)**\n\n```\nQuestion: How much larger is Beijing than Shanghai in km²?\nSteps: web_search(\"Beijing area\") → web_search(\"Shanghai area\") → calculator → 10070.04\n```\n\nShows the Agent deciding what to search second based on what it found first.\n\n**Demo 4: No Tools Needed (direct answer)**\n\n```\nQuestion: Explain the ReAct paradigm in one sentence.\nSteps: No tool calls — direct answer\n```\n\nShows the Agent knowing when **not** to call tools. 
This matters as much as knowing when to call them.\n\n**Demo 5: Trigger recursion_limit (safety net demo)**\n\n```\nQuestion: Search Python/Java/C release years, calculate the sum (~10 steps needed)\nLimit: recursion_limit=5\nResult: GraphRecursionError (correctly triggered)\n```\n\nVerifies the production safety mechanism.\n\n## An Interesting Observation: Agents Can \"Luck Into\" Correct Answers\n\nDemo 2 produced a result worth documenting carefully.\n\nThe Agent searched for JavaScript's release year. The Bing snippet it received came from an article published in 2023 that mentioned Python's 1991 origin. The model appears to have confused \"2023\" (the article's publication date) with JavaScript's release year. The calculation step ran `2023 - 1991`, returning 32.\n\nBut the final answer was correct: \"Python was released in 1991, JavaScript in 1995, a 4-year difference.\"\n\nThe model overrode its (incorrect) calculation result with its internal training knowledge and delivered the right answer.\n\nThis reveals a subtle property of ReAct: **an Agent's reasoning chain and its final answer can be decoupled.** The model may make errors during tool calls, then \"self-correct\" in the final answer generation using built-in knowledge.\n\nAs an outcome, this is fine: you got the right answer. From an engineering perspective, it's a problem. 
**If you need traceable, verifiable conclusions, \"it happened to be correct\" isn't sufficient.** This is one of the challenges that Harness Engineering addresses (covered in a later article in this series).\n\n## Trace Visualization: Making Agent Reasoning Observable\n\nA common production pain point: when something goes wrong, you don't know which step failed, because only the final answer is visible by default.\n\nGood practice: print the full Thought/Action/Observation sequence as a readable Trace:\n\n``` python\nfrom langchain_core.messages import AIMessage, HumanMessage, ToolMessage\n\ndef print_trace(result: dict) -> None:\n    for msg in result[\"messages\"]:\n        if isinstance(msg, HumanMessage):\n            print(f\"[USER] {msg.content}\")\n\n        elif isinstance(msg, AIMessage):\n            content = msg.content if isinstance(msg.content, str) else \"\"\n            if msg.tool_calls:\n                # Text that accompanies a tool call is the model's Thought\n                if content.strip():\n                    print(f\"[THOUGHT] {content.strip()}\")\n                for tc in msg.tool_calls:\n                    args = \", \".join(f\"{k}={repr(v)}\" for k, v in tc[\"args\"].items())\n                    print(f\"[ACTION] {tc['name']}({args})\")\n            else:\n                print(f\"[FINAL ANSWER] {content.strip()}\")\n\n        elif isinstance(msg, ToolMessage):\n            obs = msg.content if isinstance(msg.content, str) else str(msg.content)\n            print(f\"[OBSERVATION] {obs.strip()[:300]}\")\n```\n\n**GLM-4-Flash content field pollution**\n\nWhen using GLM-4-Flash, you may occasionally see raw JSON in `AIMessage.content`, something like `{\"index\": 0, \"delta\": ...}`. 
This looks like internal streaming delta data leaking into the content field.\n\nFix: detect when content starts with `{` or `[` and can be parsed by `json.loads()`, then discard it.\n\n``` python\nimport json\n\ndef _clean_thought(text: str) -> str:\n    stripped = text.strip()\n    if stripped and stripped[0] in (\"{\", \"[\"):\n        try:\n            json.loads(stripped)\n            return \"\"  # leaked JSON, discard\n        except json.JSONDecodeError:\n            pass\n    return text\n```\n\nThe complete demo code already includes this handling.\n\n## The Limitations of ReAct\n\nReAct is powerful, but it's not a silver bullet. Knowing its limits helps you use it correctly.\n\n**1. Context window fills up fast**\n\nEach cycle packs the entire history into context. As the step count grows, token consumption climbs with it. Complex tasks (20+ steps) may fail on models with limited context windows.\n\n**2. Tool descriptions drive everything, so write them well**\n\nReAct relies entirely on the LLM understanding tool documentation to decide which tool to call and with what parameters. Vague docstrings lead to wrong tool selection. Tool descriptions are the invisible API of a ReAct system; treat them like API documentation.\n\n**3. No global planning capability**\n\nStandard ReAct is greedy: each step only looks at the current state to decide the next move, with no \"plan the whole thing first, then execute\" capability. For tasks requiring long-horizon planning (like writing an entire codebase), this can get stuck in local optima. This is what the Plan-and-Solve paradigm addresses (Article 3 in this series).\n\n**4. Poor fault tolerance for tool failures**\n\nWhen a tool returns an error, the model has to infer the next step from the error message alone. There's no predefined retry strategy or fallback logic. 
This needs to be handled at the tool design level and the Harness layer.\n\n## Interview Prep: Articulate How Your Agent \"Thinks\"\n\n**Common question: How does your Agent decide its next action?**\n\nMany candidates answer \"it calls tools.\" But what the interviewer actually wants to hear is: **who decides which tool to call, and when does it stop?**\n\nA clear answer framework:\n\n\"We use the ReAct paradigm. The core is a Thought → Action → Observation loop. At each step, the LLM looks at the full context (user question plus all previous Observations) and decides the next Action. The tool runs, its result is injected as a ToolMessage, and the model reasons again.\n\nThe loop terminates when the LLM judges it has enough information and stops calling tools, generating the final answer directly.\n\nTo prevent runaway loops, we set `recursion_limit` (typically 15–25). When it's exceeded, we catch the exception and fall back to a degraded response. We also log the full Trace (every Action and Observation) so we can replay the entire reasoning chain when debugging.\"\n\nKey differentiators: **mentioning Trace observability and recursion_limit shows you've thought beyond demos and considered production stability.**\n\n## Summary\n\nThree things from this article:\n\n- **ReAct = Reasoning + Acting, interleaved**: The Thought → Action → Observation loop lets Agents update their reasoning based on real-world feedback. The fundamental difference from CoT: actions produce real results that feed back into the reasoning process.\n- **Tool design is ReAct's invisible interface**: Docstring quality directly determines how accurately the LLM selects tools. Safe implementation (AST instead of eval) determines whether the system boundary holds.\n- **`recursion_limit` is a required production setting**: The model decides when to stop, which is inherently risky. `recursion_limit` is the last line of defense. 
Recommended value: roughly 2× the steps needed for successful completion.\n\n**Next up**: Agent Series Article 3, **Plan-and-Solve: When ReAct Isn't Enough, How Agents Plan Before Acting**. We'll see where ReAct's greedy strategy hits its ceiling on complex tasks, and how introducing an explicit planning layer breaks through it.\n\n## References\n\n- Yao et al., [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629), ICLR 2023\n- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)\n- [hello-agents Open Tutorial](https://github.com/datawhalechina/Hello-Agents) (Chapter 4)\n- Demo code for this article: [agent-01-react-agent](https://github.com/chendongqi/llm-in-action/tree/main/agent-01-react-agent)", "url": "https://wpnews.pro/news/agent-series-2-react-the-most-important-agent-reasoning-paradigm", "canonical_source": "https://dev.to/wonderlab/agent-series-2-react-the-most-important-agent-reasoning-paradigm-2b7k", "published_at": "2026-05-23 00:40:05+00:00", "updated_at": "2026-05-23 01:01:44.342377+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "research"], "entities": ["Princeton", "Google", "ReAct"], "alternates": {"html": "https://wpnews.pro/news/agent-series-2-react-the-most-important-agent-reasoning-paradigm", "markdown": "https://wpnews.pro/news/agent-series-2-react-the-most-important-agent-reasoning-paradigm.md", "text": "https://wpnews.pro/news/agent-series-2-react-the-most-important-agent-reasoning-paradigm.txt", "jsonld": "https://wpnews.pro/news/agent-series-2-react-the-most-important-agent-reasoning-paradigm.jsonld"}}