Agent Series (2): ReAct — The Most Important Agent Reasoning Paradigm

The article explains the ReAct (Reasoning and Acting) agent paradigm, introduced by researchers from Princeton and Google in 2022, which improves upon Chain-of-Thought (CoT) reasoning by allowing a language model to alternate between reasoning and taking real-world actions. Unlike CoT, which reasons solely within its training data and can produce confident but incorrect answers, ReAct uses a "Thought → Action → Observation" loop that feeds real-world feedback back into the model, enabling dynamic, runtime-planned execution. The article includes a demonstration where a ReAct agent searches for the areas of Beijing and Shanghai, then calculates the difference, showcasing its ability to verify information and reason based on actual results.

You Think Your Agent Is "Thinking." It's Actually Just Predicting Tokens. Here's a scenario that happens more often than you'd think. You ask an Agent to write a competitive analysis report. It confidently outputs three professional-looking pages — complete with data, conclusions, and strategic recommendations. There's just one problem: every number comes from its training data, which may be a year old. It didn't search. It didn't verify. It just generated text that sounds authoritative. That's not thinking. That's fluent hallucination. Chain-of-Thought CoT has the same fundamental problem. CoT prompting tells the model to "reason step by step" before answering, and it genuinely does improve accuracy on many tasks. But the model is still reasoning entirely within language space. It can generate a very coherent chain of thought that leads to a completely wrong answer — because its only information source is training data. ReAct was built to solve this. ReAct: Reasoning + Acting, Interleaved In 2022, researchers from Princeton and Google published ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629 . The core idea is elegantly simple: let the model alternate between reasoning and acting, rather than reasoning first then acting, or acting without reasoning. The concrete form is a three-part loop: Thought → Action → Observation ↑ │ └─────────────────────────┘ - Thought : What the model is "thinking" — current analysis, what to do next, why - Action : The actual tool call and parameters - Observation : The real result returned by the tool The critical mechanism: Observation is fed back into the model as new context , allowing it to reason based on actual results. This creates the "think → act → observe → think again" loop. This one loop fixes CoT's core flaw: the model is no longer reasoning in isolation. It can interact with the real world and update its reasoning based on real feedback. A Concrete Example: Watching an Agent "Think" I built a complete ReAct Agent demo using LangGraph + GLM-4-Flash with two tools: calculator safe math evaluator and web search Bing search . Code: agent-01-react-agent/react agent.py https://github.com/chendongqi/llm-in-action/tree/main/agent-01-react-agent Here's an actual execution trace — Demo 3: search for the areas of Beijing and Shanghai, then calculate the difference. ════════════════════════════════════════════════════════════ Demo 3 ▸ Multi-Round Search Same Tool, Multiple Calls ════════════════════════════════════════════════════════════ User Question First search for Beijing's area, then Shanghai's area, then calculate how much larger Beijing is in km². ──────────────────────────────────────────────────────────── Step 1 THOUGHT → ACTION Action : web search query='北京面积 平方公里' Observation : • Beijing area: Total area 16,410.54 km²... ──────────────────────────────────────────────────────────── Step 2 THOUGHT → ACTION Action : web search query='上海面积 平方公里' Observation : • Shanghai area: Land area approximately 6,340.5 km²... ──────────────────────────────────────────────────────────── Step 3 THOUGHT → ACTION Action : calculator expression='16410.54 - 6340.5' Observation : 10070.04 ──────────────────────────────────────────────────────────── Final Answer Beijing's area is approximately 16,410.54 km², Shanghai's is approximately 6,340.5 km². Beijing is about 10,070.04 km² larger. ════════════════════════════════════════════════════════════ Notice what happened here: - The Agent decided on its own to search Beijing first, then Shanghai, then calculate — no hardcoded execution order - Each search result Observation was read by the model and used to determine the next step - The final calculation used real numbers extracted from real searches This is ReAct's value: the execution path is planned dynamically at runtime, not hardcoded by the developer in advance. ReAct vs. Chain-of-Thought: A Direct Comparison | Aspect | Chain-of-Thought | ReAct | |---|---|---| | Information source | Training data only | Training data + tool results | | Execution path | Reasoning in language space | Think → real action → observe results | | Can access real-time data | ✗ | ✓ via tools | | Can execute computation/code | ✗ | ✓ via tools | | Reasoning verifiable | Hard to verify | Each Observation is a real result | | Risk of side effects | Low no actions | High requires safety boundaries | One sentence summary: CoT makes the model think clearly. ReAct makes it think while doing. Building a ReAct Agent with LangGraph Here's the core implementation. The code uses LangGraph's create react agent — one of the cleanest ReAct implementations available. 1. Safe Calculator Tool python import ast import operator from typing import Any from langchain core.tools import tool SAFE OPS: dict type, Any = { ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul, ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod, ast.USub: operator.neg, } def eval ast node: ast.AST - float: if isinstance node, ast.Constant : return float node.value if isinstance node, ast.BinOp : op fn = SAFE OPS.get type node.op if op fn is None: raise ValueError f"Unsupported operator: {type node.op . name }" return op fn eval ast node.left , eval ast node.right if isinstance node, ast.UnaryOp : op fn = SAFE OPS.get type node.op return op fn eval ast node.operand raise ValueError f"Unsupported AST node: {type node . name }" @tool def calculator expression: str - str: """Evaluate a math expression. Supports + - / % and parentheses.""" try: tree = ast.parse expression.strip , mode="eval" result = eval ast tree.body if result == int result : return str int result return f"{result:.6g}" except ValueError, SyntaxError, ZeroDivisionError as e: return f"Calculation error: {e}" Why not just use eval ? eval " import 'os' .system 'rm -rf /' " — that line will execute a deletion on your machine. Tools are the Agent's "hands." Once an attacker manipulates the LLM through prompt injection, eval becomes a direct path to your system. AST parsing only allows math operation nodes — everything else is rejected. This is the foundational principle of safe tool design. 2. Web Search Tool python import requests from bs4 import BeautifulSoup from urllib.parse import quote BING HEADERS = { "User-Agent": "Mozilla/5.0 X11; Linux x86 64 AppleWebKit/537.36 " " KHTML, like Gecko Chrome/120.0.0.0 Safari/537.36" , "Accept-Language": "en-US,en;q=0.9", } @tool def web search query: str - str: """Search the web and return the 3 most relevant snippets.""" try: url = f"https://www.bing.com/search?q={quote query }&setlang=zh-CN" resp = requests.get url, headers= BING HEADERS, timeout=10 resp.raise for status soup = BeautifulSoup resp.text, "html.parser" snippets = for li in soup.find all "li", class ="b algo" :4 : h2 = li.find "h2" title = h2.get text strip=True if h2 else "" p = li.find "p" body = p.get text strip=True if p else "" if title or body: snippets.append f"• {title}: {body}" :200 return "\n".join snippets :3 if snippets else "No results found." except requests.RequestException as e: return f"Search failed: {e}" 3. Building the Agent python import os from dotenv import load dotenv from langchain openai import ChatOpenAI LangGraph V1.0 moved create react agent to chat agent executor submodule from langgraph.prebuilt.chat agent executor import create react agent load dotenv llm = ChatOpenAI base url="https://open.bigmodel.cn/api/paas/v4", api key=os.getenv "LLM API KEY" , model="glm-4-flash", temperature=0, agent = create react agent model=llm, tools= calculator, web search , result = agent.invoke {"messages": "user", "How much larger is Beijing than Shanghai in km²? Search and calculate." }, config={"recursion limit": 20}, print result "messages" -1 .content Three core lines: define tools → bind LLM → run. LangGraph handles all the message routing, tool call dispatch, result injection, and loop control under the hood. The correct import path for create react agent LangGraph V1.0 moved this function to langgraph.prebuilt.chat agent executor . Importing from langgraph.prebuilt triggers a LangGraphDeprecatedSinceV10 warning. Use the new path: python ✅ Recommended from langgraph.prebuilt.chat agent executor import create react agent ⚠️ Triggers deprecation warning from langgraph.prebuilt import create react agent How the Message Flow Actually Works To truly understand ReAct, you need to see the underlying message sequence. Here's what the LLM receives at the start of each cycle: Context passed to LLM at round N: ┌─────────────────────────────────────────────────────┐ │ System You are an assistant with these tools: │ │ calculator, web search │ │ │ │ Human Question: How much larger is Beijing? │ │ │ │ AI tool call web search "Beijing area" │ ← Round 1 Action │ Tool Beijing area: 16,410 km² │ ← Round 1 Observation │ │ │ AI tool call web search "Shanghai area" │ ← Round 2 Action │ Tool Shanghai area: 6,340 km² │ ← Round 2 Observation │ │ │ ← LLM decides what to do next here → │ └─────────────────────────────────────────────────────┘ Each cycle, the entire history is passed to the LLM. The model "sees" all previous thoughts and observations, then decides: - Continue calling tools more information needed - Stop and deliver a final answer enough information gathered This is why it's called a loop — the model itself is the loop's termination condition. It decides when to stop. When Things Go Wrong: Failure Modes and Guards The same "decide when to stop" design that makes ReAct powerful also introduces a risk: if the model misjudges, the loop never terminates. Common runaway scenarios: Scenario 1: Tool keeps failing, model keeps retrying Action: web search "vague ambiguous query" Observation: No results found Thought: Let me try different keywords Action: web search "different keywords" Observation: No results found Thought: Maybe one more variation... infinite loop Scenario 2: Model misunderstands the task and pursues the wrong direction Thought: I need the exact value of X Action: calculator "..." Observation: Approximate result Thought: Not precise enough, I need more decimal places Action: calculator "..." infinite pursuit of "precision" Scenario 3: Tools form a circular dependency Thought: I need to know A before I can look up B Action: search A Observation: Requires knowing B first Thought: I need to know B before I can look up A circular dependency LangGraph's recursion limit parameter is the hard safety net: result = agent.invoke {"messages": "user", question }, config={"recursion limit": 5}, Force-stop after 5 steps When the step count exceeds the limit, LangGraph raises GraphRecursionError : recursion limit triggered Exception type: GraphRecursionError Message: Recursion limit of 5 reached without hitting a stop condition... → Conclusion: Always set a reasonable recursion limit in production 15~25 recommended → Too low: legitimate tasks get cut off; Too high: runaway Agent burns massive tokens How to set recursion limit - Simple tasks single tool call : 5–8 steps is enough - Medium tasks multi-tool, multi-step : 10–15 steps - Complex research tasks: 20–25 steps - Tasks requiring 30+ steps should reconsider architecture — you may need multi-Agent collaboration covered in a later article The rule of thumb: set it to roughly 2× the number of steps a successful execution needs. Room to breathe, but a real ceiling. Five Demo Scenarios: From Simple to Complex The complete code includes 5 progressive demos covering the main ReAct usage patterns: Demo 1: Pure Calculation single tool, single step Question: Calculate 1024 768 + 1920 1080 Steps: calculator ' 1024 768 + 1920 1080 ' → 2860032 Validates the basic tool-calling pipeline. Demo 2: Search + Calculate multi-tool, multi-step Question: What year were Python and JavaScript first released? Calculate the difference. Steps: web search "Python release year" → web search "JavaScript release year" → calculator Shows the Agent autonomously orchestrating different tools in the right order. Demo 3: Multi-round Search same tool, multiple calls Question: How much larger is Beijing than Shanghai in km²? Steps: web search "Beijing area" → web search "Shanghai area" → calculator → 10070.04 Shows the Agent deciding what to search second based on what it found first. Demo 4: No Tools Needed direct answer Question: Explain the ReAct paradigm in one sentence. Steps: No tool calls — direct answer Shows the Agent knowing when not to call tools. This matters as much as knowing when to call them. Demo 5: Trigger recursion limit safety net demo Question: Search Python/Java/C release years, calculate the sum ~10 steps needed Limit: recursion limit=5 Result: GraphRecursionError correctly triggered Production safety mechanism verification. An Interesting Observation: Agents Can "Luck Into" Correct Answers Demo 2 produced a result worth documenting carefully. The Agent searched for JavaScript's release year. The Bing snippet it received came from an article published in 2023 that mentioned Python's 1991 origin. The model appears to have confused "2023" article publication date with JavaScript's release year. The calculation step ran 2023 - 1991 = 32 , returning 32. But the final answer was correct: "Python was released in 1991, JavaScript in 1995 — a 4-year difference." The model overrode its incorrect calculation result with its internal training knowledge and delivered the right answer. This reveals a subtle property of ReAct: an Agent's reasoning chain and its final answer can be decoupled. The model may make errors during tool calls, then "self-correct" in the final answer generation using built-in knowledge. As an outcome, this is fine — you got the right answer. From an engineering perspective, it's a problem. If you need traceable, verifiable conclusions, "it happened to be correct" isn't sufficient. This is one of the challenges that Harness Engineering addresses covered in a later article in this series . Trace Visualization: Making Agent Reasoning Observable A common production pain point: when something goes wrong, you don't know which step failed, because only the final answer is visible by default. Good practice: print the full Thought/Action/Observation sequence as a readable Trace: python from langchain core.messages import AIMessage, HumanMessage, ToolMessage def print trace result: dict - None: for msg in result "messages" : if isinstance msg, HumanMessage : print f" USER {msg.content}" elif isinstance msg, AIMessage : content = msg.content if isinstance msg.content, str else "" if msg.tool calls: for tc in msg.tool calls: args = ", ".join f"{k}={repr v }" for k, v in tc "args" .items print f" ACTION {tc 'name' } {args} " else: print f" FINAL ANSWER {content.strip }" elif isinstance msg, ToolMessage : obs = msg.content if isinstance msg.content, str else str msg.content print f" OBSERVATION {obs.strip :300 }" GLM-4-Flash content field pollution When using GLM-4-Flash, you may occasionally see raw JSON in AIMessage.content — something like {"index": 0, "delta": ...} . This is the model leaking internal streaming delta data into the content field. Fix: detect when content starts with { or and can be parsed by json.loads , then discard it. php def clean thought text: str - str: stripped = text.strip if stripped and stripped 0 in "{", " " : try: json.loads stripped return "" leaked JSON, discard except json.JSONDecodeError: pass return text The complete demo code already includes this handling. The Limitations of ReAct ReAct is powerful, but it's not a silver bullet. Knowing its limits helps you use it correctly. 1. Context window fills up fast Each cycle packs the entire history into context. Step count grows, token consumption spikes. Complex tasks 20+ steps may fail on models with limited context windows. 2. Tool descriptions drive everything — write them well ReAct relies entirely on the LLM understanding tool documentation to decide which tool to call and with what parameters. Vague docstrings lead to wrong tool selection. Tool descriptions are the invisible API of a ReAct system — treat them like API documentation. 3. No global planning capability Standard ReAct is greedy: each step only looks at the current state to decide the next move, with no "plan the whole thing first, then execute" capability. For tasks requiring long-horizon planning like writing an entire codebase , this can get stuck in local optima. This is what the Plan-and-Solve paradigm addresses Article 3 in this series . 4. Poor fault tolerance for tool failures When a tool returns an error, the model has to infer the next step from the error message alone. There's no predefined retry strategy or fallback logic. This needs to be handled at the tool design level and the Harness layer. Interview Prep: Articulate How Your Agent "Thinks" Common question: How does your Agent decide its next action? Many candidates answer "it calls tools." But what the interviewer actually wants to hear is: who decides which tool to call, and when does it stop? A clear answer framework: "We use the ReAct paradigm. The core is a Thought → Action → Observation loop. At each step, the LLM looks at the full context — user question plus all previous Observations — and decides the next Action. The tool runs, its result is injected as a ToolMessage, and the model reasons again. The loop terminates when the LLM judges it has enough information and stops calling tools, generating the final answer directly. To prevent runaway loops, we set recursion limit typically 15–25 . When it's exceeded, we catch the exception and fall back to a degraded response. We also log the full Trace — every Action and Observation — so we can replay the entire reasoning chain when debugging." Key differentiators: mentioning Trace observability and recursion limit shows you've thought beyond demos and considered production stability. Summary Three things from this article: ReAct = Reasoning + Acting, interleaved : The Thought → Action → Observation loop lets Agents update their reasoning based on real-world feedback. The fundamental difference from CoT: actions produce real results that feed back into the reasoning process. Tool design is ReAct's invisible interface : Docstring quality directly determines how accurately the LLM selects tools. Safe implementation AST instead of eval determines whether the system boundary holds.: The model decides when to stop — that's inherently risky. recursion limit is a required production setting recursion limit is the last line of defense. Recommended value: roughly 2× the steps needed for successful completion. Next up : Agent Series Article 3 — Plan-and-Solve: When ReAct Isn't Enough, How Agents Plan Before Acting . We'll see where ReAct's greedy strategy hits its ceiling on complex tasks, and how introducing an explicit planning layer breaks through it. References - Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629 , ICLR 2023 LangGraph Documentation https://langchain-ai.github.io/langgraph/ - hello-agents Open Tutorial https://github.com/datawhalechina/Hello-Agents Chapter 4 - Demo code for this article: agent-01-react-agent https://github.com/chendongqi/llm-in-action/tree/main/agent-01-react-agent Welcome to visit my personal homepage for more useful knowledge and interesting products