{"slug": "react-inside-from-message-to-state-understanding-how-ai-agents-really-work", "title": "ReAct Inside — From Message to State, Understanding How AI Agents Really Work", "summary": "A developer explains the inner workings of the ReAct (Reason + Act) pattern for AI agents, distinguishing between the message-level and state-level perspectives. The core of ReAct is not the prompt format but the agent's state machine, where the model can interleave reasoning and tool calls. The developer clarifies that Action comes from the model while Observation comes from the external tool, and they must be generated by different roles to prevent fabrication.", "body_md": "When people first encounter ReAct (Reason + Act), they often think it's just adding three fields, `Thought / Action / Observation`, to the prompt.\n\nBut in reality, the core of ReAct isn't the prompt format. It's the Agent's State Machine.\n\nThis article explains, from an engineering perspective, how ReAct actually works inside an LLM, and how it relates to modern Function Calling and Tool Calling.\n\nReAct (Reason + Act) comes from the 2022 paper *ReAct: Synergizing Reasoning and Acting in Language Models*, authored by Shunyu Yao et al., a collaboration between Princeton University and Google Research.\n\nIts core idea is actually quite simple:\n\nLet the LLM call external tools (Act) at any point during its reasoning (Reason), then continue reasoning based on what the tools return.\n\nHere's an analogy. A traditional LLM is like a student taking a closed-book exam. Once the question is given, it writes out the whole answer in one go, relying only on what it has memorized:\n\n```\nUser\n    │\n    ▼\nLLM\n    │\n    ▼\nAnswer\n```\n\nReAct is more like a student taking an open-book exam who can also look things up online. 
Whenever it hits something uncertain, it first thinks \"I need to check this,\" goes off to flip through a book, look up the weather, or run a calculation, and then continues writing once it has the result:\n\n```\nUser\n    │\n    ▼\nLLM\n    │\nThought      ← what should I do\n    │\nAction       ← go check the weather\n    │\nTool         ← the tool actually runs\n    │\nObservation  ← the result it gets back\n    │\nLLM\n    │\nThought      ← keep reasoning based on the result\n    │\nAnswer\n```\n\nIts biggest change is this:\n\nThe model no longer spits out the final answer all at once. Instead, it can \"think → act → get feedback → think again.\"\n\nAlmost every introductory article draws a diagram like this:\n\n```\nThought\n   ↓\nAction\n   ↓\nObservation\n```\n\nAnd so many people draw two conclusions:\n\nNeither conclusion is accurate.\n\nTo explain it clearly, we first need to distinguish two completely different concepts:\n\nIn the next few sections, we'll pull the problem apart along these two concepts.\n\nSuppose the user asks a very everyday question:\n\nIs it good for running in Shanghai today?\n\nThroughout the whole process, the Messages that are actually produced are these:\n\n```\nUser Message                ← User: Is it good for running in Shanghai today?\n        │\n        ▼\nAssistant Message #1        ← Model output\n        │\n        ├── Thought          I should check the weather first\n        └── Action(weather)  call weather(\"Shanghai\")\n        │\n        ▼\nTool Message                ← Tool returns\n        │\n        └── Observation      26℃, humidity 90%, rain\n        │\n        ▼\nAssistant Message #2        ← Model output again\n        │\n        ├── Thought          rainy and humid, not great\n        └── Final Answer     Not recommended, it's raining today\n```\n\nThere are two key points here:\n\nIn other words, at the Message level, only three kinds of roles take part in the conversation: User, Assistant, and 
Tool.\n\nLet's first address a point that's easy to confuse: in terms of **content**, Observation really is the return value of Action.\n\nFor example, the model emits an action:\n\n```\nAction: weather(\"Shanghai\")\n```\n\nAfter the tool executes, it returns:\n\n```\n26℃\nHumidity: 90%\nRain: true\n```\n\nThis return is the Observation.\n\nSo if it's the same thing content-wise, why does the paper still pull Observation out separately?\n\nThe key isn't the content—it's the **source**:\n\n```\nAssistant\n    │\n    └── Action       comes from the model (what the model \"wants\" to do)\n\nTool\n    │\n    └── Observation  comes from the outside world (what actually happened)\n```\n\nAction comes from the model, Observation comes from the real environment, and the two must never be generated by the same role.\n\nWhy be so strict about this? Because if Observation were also written by the model itself, the model could **pretend the tool already executed successfully** and fabricate a result that never actually happened.\n\nFor example, suppose the model wrote this all in one go:\n\n```\nAction:\nSearch(\"Apple CEO\")\n\nObservation:\nTim Cook\n```\n\nIf Observation were also generated by the model, it could make things up entirely—even if the search never ran, it could still \"find\" a name, or even invent a wrong answer.\n\nThat's why modern Agents always insert the tool's real return into the context as a **separate Message**. Only then is the model forced to face the real result, instead of talking to itself.\n\nThis is another spot that's easy to get tangled up in.\n\nSince Thought and Action are in the same Assistant Message:\n\n```\nAssistant Message\n    Thought\n    Action\n```\n\nwhy does the paper still describe them separately?\n\nThe reason comes back to those two concepts:\n\nThey're talking about two different things. 
Thought and Action correspond to the two stages of decision-making:\n\n```\nThought:  I want to know the weather   ← Decision (deciding what to do)\n   ↓\nAction:   weather(\"Shanghai\")          ← the execution instruction the model emits\n```\n\nTo distinguish them in one sentence:\n\nWhat the paper really wants to convey is **how the LLM makes decisions step by step**, not what the API looks like. So conceptually, it separates decision (Thought) from execution (Action).\n\nThere's another layer here that many people miss: **Action isn't a single action; it internally splits into two halves.**\n\n- The first half is the **intent**: the model only proposes, \"I want to call `weather(\"Shanghai\")`.\" It can't, and has no ability to, actually check the weather itself.\n- The second half is the **execution**: the Agent actually runs the call.\n\nAnd **Observation is the result that comes back after the second half, the \"execution,\" runs**.\n\nStringing the whole chain together by role makes it clearer:\n\n```\nLLM     │  Thought         I need to check the weather\n        │  Action(intent)  I \"want\" to call weather(\"Shanghai\")   ← just proposing\n        ▼\nAgent   │  execute Action  actually call the weather API           ← doing the real work\n        │  Observation     26℃, rain                               ← execution result\n        ▼\nLLM     │  Thought         it's raining, not suitable\n```\n\nSo \"Action → Observation\" is strictly speaking not done by the model alone: the model is responsible for **proposing**, and the Agent is responsible for **executing and fetching the result**. This also echoes the earlier point about sources: Observation must be independent, because it comes from the Agent's real execution, not the model's imagination.\n\nOne more thing worth emphasizing: **Action is a logical concept in the paper. 
It is not \"welded\" into some function-call field of an AI message.**\n\nIn the paper, Action is essentially the abstract behavior of \"the Agent decides on and performs one external operation.\" It can be realized in many ways:\n\n- as plain text, such as `Search[Apple CEO]`, which the Agent then parsed with a regex and executed;\n- as a structured `tool_calls` field.\n\nThese are all different **engineering implementations** of the same Action concept. Function calling is merely the most popular one right now, not the definition of Action itself. Equating \"Action\" with \"function calling\" is exactly what happens when you only see the Prompt/Message layer and miss the State layer behind it.\n\nOnce you understand the two sections above, you can see that real ReAct is essentially a **state machine**.\n\n```\nThought\n   │\n   ▼\nAction\n   │\n   ▼\nObservation\n   │\n   ▼\nThought\n   │\n   ▼\nAction\n   │\n   ▼\nObservation\n   │\n   ▼\n  ...\n```\n\nWritten as code, it's roughly this loop:\n\n```\nwhile not finished:\n    thought = llm(history)            # LLM: decide + propose action\n    action = choose_tool(thought)     # pick the tool the model wants to call\n    observation = run(action)         # Agent: actually execute, fetch result\n    history.append(observation)       # append back to context, next iteration\n```\n\nThe four elements each have their own job:\n\nThe whole loop repeats until the model decides it can wrap up and outputs the final answer.\n\nIf you've used the tool-calling features of OpenAI, Claude, or Gemini, you'll notice they actually **no longer output** text like this:\n\n```\nThought:\n...\n\nAction:\n...\n```\n\nInstead, they directly emit a structured tool call:\n\n```\n{\n    \"tool_calls\": [\n        {\n            \"function\": \"weather\",\n            \"arguments\": {\n                \"city\": \"Shanghai\"\n            }\n        }\n    ]\n}\n```\n\nAfter the program executes the tool, it stuffs the result back as a tool message:\n\n```\n{\n    \"role\": \"tool\",\n 
   \"content\": \"26℃, humidity 90%, rain\"\n}\n```\n\nFinally it calls the LLM once more to get the final answer:\n\n```\nUser\n   ↓\nAssistant(tool_call)\n   ↓\nTool(result)\n   ↓\nAssistant(final answer)\n```\n\nThroughout this whole process, Thought is nowhere to be seen.\n\nBut that doesn't mean Thought disappeared:\n\nThought hasn't disappeared. It has simply moved from \"written explicitly in the prompt\" to \"the model's internal Hidden Reasoning.\"\n\nModern models usually don't expose this reasoning process directly to developers (reasoning models put it in a separate reasoning field). The decision step still exists—it's just been tucked away inside the model.\n\nIf we shift our viewpoint to inside the LLM, the whole flow can be drawn like this:\n\n```\n                +----------------+\n                | User Message   |\n                +--------+-------+\n                         |\n                         ▼\n              +-------------------+\n              | Internal Reasoning|\n              | (Thought)         |\n              +--------+----------+\n                       |\n                       ▼\n              +-------------------+\n              | Tool Selection    |\n              | (Action)          |\n              +--------+----------+\n                       |\n                       ▼\n              +-------------------+\n              | Tool Execution    |\n              +--------+----------+\n                       |\n                       ▼\n              +-------------------+\n              | Observation       |\n              | (Tool Message)    |\n              +--------+----------+\n                       |\n                       ▼\n              +-------------------+\n              | Internal Reasoning|\n              | (Thought)         |\n              +--------+----------+\n                       |\n                       ▼\n                 Final Answer\n```\n\nWhat's truly looping is these three actions:\n\n```\nReason 
→ Act → Observe → Reason → ...\n```\n\nand not, as many people assume:\n\n```\nPrompt → Prompt → Prompt → ...\n```\n\nIn other words, the body of the loop is the **flow of state**, not a pile of stacked text formats.\n\nTo pull together what we've covered, we can look at ReAct from three levels.\n\nThe first level is **Prompt**. The `Thought / Action / Observation` in the paper is just there to conveniently display the reasoning trace: a \"display format\" for humans to read.\n\nThe second level is **Message**. The messages a modern Agent actually exchanges come in only three kinds: User, Assistant, and Tool. This is the \"communication protocol\" that lands on the API.\n\nThe third level is **State**, and it's the true core. It describes the flow of the Agent's internal state:\n\n```\nDecision\n   ↓\nExecution\n   ↓\nEnvironment Feedback\n   ↓\nDecision\n```\n\nThis state machine is the essence of ReAct.\n\nReAct in one sentence:\n\nReAct is not a prompt template; it's an Agent's state machine.\n\nThe key to understanding it is to separate three levels:\n\n- **Prompt**: `Thought / Action / Observation`, just a display format for expressing the reasoning process.\n- **Message**: `User / Assistant / Tool`, the actual API communication protocol.\n- **State**: `Thought → Action → Observation`, the Agent's true internal state machine.\n\nAlthough modern Function Calling no longer explicitly outputs Thought, underneath it still follows the same state transitions:\n\n```\nReason → Act → Observe → Reason → ...\n```\n\nSo we can understand the relationship between the two like this:\n\nFunction Calling is the engineering implementation of ReAct; ReAct is the design philosophy behind Function Calling.", "url": "https://wpnews.pro/news/react-inside-from-message-to-state-understanding-how-ai-agents-really-work", "canonical_source": "https://dev.to/eyanpen/react-inside-from-message-to-state-understanding-how-ai-agents-really-work-3epf", "published_at": "2026-06-29 13:03:03+00:00", "updated_at": "2026-06-29 13:19:03.716891+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-research"], "entities": ["ReAct", "Shunyu Yao", "Princeton University", "Google Research"], "alternates": {"html": "https://wpnews.pro/news/react-inside-from-message-to-state-understanding-how-ai-agents-really-work", "markdown": "https://wpnews.pro/news/react-inside-from-message-to-state-understanding-how-ai-agents-really-work.md", "text": "https://wpnews.pro/news/react-inside-from-message-to-state-understanding-how-ai-agents-really-work.txt", "jsonld": "https://wpnews.pro/news/react-inside-from-message-to-state-understanding-how-ai-agents-really-work.jsonld"}}