ReAct Inside — From Message to State, Understanding How AI Agents Really Work

wpnews.pro

When people first encounter ReAct (Reason + Act), they often think it's just adding three fields—

Thought / Action / Observation

—to the prompt.But in reality, the core of ReAct isn't the prompt format. It's the

Agent's State Machine.This article explains, from an engineering perspective, how ReAct actually works inside an LLM, and how it relates to modern Function Calling and Tool Calling.

ReAct (Reason + Act) comes from the 2022 paper ReAct: Synergizing Reasoning and Acting in Language Models, authored by Shunyu Yao et al., a collaboration between Princeton University and Google Research.

Its core idea is actually quite simple:

Let the LLM call external tools (Act) at any point during its reasoning (Reason), then continue reasoning based on what the tools return.

Here's an analogy. A traditional LLM is like a student taking a closed-book exam—once the question is given, it writes out the whole answer in one go, relying only on what it has memorized:

User
    │
    ▼
LLM
    │
    ▼
Answer

ReAct is more like a student taking an open-book exam who can also look things up online. Whenever it hits something uncertain, it first thinks "I need to check this," goes off to flip through a book, look up the weather, or run a calculation, and then continues writing once it has the result:

User
    │
    ▼
LLM
    │
Thought      ← what should I do
    │
Action       ← go check the weather
    │
Tool         ← the tool actually runs
    │
Observation  ← the result it gets back
    │
LLM
    │
Thought      ← keep reasoning based on the result
    │
Answer

Its biggest change is this:

The model no longer spits out the final answer all at once. Instead, it can "think → act → get feedback → think again."

Almost every introductory article draws a diagram like this:

Thought
   ↓
Action
   ↓
Observation

And so many people draw two conclusions:

Neither conclusion is accurate.

To explain it clearly, we first need to distinguish two completely different concepts:

In the next few sections, we'll pull the problem apart along these two concepts.

Suppose the user asks a very everyday question:

Is it good for running in Shanghai today?

Throughout the whole process, the Messages that are actually produced are these:

User Message                ← User: Is it good for running in Shanghai today?
        │
        ▼
Assistant Message #1        ← Model output
        │
        ├── Thought          I should check the weather first
        └── Action(weather)  call weather("Shanghai")
        │
        ▼
Tool Message                ← Tool returns
        │
        └── Observation      26℃, humidity 90%, rain
        │
        ▼
Assistant Message #2        ← Model output again
        │
        ├── Thought          rainy and humid, not great
        └── Final Answer     Not recommended, it's raining today

There are two key points here:

In other words, at the Message level, only three kinds of roles take part in the conversation: User, Assistant, and Tool.

Let's first address a point that's easy to confuse: in terms of content, Observation really is the return value of Action.

For example, the model emits an action:

Action: weather("Shanghai")

After the tool executes, it returns:

26℃
Humidity: 90%
Rain: true

This return is the Observation.

So if it's the same thing content-wise, why does the paper still pull Observation out separately?

The key isn't the content—it's the source:

Assistant
    │
    └── Action       comes from the model (what the model "wants" to do)

Tool
    │
    └── Observation  comes from the outside world (what actually happened)

Action comes from the model, Observation comes from the real environment, and the two must never be generated by the same role.

Why be so strict about this? Because if Observation were also written by the model itself, the model could pretend the tool already executed successfully and fabricate a result that never actually happened.

For example, suppose the model wrote this all in one go:

Action:
Search("Apple CEO")

Observation:
Tim Cook

If Observation were also generated by the model, it could make things up entirely—even if the search never ran, it could still "find" a name, or even invent a wrong answer.

That's why modern Agents always insert the tool's real return into the context as a separate Message. Only then is the model forced to face the real result, instead of talking to itself.

This is another spot that's easy to get tangled up in.

Since Thought and Action are in the same Assistant Message:

Assistant Message
    Thought
    Action

why does the paper still describe them separately?

The reason comes back to those two concepts:

They're talking about two different things. Thought and Action correspond to the two stages of decision-making:

Thought:  I want to know the weather   ← Decision (deciding what to do)
   ↓
Action:   weather("Shanghai")          ← the execution instruction the model emits

To distinguish them in one sentence:

What the paper really wants to convey is how the LLM makes decisions step by step, not what the API looks like. So conceptually, it separates decision (Thought) from execution (Action).

There's another layer here that many people miss: Action isn't a single action—it internally splits into two halves.

weather("Shanghai")

." It can't—and has no ability to—actually check the weather itself.And Observation is the result that comes back after the second half, the "execution," runs.

Stringing the whole chain together by role makes it clearer:

LLM     │  Thought         I need to check the weather
        │  Action(intent)  I "want" to call weather("Shanghai")   ← just proposing
        ▼
Agent   │  execute Action  actually call the weather API           ← doing the real work
        │  Observation     26℃, rain                               ← execution result
        ▼
LLM     │  Thought         it's raining, not suitable

So "Action → Observation" is strictly speaking not done by the model alone: the model is responsible for proposing, and the Agent is responsible for executing and fetching the result. This also echoes Section 4—Observation must be independent, because it comes from the Agent's real execution, not the model's imagination.

One more thing worth emphasizing: Action is a logical concept in the paper. It is not "welded" into some function-call field of an AI message.

In the paper, Action is essentially the abstract behavior of "the Agent decides on and performs one external operation." It can be realized in many ways:

Search[Apple CEO]

, which the Agent then parsed with a regex and executed;tool_calls

;These are all different engineering implementations of the same Action concept. Function calling is merely the most popular one right now, not the definition of Action itself. Equating "Action" with "function calling" is exactly what happens when you only see the Prompt/Message layer and miss the State layer behind it.

Once you understand the two sections above, you can see that real ReAct is essentially a state machine.

Thought
   │
   ▼
Action
   │
   ▼
Observation
   │
   ▼
Thought
   │
   ▼
Action
   │
   ▼
Observation
   │
   ▼
  ...

Written as code, it's roughly this loop:

while not finished:
    thought = llm(history)            # LLM: decide + propose action
    action = choose_tool(thought)     # pick the tool the model wants to call
    observation = run(action)         # Agent: actually execute, fetch result
    history.append(observation)       # append back to context, next iteration

The four elements each have their own job:

The whole loop repeats until the model decides it can wrap up and outputs the final answer.

If you've used the tool-calling features of OpenAI, Claude, or Gemini, you'll notice they actually no longer output text like this:

Thought:
...

Action:
...

Instead, they directly emit a structured tool call:

{
    "tool_calls": [
        {
            "function": "weather",
            "arguments": {
                "city": "Shanghai"
            }
        }
    ]
}

After the program executes the tool, it stuffs the result back as a tool message:

{
    "role": "tool",
    "content": "26℃, humidity 90%, rain"
}

Finally it calls the LLM once more to get the final answer:

User
   ↓
Assistant(tool_call)
   ↓
Tool(result)
   ↓
Assistant(final answer)

Throughout this whole process, Thought is nowhere to be seen.

But that doesn't mean Thought disappeared:

Thought hasn't disappeared. It has simply moved from "written explicitly in the prompt" to "the model's internal Hidden Reasoning."

Modern models usually don't expose this reasoning process directly to developers (reasoning models put it in a separate reasoning field). The decision step still exists—it's just been tucked away inside the model.

If we shift our viewpoint to inside the LLM, the whole flow can be drawn like this:

                +----------------+
                | User Message   |
                +--------+-------+
                         |
                         ▼
              +-------------------+
              | Internal Reasoning|
              | (Thought)         |
              +--------+----------+
                       |
                       ▼
              +-------------------+
              | Tool Selection    |
              | (Action)          |
              +--------+----------+
                       |
                       ▼
              +-------------------+
              | Tool Execution    |
              +--------+----------+
                       |
                       ▼
              +-------------------+
              | Observation       |
              | (Tool Message)    |
              +--------+----------+
                       |
                       ▼
              +-------------------+
              | Internal Reasoning|
              | (Thought)         |
              +--------+----------+
                       |
                       ▼
                 Final Answer

What's truly looping is these three actions:

Reason → Act → Observe → Reason → ...

and not, as many people assume:

Prompt → Prompt → Prompt → ...

In other words, the body of the loop is the flow of state, not a pile of stacked text formats.

To pull together what we've covered, we can look at ReAct from three levels.

The first level is Prompt. The Thought / Action / Observation

in the paper is just there to conveniently display the reasoning trace—a "display format" for humans to read.

The second level is Message. The messages a modern Agent actually exchanges come in only three kinds: User, Assistant, and Tool. This is the "communication protocol" that lands on the API.

The third level is State, and it's the true core. It describes the flow of the Agent's internal state:

Decision
   ↓
Execution
   ↓
Environment Feedback
   ↓
Decision

This state machine is the essence of ReAct.

ReAct in one sentence:

ReAct is not a prompt template—it's an Agent's state machine.

The key to understanding it is to separate three levels:

Thought / Action / Observation

—just a display format for expressing the reasoning process.User / Assistant / Tool

—the actual API communication protocol.Thought → Action → Observation

—the Agent's true internal state machine.Although modern Function Calling no longer explicitly outputs Thought, underneath it still follows the same state transitions:

Reason → Act → Observe → Reason → ...

So we can understand the relationship between the two like this:

Function Calling is the engineering implementation of ReAct; ReAct is the design philosophy behind Function Calling.

If you found this article helpful, feel free to like, bookmark, and follow. I'll keep sharing more valuable content. Your support is my greatest motivation to create!

source & further reading

dev.to — original article I Processed 500,000 Job Applications With AI. Here Is What the Data Actually Shows. JavaScript still can't ship a full-stack module Route Phone Calls to an AI Agent With the Telnyx Voice API

ReAct Inside — From Message to State, Understanding How AI Agents Really Work

Run your AI side-project on zahid.host