You Know Zero-Shot, One-Shot & CoT Prompting. But Do You Know ReAct?


Hello, I'm Maneshwar. I'm building git-lrc, a Micro AI code reviewer that runs on every commit. It is free and source-available on GitHub. Star git-lrc to help devs discover the project. Do give it a try and share your feedback.

So you've got the prompting starter pack down.

Zero-shot? Easy, you just ask.

One-shot and few-shot? You toss in an example or two and let the model take the hint.

Chain-of-Thought (CoT)? That's where you nudge the model to "think step by step" and watch it reason its way to an answer.

Solid toolkit. But here's the thing about CoT: the model is essentially talking to itself in a dark room.

It reasons beautifully, confidently... and sometimes confidently makes stuff up. No internet, no calculator, no peeking at Wikipedia. Just whatever it memorized during training.

That's where ReAct walks in, and yes, it's a real act to follow. (I promise that's the worst pun in here xD)

ReAct comes from Yao et al., 2022, and the name is a smush of Reasoning + Acting.

The idea is delightfully simple: instead of making the model only reason (CoT) or only act (like a plain tool-using agent), you interleave both.

The model produces reasoning traces ("here's what I'm thinking and why") and actions ("here's what I'm going to do about it") in an alternating loop.

The reasoning helps it plan, track progress, and recover from mistakes.

The acting lets it reach out to the real world (search engines, knowledge bases, environments) and pull in fresh facts.

In short: reasoning decides what to look up next, and the retrieved info grounds the next round of reasoning.

They feed each other. It's less "lonely genius monologue" and more "detective who actually checks the evidence."

The result? On a bunch of language and decision-making tasks, ReAct beats several strong baselines. As a bonus, it's way more interpretable: you can literally read the model's train of thought and see why it did what it did.

The authors found the sweet spot is ReAct combined with CoT, so the model can lean on its own internal knowledge and external info when it needs to.

ReAct is inspired by something very human: we learn and make decisions by bouncing between thinking and doing.

You don't plan an entire road trip in your head and then drive it blindfolded. You think, act, observe what happens, adjust, repeat.

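The loop has three moving parts: a Thought (the reasoning step), an Action (such as Search[...], Lookup[...], or Finish[...]), and an Observation (what the environment returns).

Here's the classic example from the paper, answering a question from HotpotQA: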

Aside from the Apple Remote, what other devices can control the program Apple Remote was originally designed to interact with?

Watch the Thought → Act → Obs trajectory unfold.

The model thinks about what it needs, searches for it, reads the observation, and uses it to decide its next move.
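To make the loop concrete, here's a minimal sketch in Python. Everything in it is an assumption for illustration: `react_loop`, `scripted_policy`, and `TOY_DOCS` are made-up names, the "model" is a hard-coded script standing in for a real LLM call, and the observations are hand-written toy snippets rather than real retrieval output.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hand-written toy snippets standing in for a real search tool (assumption).
TOY_DOCS = {
    "Apple Remote": "The Apple Remote was originally designed to control Front Row.",
    "Front Row": "Front Row can be controlled by an Apple Remote or the keyboard function keys.",
}

def search(query: str) -> str:
    """Toy Search tool: exact-title lookup in TOY_DOCS."""
    return TOY_DOCS.get(query, f"Could not find {query}.")

TOOLS: Dict[str, Callable[[str], str]] = {"Search": search}

# Scripted stand-in for the LLM: (thought, action name, action argument).
SCRIPT = [
    ("I need to find the program the Apple Remote was designed for.", "Search", "Apple Remote"),
    ("It was designed for Front Row. What else controls Front Row?", "Search", "Front Row"),
    ("Front Row can also be controlled by keyboard function keys.", "Finish", "keyboard function keys"),
]

def scripted_policy(trace: List[str]) -> Tuple[str, str, str]:
    """Pick the next scripted step based on how many observations exist so far."""
    done = sum(line.startswith("Obs ") for line in trace)
    return SCRIPT[done]

def react_loop(
    question: str,
    policy: Callable[[List[str]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate Thought -> Action -> Obs until Finish or the step budget runs out."""
    trace = [f"Question: {question}"]
    for step in range(1, max_steps + 1):
        thought, name, arg = policy(trace)
        trace.append(f"Thought {step}: {thought}")
        trace.append(f"Action {step}: {name}[{arg}]")
        if name == "Finish":
            return arg, trace
        tool = tools.get(name)
        obs = tool(arg) if tool else f"Unknown action {name}."
        # The observation is appended to the trace so the next thought can use it.
        trace.append(f"Obs {step}: {obs}")
    return None, trace

answer, trace = react_loop(
    "What other devices can control the program Apple Remote was designed for?",
    scripted_policy,
    TOOLS,
)
```

In a real setup, `policy` would send the trace so far to an LLM and parse its reply; the loop structure around it would stay roughly the same.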

To set this up, you pick a few cases from a training set (like HotpotQA), hand-write them in ReAct format, and use them as few-shot exemplars.

Each exemplar is a chain of thought-action-observation steps.

Those free-form thoughts do a lot of heavy lifting — decomposing the question, extracting info, doing a bit of commonsense or arithmetic reasoning, steering the next search, and finally stitching together the answer.
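Assembling those exemplars into a prompt is mostly string formatting. Here's a rough sketch, assuming a hypothetical `format_exemplar` / `build_prompt` pair and a tiny hand-written exemplar of my own (not one from the paper):

```python
from typing import List, Tuple

# One step = (thought, action, observation). The final Finish[...] step
# carries an empty observation because it ends the episode.
Step = Tuple[str, str, str]
Exemplar = Tuple[str, List[Step]]

def format_exemplar(question: str, steps: List[Step]) -> str:
    """Render one question and its thought-action-observation chain as text."""
    lines = [f"Question: {question}"]
    for i, (thought, action, obs) in enumerate(steps, start=1):
        lines.append(f"Thought {i}: {thought}")
        lines.append(f"Action {i}: {action}")
        if obs:
            lines.append(f"Obs {i}: {obs}")
    return "\n".join(lines)

def build_prompt(exemplars: List[Exemplar], new_question: str) -> str:
    """Concatenate the exemplars, then leave the new question open at Thought 1."""
    blocks = [format_exemplar(q, s) for q, s in exemplars]
    blocks.append(f"Question: {new_question}\nThought 1:")
    return "\n\n".join(blocks)

# Illustrative, hand-written exemplar (assumption, not taken from the paper).
EXEMPLARS: List[Exemplar] = [
    (
        "In which country is the Eiffel Tower located?",
        [
            ("I need to search Eiffel Tower and find its country.",
             "Search[Eiffel Tower]",
             "The Eiffel Tower is a tower in Paris, France."),
            ("It is in Paris, France, so the answer is France.",
             "Finish[France]",
             ""),
        ],
    ),
]

prompt = build_prompt(EXEMPLARS, "Which river flows through Paris?")
```

Ending the prompt at `Thought 1:` is a design choice in this sketch: it nudges the model to continue in the same Thought / Action / Obs format as the exemplars.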

Here's a trimmed-down exemplar straight from the paper:

Question  What is the elevation range for the area that the eastern sector of the
          Colorado orogeny extends into?

Thought 1 I need to search Colorado orogeny, find the area that the eastern sector
          extends into, then find the elevation range of that area.
Action 1  Search[Colorado orogeny]
Obs 1     The Colorado orogeny was an episode of mountain building in Colorado
          and surrounding areas.

Thought 2 It does not mention the eastern sector. So I need to look up eastern sector.
Action 2  Lookup[eastern sector]
Obs 2     The eastern sector extends into the High Plains and is called the
          Central Plains orogeny.

Thought 3 The eastern sector extends into the High Plains. So I need to search
          High Plains and find its elevation range.
Action 3  Search[High Plains]
Obs 3     High Plains refers to one of two distinct land regions.

Thought 4 I need to instead search High Plains (United States).
Action 4  Search[High Plains (United States)]
Obs 4     The High Plains rise in elevation from around 1,800 to 7,000 ft.

Thought 5 So the answer is 1,800 to 7,000 ft.
Action 5  Finish[1,800 to 7,000 ft]

Notice Thought 4: the model realizes its first search was ambiguous and corrects itself.

That self-recovery is exactly the kind of thing pure CoT struggles with, because CoT has no observation to tell it that something went sideways.
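If you're wiring this up yourself, you also need to turn action lines like the ones in the exemplar into something executable. Below is a small parsing sketch; the `parse_action` helper and its regex are my own assumptions, not something the paper specifies.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "Action 4  Search[High Plains (United States)]".
# The optional colon also allows "Action 4: Search[...]". The greedy (.*)
# keeps everything up to the last closing bracket on the line.
ACTION_LINE = re.compile(r"^Action\s+\d+:?\s+(Search|Lookup|Finish)\[(.*)\]\s*$")

def parse_action(line: str) -> Optional[Tuple[str, str]]:
    """Return (action name, argument), or None if the line is not an action."""
    match = ACTION_LINE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)

parsed = parse_action("Action 4  Search[High Plains (United States)]")
# parsed == ("Search", "High Plains (United States)")
```

Returning `None` for anything that isn't a well-formed action gives the caller a clean place to handle malformed model output, for example by re-prompting.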

One more thing: the prompt setup changes depending on the task.

For reasoning-heavy tasks (HotpotQA), you use lots of thought-action-observation steps.

For action-heavy decision-making tasks, you sprinkle thoughts in more sparingly.

The paper first tests ReAct on knowledge-intensive tasks, namely question answering (HotpotQA) and fact verification (Fever), using PaLM-540B as the base model.

The headline: ReAct generally beats Act (acting only, no thinking) on both tasks. Turns out a little reasoning goes a long way.

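Against CoT it's more of a split decision: ReAct comes out ahead on Fever but trails CoT slightly on HotpotQA.

The paper digs into why, and the short version is a neat little trade-off: CoT reasons more flexibly but is more prone to hallucinating facts, while ReAct stays grounded but is less flexible in how it structures its reasoning and can get derailed when a search returns nothing useful.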

The best of both worlds? Methods that let the model switch between ReAct and CoT + Self-Consistency outperform everything else.

Grounded when it needs facts, free-flowing when it needs to reason.
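One way such a switch could look in code: sample several CoT answers, take a majority vote, and hand the question to ReAct when no answer is confident enough. This is only a sketch under assumptions. `cot_sc_then_react`, `sample_cot`, and `run_react` are hypothetical names, and the half-of-the-samples threshold is my simplification; check the paper for the exact rules it uses.

```python
from collections import Counter
from typing import Callable

def cot_sc_then_react(
    question: str,
    sample_cot: Callable[[str], str],
    run_react: Callable[[str], str],
    n_samples: int = 5,
) -> str:
    """Self-consistency vote over CoT samples, with ReAct as the fallback."""
    answers = [sample_cot(question) for _ in range(n_samples)]
    top_answer, votes = Counter(answers).most_common(1)[0]
    # Assumed threshold: trust CoT only if the top answer wins at least
    # half of the samples; otherwise go and check with ReAct.
    if votes >= n_samples / 2:
        return top_answer
    return run_react(question)
```

Here `sample_cot` would be one sampled CoT completion reduced to its final answer, and `run_react` a full ReAct run that returns its `Finish[...]` answer.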

Have your cake, retrieve it too.

ReAct isn't just a trivia champ. It also shows up for interactive, action-driven tasks.

The paper evaluates it on two benchmarks: ALFWorld (a text-based game) and WebShop (a simulated online shopping environment).

Both throw the model into messy environments where it has to reason in order to act and explore effectively.

Here, ReAct beats Act on both ALFWorld and WebShop.

Without any thoughts to lean on, Act flounders at breaking a goal into sub-goals: it knows how to do things but not why or in what order.

The reasoning steps give ReAct a clear edge.

That said, let's stay honest: these prompting-based agents are still a fair distance from expert human performance on these tasks. Promising, not finished.

CoT taught models to think out loud.

ReAct teaches them to think out loud and then go check.

By weaving reasoning and acting together, you get responses that are more factual, more trustworthy, and (bonus for us) far easier to debug, because you can read the whole reasoning trail.

So the next time your LLM confidently invents a fact, maybe it doesn't need a better memory.

Maybe it just needs to get up, take a look around, and... ReAct.

Want the full details, error analyses, and extra results? The paper is right here: ReAct: Synergizing Reasoning and Acting in Language Models. All the images are from the above paper.

Disclaimer: This article was written by me; AI was used to fix grammar and improve readability.

Cover Image Credits: mario-fontana

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs — without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.

