Agent Series (3): Plan-and-Solve — Think First, Then Act

The article describes the Plan-and-Solve agent architecture, which improves upon the ReAct model by first generating a complete action plan before executing any steps. Unlike ReAct's locally optimal, open-ended loop, Plan-and-Solve uses two phases—planning and execution—with built-in fault tolerance through replanning when steps fail. The architecture is implemented using LangGraph's state machine, where a defined state tracks the task, plan, completed steps, and replan count.

## Where Does ReAct Hit a Wall?

The previous article established ReAct's greedy strategy: each step looks only at the current state and decides the next action. This works well most of the time, but there is one class of task where it stumbles.

Imagine you ask an Agent to do this:

> Search for the release years of Python, Java, and Go. Sort them chronologically. Calculate how many years apart Python and Go are.

A typical ReAct execution might look like:

```
Action: web_search("Python release year")
Action: web_search("Java release year")
Action: web_search("Go release year")
Action: calculator("...")
(occasionally repeats a search or takes extra steps)
```

That's not terrible, but there is a latent problem: ReAct has no global plan before acting. It doesn't know how many steps the task needs, which step depends on which, or where it is in the overall task. Every step is locally optimal, not globally optimal.

For multi-step tasks with clear dependencies, this is like navigating without a map: you'll eventually arrive, but you'll take detours.

**Plan-and-Solve's answer**: use the LLM to produce a complete action plan first, then execute it step by step.

## The Two-Phase Architecture

This paradigm comes from the 2023 paper [Plan-and-Solve Prompting](https://arxiv.org/abs/2305.04091). The core idea is two phases:

- **Phase 1, Plan**: Ask the LLM to analyze the entire task from a bird's-eye view and output an ordered list of steps. No tools are called during this phase; it's pure thinking.
- **Phase 2, Solve**: Execute each step in the plan, one at a time. Tools can be called at each step. The result of the previous step is injected into the next step's context.

With the production-essential fault-tolerance mechanisms added, the complete architecture looks like this:

```
Task
  │
  ▼
Plan Node        ← LLM generates 3-7 step plan (no execution, just planning)
  │
  ▼
Execute Node     ← Execute current step (embedded ReAct, can call tools)
  │
  ├─ Step failed? ─→ Replan Node   ← Re-plan remaining steps based on progress so far
  │                       │
  │                       └──────────────┐
  │                                      ▼
  ├─ More steps?  ─→ back to Execute   Execute (continue)
  │
  └─ All done?    ─→ Finalize Node  ← Output final answer
                          │
                          ▼
                         END
```

The key difference from ReAct: ReAct is an open-ended loop; Plan-and-Solve is a sequence with a defined endpoint.

## LangGraph Implementation: State + Graph

LangGraph fits this architecture well: it models the Agent as a state machine (`StateGraph`), with state flowing between nodes. The snippets below are excerpts; names such as `llm`, `calculator`, `web_search`, `EXECUTOR_SYSTEM`, `parse_plan`, and `format_completed_steps` are defined in the full code linked later in this section.

### State Design

```python
from typing import TypedDict

class PlanSolveState(TypedDict):
    task: str                    # original user task
    plan: list[str]              # current plan (list of steps)
    completed_steps: list[str]   # completed steps (with result summaries)
    current_step_index: int      # which step we're on (0-based)
    step_result: str             # result of the current step
    replan_count: int            # how many times we've replanned
    final_answer: str            # the final answer
```

State is the "bloodstream" of the entire graph: all nodes read from it and write to it. Design the state well, and you've won half the battle.

### Plan Node

```python
from langchain_core.messages import HumanMessage, SystemMessage

def plan_node(state: PlanSolveState) -> dict:
    messages = [
        SystemMessage(content=PLANNER_SYSTEM),  # planner expert prompt
        HumanMessage(content=f"Task: {state['task']}"),
    ]
    response = llm.invoke(messages)
    plan = parse_plan(response.content)  # parse "1. xxx\n2. xxx" format
    return {
        "plan": plan,
        "current_step_index": 0,
        "completed_steps": [],
    }
```

The Planner system prompt is critical:

```python
PLANNER_SYSTEM = """You are a task planning expert.

Rules:
1. Break the task into 3-7 independent steps
2. Each step must be concrete and actionable
3. Steps must have clear dependencies (later steps can use earlier results)
4. The final step should be "synthesize all information and deliver the answer"

Output format (only the step list, nothing else):
1. step description
2. step description
...
"""
```

### Execute Node (Embedded ReAct Sub-Agent)

```python
from langgraph.prebuilt import create_react_agent

def execute_node(state: PlanSolveState) -> dict:
    idx = state["current_step_index"]
    current_step = state["plan"][idx]

    # Build execution context (includes results from completed steps)
    system_prompt = EXECUTOR_SYSTEM.format(
        completed_steps=format_completed_steps(state["completed_steps"]),
        current_step=current_step,
    )

    # Use a ReAct sub-agent to execute a single step (may need tools)
    sub_agent = create_react_agent(model=llm, tools=[calculator, web_search])
    result = sub_agent.invoke(
        {"messages": [
            SystemMessage(content=system_prompt),
            HumanMessage(content=f"Execute this step: {current_step}"),
        ]},
        config={"recursion_limit": 8},
    )

    step_result = result["messages"][-1].content
    # Only the first 100 characters of the result are kept in the summary
    new_completed = state["completed_steps"] + [
        f"{current_step} → {step_result[:100]}"
    ]
    return {
        "step_result": step_result,
        "completed_steps": new_completed,
        "current_step_index": idx + 1,
    }
```

There's an important design choice here: the Execute node embeds a ReAct sub-agent. Plan-and-Solve and ReAct aren't mutually exclusive. Plan-and-Solve provides global structure, and ReAct handles tool calls within each step.
### Routing Function

```python
from typing import Literal

MAX_REPLAN = 2

def should_continue(state) -> Literal["execute", "replan", "finalize"]:
    idx = state["current_step_index"]
    total = len(state["plan"])

    if idx >= total:
        return "finalize"  # all steps complete

    # Detect step failure by keyword match on the last step's result
    result = state.get("step_result", "")
    failed = any(kw in result for kw in ["Calculation error", "Search failed", "Error"])
    if failed and state["replan_count"] < MAX_REPLAN:
        return "replan"  # failed, still have retry budget

    return "execute"  # keep going
```

### Building the Graph

```python
from langgraph.graph import END, START, StateGraph

graph = StateGraph(PlanSolveState)

graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.add_node("replan", replan_node)
graph.add_node("finalize", finalize_node)

graph.add_edge(START, "plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges(
    "execute",
    should_continue,
    {"execute": "execute", "replan": "replan", "finalize": "finalize"},
)
graph.add_conditional_edges(
    "replan",
    after_replan,
    {"execute": "execute", "finalize": "finalize"},
)
graph.add_edge("finalize", END)

agent = graph.compile()
```

Full code: [agent-02-plan-and-solve/plan_and_solve_agent.py](https://github.com/chendongqi/llm-in-action/tree/main/agent-02-plan-and-solve)

## Real Execution: Watching the Plans Get Made

### Demo 1: Multi-Country Population Data

> Task: Search China, US, and India's populations. Calculate the total and China's share.

**The Planner's output**:

```
1. Search "China population", "US population", "India population" to get the latest figures.
2. Record China, US, and India's population numbers.
3. Add China, US, and India's populations to get the three-country total.
4. Calculate China's population as a percentage of the three-country total.
5. Synthesize all information and deliver the final answer.
```
**Execution trace**:

```
Step 1
  web_search("China population") → 1.40489 billion
  web_search("US population")    → 341 million
  web_search("India population") → 1.451 billion
Step 2
  Record results (no tool call, model consolidates)
  → China 1.40489B, US 341M, India: no data available   ← ⚠️
Step 3
  calculator("14048900000.0 + 3400000000.0") → 17448900000   ← ⚠️ India missing
Step 4
  calculator("14.0489 / 17.4489 * 100") → 80.5145%
Final answer
  Three-country total: 1.74489B, China's share: 80.5145%
```

Wait. What happened?

Step 1 successfully found India's population (1.451 billion). But Step 2 said "no data available for India," and Step 3's calculation only added China and the US.

This is one of Plan-and-Solve's most common traps: information gets lost in transit between steps.

- Step 1's results were stored in `completed_steps`, but the summary was truncated to only 100 characters. Critical numbers may not have survived the truncation.
- Step 2 had no tool calls. It relied entirely on the model "remembering" Step 1's results from context, and the model hallucinated "no data available."

This isn't a bug; it's an inherent cost of the design decision: when the information chain is long, summary-style transmission causes information loss. Solutions are covered under "Fixing the Information Loss Problem" later in this article.

### Demo 2: Dependency Chain Task (iPhone Price in CNY)

> Task: Search the latest iPhone's USD price, search the exchange rate, convert to CNY.

The Planner generated a 7-step plan when 3 steps would suffice (search price, search rate, calculate). This demonstrates the Planner's tendency to over-plan simple tasks, splitting every small action into its own step.

Step 6 produced an interesting tool failure:

```
Step 6
  Need to round 8836.45
  → calculator("round(8836.45)")    → Error: unsupported AST node: Call
  → calculator("round(8836.45, 0)") → Error: unsupported AST node: Call
  → Result: Sorry, need more steps to process this request.
```

Our calculator only supports arithmetic. Function calls are rejected by design, to prevent injection.
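The error text `unsupported AST node: Call` suggests the calculator walks Python's `ast` tree and rejects anything outside a whitelist. Here is a minimal sketch of that idea, assuming an operator whitelist and the function name `safe_calculate`. Both are hypothetical, and the repository's real tool may differ.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; calls, names, and attributes are all rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def _eval(node: ast.AST) -> Number:
    """Recursively evaluate a node, raising on anything not whitelisted."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    raise ValueError(f"unsupported AST node: {type(node).__name__}")

def safe_calculate(expression: str) -> str:
    """Evaluate plain arithmetic; return an error string instead of raising."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval(tree))
    except (ValueError, SyntaxError, ZeroDivisionError, OverflowError) as exc:
        return f"Error: {exc}"

if __name__ == "__main__":
    print(safe_calculate("1299 * 6.8025"))
    print(safe_calculate("round(8836.45)"))
```

Because the evaluator never calls `eval` and never resolves names, an expression such as `round(8836.45)` is refused before anything runs. A production version would probably also bound exponent sizes, since the sketch accepts arbitrarily large powers.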
The model tried `round` twice, failed both times, and gave up with an uncertain response.

But in Step 7 (the final synthesis), the model elegantly worked around it:

```
1299 USD × 6.8025 = 8836.45 CNY
Rounded to approximately 8836 CNY
```

It did the "rounding" in natural language, without a tool. Tool failure is not the end: the model's own capabilities can serve as a fallback.

### Demo 3: Simple Task Planning

> Task: Calculate 2^10 + 3^5.

**The Planner generated a 4-step plan**:

```
1. Calculate 2 to the power of 10
2. Calculate 3 to the power of 5
3. Add the results of steps 1 and 2
4. Synthesize all information and give the final answer
```

Compare that to ReAct's approach: a single `calculator("2**10 + 3**5")` call. Done.

Plan-and-Solve is clearly overkill here, turning a one-step calculation into 4 steps. This is one of the core trade-offs we need to discuss.

## Five Key Findings

After running these demos, here are 5 observations that matter in real engineering:

### Finding 1: Planners tend to over-plan

For simple tasks, LLMs turn every micro-action into its own step. This increases execution rounds and token consumption, which makes things slower. A good Planner prompt should set explicit limits: no more than 3 steps for simple tasks, and split only when there is a genuine dependency.

### Finding 2: Information transmission between steps requires careful design

Each step's result is stored as a natural language summary in `completed_steps`. If the summary is too short, critical numbers get cut off (India's population in Demo 1). Fix: use structured formats (JSON or key-value pairs) to store step results, rather than truncated prose.

### Finding 3: Tool failure ≠ step failure

The model can fall back to its own knowledge when tools fail (Demo 2's rounding). Don't immediately trigger Replan on tool failure; let the Execute node handle it first. Trigger Replan only if the model truly cannot produce a reasonable result.
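Findings 2 and 3 point the same way: carry step results as data rather than prose, and decide on Replan from an explicit status rather than from keywords in the text. The sketch below assumes a hypothetical `StepResult` record and `needs_replan` helper that are not part of the article's code. How the status gets set (for example, by asking the executor to report it) is left open.

```python
from dataclasses import dataclass, field
from typing import Literal

MAX_REPLAN = 2  # same retry budget as the article's routing function

@dataclass
class StepResult:
    """One executed step, stored in full instead of as a truncated summary."""
    description: str
    text: str  # full model output, no truncation
    status: Literal["ok", "degraded", "failed"] = "ok"
    values: dict[str, float] = field(default_factory=dict)  # extracted key numbers

def needs_replan(result: StepResult, replan_count: int) -> bool:
    """Replan only when the step itself failed and retry budget remains.

    A "degraded" step (a tool call failed but the model still produced a
    usable answer) does not trigger a replan.
    """
    return result.status == "failed" and replan_count < MAX_REPLAN

def collect_values(results: list[StepResult]) -> dict[str, float]:
    """Merge extracted numbers so later steps read data, not prose."""
    merged: dict[str, float] = {}
    for result in results:
        merged.update(result.values)
    return merged

if __name__ == "__main__":
    search = StepResult(
        description="Search populations",
        text="China 1.40489 billion, US 341 million, India 1.451 billion",
        values={"china": 1.40489e9, "us": 3.41e8, "india": 1.451e9},
    )
    rounding = StepResult(
        description="Round the price",
        text="Rounded to approximately 8836 CNY",
        status="degraded",
    )
    print(collect_values([search, rounding]))
    print(needs_replan(rounding, replan_count=0))
```

With this shape, the India figure from Demo 1 would travel as a dictionary entry instead of depending on what survives a 100-character cut, and the rounding fallback from Demo 2 would be recorded as degraded rather than routed to Replan.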
### Finding 4: Replan is a double-edged sword

Replan gives the system fault tolerance, but it also introduces uncertainty: the new plan may conflict with the original or skip necessary steps. Production recommendation: cap Replan at 2 attempts. If that's not enough, degrade gracefully and tell the user the task couldn't be completed.

### Finding 5: Plan-and-Solve and ReAct aren't opposites

In our implementation, each Execute step internally uses a ReAct sub-agent. Plan-and-Solve provides "strategic planning," and ReAct provides "tactical execution." This layered design is very common in real Agent engineering, and LangGraph supports it naturally.

## When to Choose ReAct vs. Plan-and-Solve

This is the core engineering judgment:

```
Task analysis
│
├─ Fewer than 3 steps?
│   └─ Use ReAct (lightweight, fast)
│
├─ Strong dependencies between steps?
│   (later steps need precise results from earlier steps)
│   └─ Plan-and-Solve (explicit plan enforces dependency order)
│
├─ Clear task boundary, enumerable steps?
│   └─ Plan-and-Solve (or even Workflow-Driven)
│
├─ Open-ended task, fuzzy boundaries?
│   └─ ReAct (adapts to unknowns)
│
└─ Long-horizon planning (10+ steps)?
    └─ Consider multi-Agent architecture (later article)
```

**Real-world examples**:

| Scenario | Recommended | Reason |
|---|---|---|
| Search a fact and answer | ReAct | Single step, no planning needed |
| Multi-source comparative analysis | Plan-and-Solve | Data collection has dependency order |
| Auto-write code and test | Plan-and-Solve | Clear steps: write → run → fix |
| Open-ended competitive research | ReAct | Search direction evolves dynamically |
| Data processing pipeline | Workflow-Driven | Steps fully fixed |
| Complex fault diagnosis | ReAct + Plan | Hybrid: plan investigation path, then execute dynamically |

## Fixing the Information Loss Problem

The India population loss in Demo 1 has a few engineering solutions:

### Option A: Store step results in structured format

```python
# Instead of natural language summaries:
completed_steps.append(f"Search China population → {step_result[:100]}")

# Use structured data:
step_data = {
    "step_index": idx,
    "description": current_step,
    "result": step_result,     # full result, no truncation
    "extracted_values": {},    # have the model extract key numbers
}
```

### Option B: Dedicated state slot for collected data

```python
from typing import Any, TypedDict

class PlanSolveState(TypedDict):
    # ... other fields ...
    collected_data: dict[str, Any]  # dedicated storage for gathered data
```

Each Execute step not only writes to `completed_steps` but also extracts key data into `collected_data`. Later steps read directly from this dictionary, with no reliance on the model "remembering" prose.

### Option C: Have the Planner specify data flow explicitly

Prompt the Planner to annotate each step with:

- "Input: which data from step X"
- "Output: what data to produce and store where"

This defines the data flow graph at the planning layer, before any execution begins.

The three options increase in complexity and robustness. In production, match the complexity to the task.

## Interview Prep: Explaining the Plan/Execute Separation

**Interview question**: Does your Agent plan before executing? How does that work?
Many candidates describe ReAct: implicit reasoning during execution, with no explicit plan. If you've implemented Plan-and-Solve, this is a strong differentiator:

> "We use different architectures for different task types. For tasks with few steps and fuzzy boundaries, ReAct's implicit reasoning is sufficient. For multi-step tasks with clear dependencies, like multi-source comparative analysis, we use Plan-and-Solve.
>
> Concretely: the Plan phase uses the LLM to do a complete task decomposition and generate a step list, with no tool calls at this stage, just pure thinking. The Solve phase executes each step sequentially, with an embedded ReAct sub-agent handling tool calls within each step.
>
> This gives us two advantages. First, the execution path is determined upfront and dependencies are explicit, so debugging is much easier. Second, the Replan mechanism provides fault tolerance.
>
> A real pitfall we hit in production: information transmission between steps needs to be structured. Natural language summaries lose critical data, so we moved to structured JSON for step results, and later steps no longer rely on the model 'remembering' earlier results."

This answer shows you've moved beyond running examples: you've encountered and thought through production problems.

## Summary

Three things from this article:

1. **Plan-and-Solve = plan first, execute second**: Compared to ReAct's greedy strategy, Plan-and-Solve generates a complete step list before execution, making dependencies visible and execution paths predictable. It is best for structured multi-step tasks.
2. **Information transmission is the biggest pitfall**: Passing data between steps via natural language summaries causes information loss. Production systems should use structured formats to store critical intermediate results rather than rely on the model to "remember" previous results.
3. **Plan-and-Solve and ReAct compose naturally**: Plan-and-Solve provides global structure; ReAct handles tool calls within each step. This layered design is common in complex Agent systems.
**Next up**: Agent Series Article 4, *Deep Dive into Tool Calling: Tools Are the Agent's "Hands," But Hand Design Determines What the Agent Can Do*. We'll go deep on tool design principles, parameter validation, error handling, and how to prevent tools from becoming security vulnerabilities.

## References

- Wang et al., [Plan-and-Solve Prompting](https://arxiv.org/abs/2305.04091), ACL 2023
- [LangGraph Documentation: StateGraph](https://langchain-ai.github.io/langgraph/)
- [hello-agents Open Tutorial](https://github.com/datawhalechina/Hello-Agents), Chapter 6
- Demo code: [agent-02-plan-and-solve](https://github.com/chendongqi/llm-in-action/tree/main/agent-02-plan-and-solve)