# AIchain Agent: Plan, Act, Reflect

> Source: <https://dev.to/yait/aichain-agent-plan-act-reflect-2n71>
> Published: 2026-06-20 10:02:00+00:00

A **Chain** knows every step before it runs. You define step one, step two, step three — and it executes them in order. That works when the problem is well-understood. But what happens when you *don't* know the steps in advance? When the output of one step determines whether you need two more steps or five? When a search returns nothing useful and the whole approach needs to change mid-run?

That's where **Agent** comes in. It plans, observes what happened, and decides what to do next. The difference between a Chain and an Agent is the difference between a script and thinking.

Consider a task like: "Find the official documentation for Qdrant, identify its main sections, and summarize each one." You don't know ahead of time how many searches you'll need, whether the first URL will be correct, or whether the page content will be structured enough to extract sections from. The number of steps depends on what actually happens at runtime.

If you try to hard-code this as a Chain, you'll either over-engineer it with branching logic for every edge case, or you'll build something brittle that fails the moment reality doesn't match your assumptions. And sooner or later, it won't.

An Agent handles this naturally. It makes a plan, executes the first step, looks at the result, and adjusts. Maybe it needs one search. Maybe it needs three. The Agent figures that out as it goes.

Here's [the simplest possible Agent](https://github.com/yaitio/aichain/blob/main/examples/13_agent.py) — one model, one tool, one task:

``` python
import os
from yait_aichain.models import Model
from yait_aichain.agent import Agent
from yait_aichain.tools import searchPerplexity

agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    tools=[searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY"))],
    max_steps=5,
    mode="waterfall",
)

result = agent.run("What are the main differences between Qdrant and Pinecone?")
print(result.output)
print(f"steps={result.steps_taken}  tokens={result.tokens_used:,}")
```

The `orchestrator` model handles planning and reflection: it decides which tool to call, evaluates the result, and determines the next action. The `tools` list defines what the agent *can* do. And `max_steps` caps how far it can go.

The `result` object gives you three things worth caring about: `.output` (the final answer), `.steps_taken` (how many steps actually ran), and `.tokens_used` (total tokens consumed). Enough to understand what happened and what it cost.

Agents in yait_aichain support two execution modes, and the choice between them shapes how the agent behaves.

In waterfall mode, the agent builds a complete plan before executing anything. All steps laid out upfront, then run in order. The plan structure is fixed, but reflection still happens between steps — the agent can stop early if the task is already done, or retry a failed step. What it can't do is add new steps or rearrange the remaining ones.

This gives you predictability. You can look at the plan and know roughly what the agent will do. It's the right choice when the task has a natural structure — "search, then summarize, then format" — even if you're not sure whether the search will need a retry.
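To make those rules concrete, here is a toy model of waterfall execution in plain Python. It is not the library's implementation, which this post doesn't show. `run_waterfall`, `flaky_execute`, and the convention that `None` marks a failed attempt are all assumptions made for illustration.

```python
def run_waterfall(plan, execute, is_done, max_retries=1):
    """Run a fixed plan in order (illustrative sketch, not yait_aichain code).

    A step may be retried, and the run may stop early, but the plan
    itself is never extended or reordered.
    """
    results = []
    for step in plan:
        outcome = None
        for attempt in range(max_retries + 1):
            outcome = execute(step, attempt)
            if outcome is not None:  # None marks a failed attempt in this sketch
                break
        results.append((step, outcome))
        if is_done(results):  # reflection: stop early if the task is complete
            break
    return results


def flaky_execute(step, attempt):
    # Hypothetical executor: the search fails once, then succeeds on the retry.
    if step == "search" and attempt == 0:
        return None
    return f"{step} ok"


plan = ["search", "summarize", "format"]
results = run_waterfall(plan, flaky_execute, is_done=lambda r: False)
print(results)
```

The retry happens inside the step, so the plan still has exactly three entries when the run ends. That is the predictability waterfall buys you.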

Agile mode is different. After every step, the agent looks at what just happened and can rewrite all remaining steps. Maybe the first search revealed that the question has two parts, so the agent adds a second search it didn't originally plan. Maybe a step returned exactly what was needed, so the agent skips three planned steps and jumps straight to the final answer.
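The replanning idea can be sketched the same way. Again, this is a toy model under stated assumptions rather than the library's code: `run_agile`, `execute`, and `replan` are hypothetical names, and the step outcomes are made-up strings.

```python
def run_agile(initial_plan, execute, replan, max_steps=8):
    """Execute steps one at a time, rewriting the remaining plan after each
    (illustrative sketch, not yait_aichain code)."""
    remaining = list(initial_plan)
    history = []
    while remaining and len(history) < max_steps:
        step = remaining.pop(0)
        history.append((step, execute(step)))
        remaining = replan(history, remaining)  # may add, drop, or reorder steps
    return history


def execute(step):
    # Hypothetical outcomes that trigger the two kinds of replanning below.
    outcomes = {
        "search": "question has two parts",
        "fetch": "page is already markdown",
    }
    return outcomes.get(step, "ok")


def replan(history, remaining):
    _, outcome = history[-1]
    if outcome == "question has two parts":
        return ["search part 2"] + remaining  # add a step that was not planned
    if outcome == "page is already markdown":
        return [s for s in remaining if s != "convert"]  # skip a planned step
    return remaining


history = run_agile(["search", "fetch", "convert", "summarize"], execute, replan)
print([step for step, _ in history])
```

The run ends up taking a path that was not in the initial plan: one step added, one step dropped. The loop still stops at `max_steps` no matter what `replan` returns.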

Here's an adaptive agent with multiple tools:

``` python
import os
from yait_aichain.models import Model
from yait_aichain.agent import Agent
from yait_aichain.tools import searchPerplexity, fetchPage, convertToMD

agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    tools=[
        searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY")),
        fetchPage(),
        convertToMD(),
    ],
    max_steps=8,
    mode="agile",
)

result = agent.run(
    "Find the official Qdrant documentation homepage URL, "
    "then fetch that page and tell me what the main sections are."
)

print(result.output)
print(f"steps={result.steps_taken}  tokens={result.tokens_used:,}")
```

This agent has three tools. `searchPerplexity` finds the URL. `fetchPage` retrieves the raw page content. `convertToMD` strips the HTML down to Markdown so the model can read the structure cleanly. The agent decides on its own which tool to call at each step, and it can change its plan based on what each tool returns.

That flexibility isn't free. The execution path is less predictable, and the agent may burn more tokens exploring approaches that don't pan out.

**Use waterfall when the task has a known shape. Use agile when it doesn't.**

An agent without `max_steps` is an infinite loop waiting to happen.

Without a hard cap, the agent keeps planning and executing until it exhausts the model's context window or burns through your token budget. In development, that's an awkward wait and a surprising bill. In production, it's an outage.

``` python
# Missing max_steps. The agent runs until something breaks.
agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    tools=[searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY"))],
    mode="agile",
)
```

Always set `max_steps`. For a single-tool task like "search and summarize," 5 steps covers one search, a possible retry, and the synthesis pass with room to spare. For multi-tool workflows (search, fetch, convert, analyze), 8 to 10 reflects the realistic step count without giving the agent room to spiral.

You should also check whether the agent hit its ceiling before finishing:

``` python
MAX_STEPS = 5  # the same value you passed as Agent(max_steps=...)

if result.steps_taken >= MAX_STEPS:
    # The agent used every allowed step, so it may have stopped before finishing.
    # Log, retry with a higher limit, or surface an error to the user.
    print(f"Warning: agent hit max_steps={MAX_STEPS}. Output may be incomplete.")
else:
    print(result.output)
```

Don't skip this check. An agent that hit `max_steps` may return a plausible-looking but incomplete answer, and you won't know unless you look.
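One way to act on that check is to retry with a higher cap before giving up. The sketch below assumes a hypothetical `run(task, max_steps)` callable that builds an agent with the given limit and runs it. `RunResult` is a stand-in for the real result object that carries only the two fields used here, and `fake_run` simulates a task that needs seven steps.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Stand-in for the agent's result object (only the fields used here)."""
    output: str
    steps_taken: int


def run_with_escalation(run, task, limits=(5, 10)):
    """Try each step limit in turn; return (result, limit, complete)."""
    if not limits:
        raise ValueError("limits must not be empty")
    for limit in limits:
        result = run(task, limit)
        if result.steps_taken < limit:
            return result, limit, True  # finished under the cap
    return result, limit, False  # still capped at the highest limit


def fake_run(task, max_steps):
    # Hypothetical agent run: the task needs 7 steps to complete.
    needed = 7
    taken = min(needed, max_steps)
    output = "full answer" if taken == needed else "partial answer"
    return RunResult(output, taken)


result, limit, complete = run_with_escalation(fake_run, "demo task")
print(result.output, limit, complete)
```

The check is deliberately conservative: a run that finishes on exactly its last allowed step is also treated as capped, which costs one extra retry but never passes off a truncated answer as complete.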

If you already know the steps, use Chain. It's deterministic, cheaper, and easier to debug. Every time.

Use Agent when the number or nature of steps can't be determined before execution begins — when the task requires reacting to intermediate results, when the path from question to answer isn't a straight line.

My practical heuristic: if you can draw the workflow on a whiteboard before writing code, that's a Chain. If you'd need to draw multiple possible workflows with "it depends" arrows between them, that's an Agent.

For reference, here's what you can configure:

```
Agent(
    orchestrator: Model,        # required — plans and reflects
    executors:    list[Model],  # optional — cheaper models for tool-call steps
    tools:        list,         # optional — tools available to the agent
    mode:         str,          # "waterfall" | "agile"  (default: "waterfall")
    max_steps:    int,          # hard cap on execution depth
)
```

The `executors` parameter lets you assign a cheaper or faster model to run individual tool-call steps while keeping a more capable model as the orchestrator:

``` python
import os
from yait_aichain.models import Model
from yait_aichain.agent import Agent
from yait_aichain.tools import searchPerplexity, fetchPage, convertToMD

agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    executors=[Model("gemini-2.5-flash", api_key=os.getenv("GOOGLE_API_KEY"))],
    tools=[
        searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY")),
        fetchPage(),
        convertToMD(),
    ],
    max_steps=8,
    mode="agile",
)
```

The orchestrator handles the reasoning-heavy work — deciding what to do next, evaluating results, writing the final answer. The executor handles the mechanical steps in between: calling a tool and passing the output back. A capable model where it matters, a faster cheaper one where it doesn't. That split alone can meaningfully reduce costs on longer workflows.
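A rough back-of-the-envelope calculation shows why the split matters. Every number below is a placeholder chosen for illustration, not a real price or a measured token count. It also assumes you can attribute tokens to each model separately, which the `result` fields shown in this post do not confirm.

```python
def workflow_cost(orchestrator_tokens, executor_tokens,
                  orchestrator_price, executor_price):
    """Cost in dollars, given prices in dollars per million tokens."""
    total = (orchestrator_tokens * orchestrator_price
             + executor_tokens * executor_price)
    return total / 1_000_000


# Placeholder values, assumed for illustration only.
ORCHESTRATOR_PRICE = 10.0  # hypothetical $ per 1M tokens
EXECUTOR_PRICE = 1.0       # hypothetical $ per 1M tokens
planning_tokens = 20_000   # planning, reflection, final answer
tool_step_tokens = 60_000  # tool calls and passing output back

# Everything on the orchestrator versus tool steps moved to the executor.
single_model = workflow_cost(planning_tokens + tool_step_tokens, 0,
                             ORCHESTRATOR_PRICE, EXECUTOR_PRICE)
split_models = workflow_cost(planning_tokens, tool_step_tokens,
                             ORCHESTRATOR_PRICE, EXECUTOR_PRICE)

print(f"single model: ${single_model:.2f}")
print(f"split models: ${split_models:.2f}")
```

With these made-up figures the split run costs about a third as much. The real ratio depends on how much of your workflow is mechanical tool handling and on actual provider pricing, which usually differs for input and output tokens.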