AIchain Agent: Plan, Act, Reflect

A developer introduced AIchain Agent, a framework that enables AI models to plan, execute, and reflect on tasks dynamically rather than following a fixed chain of steps. The agent uses an orchestrator model to decide which tools to call and can adapt its plan mid-run based on results, supporting both waterfall and agile execution modes. The project is available on GitHub and aims to handle tasks where the number and order of steps are unknown in advance.

A Chain knows every step before it runs. You define step one, step two, step three, and it executes them in order. That works when the problem is well-understood.

But what happens when you don't know the steps in advance? When the output of one step determines whether you need two more steps or five? When a search returns nothing useful and the whole approach needs to change mid-run?

That's where Agent comes in. It plans, observes what happened, and decides what to do next. The difference between a Chain and an Agent is the difference between a script and thinking.

Consider a task like: "Find the official documentation for Qdrant, identify its main sections, and summarize each one." You don't know ahead of time how many searches you'll need, whether the first URL will be correct, or whether the page content will be structured enough to extract sections from. The number of steps depends on what actually happens at runtime.

If you try to hard-code this as a Chain, you'll either over-engineer it with branching logic for every edge case, or you'll build something brittle that fails the moment reality doesn't match your assumptions. And it will. Reality always does.

An Agent handles this naturally. It makes a plan, executes the first step, looks at the result, and adjusts. Maybe it needs one search. Maybe it needs three. The Agent figures that out as it goes.

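To make the contrast concrete, here is a toy sketch in plain Python. It is not how `yait_aichain` is implemented: `run_chain`, `run_agent`, `search`, `summarize`, and `reflect` are all hypothetical stand-ins, and the "search" is scripted to come back empty on the first attempt. The chain breaks on that empty result, while the agent loop retries and then carries on.

```python
from typing import Callable

State = dict
Step = Callable[[State], State]


def run_chain(steps: list[Step], state: State) -> State:
    """A chain: every step is known upfront and runs in order."""
    for step in steps:
        state = step(state)
    return state


def run_agent(
    plan: list[Step],
    reflect: Callable[[State, list[Step]], list[Step]],
    state: State,
    max_steps: int,
) -> tuple[State, int]:
    """An agent: after each step, reflect() may rewrite the remaining plan."""
    plan = list(plan)
    steps_taken = 0
    while plan and steps_taken < max_steps:
        step = plan.pop(0)
        state = step(state)          # act
        steps_taken += 1
        plan = reflect(state, plan)  # reflect and adjust
    return state, steps_taken


# Hypothetical stand-in tools: the first search finds nothing, the second succeeds.
def search(state: State) -> State:
    attempts = state.get("attempts", 0) + 1
    hits = ["https://example.com/docs"] if attempts >= 2 else []
    return {**state, "attempts": attempts, "hits": hits}


def summarize(state: State) -> State:
    return {**state, "answer": f"summary of {state['hits'][0]}"}


def reflect(state: State, plan: list[Step]) -> list[Step]:
    if "answer" in state:
        return []                # done: drop anything still planned
    if not state["hits"]:
        return [search] + plan   # search came back empty: retry before moving on
    return plan                  # on track: keep the plan as is


final, steps_taken = run_agent([search, summarize], reflect, {}, max_steps=5)
print(final["answer"], steps_taken)
```

Running `run_chain([search, summarize], {})` with the same stand-ins raises an `IndexError`, because `summarize` assumes the search found something. That is the brittleness described above, in miniature.
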
Here's the [simplest possible Agent](https://github.com/yaitio/aichain/blob/main/examples/13_agent.py), with one model, one tool, and one task:

```python
import os

from yait_aichain.models import Model
from yait_aichain.agent import Agent
from yait_aichain.tools import searchPerplexity

agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    tools=[searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY"))],
    max_steps=5,
    mode="waterfall",
)

result = agent.run("What are the main differences between Qdrant and Pinecone?")

print(result.output)
print(f"steps={result.steps_taken} tokens={result.tokens_used:,}")
```

The orchestrator model handles planning and reflection: it decides which tool to call, evaluates the result, and determines the next action. The `tools` list defines what the agent can do. And `max_steps` caps how far it can go.

The result object gives you three things worth caring about: `.output` (the final answer), `.steps_taken` (how many steps actually ran), and `.tokens_used` (total tokens consumed). Enough to understand what happened and what it cost.

Agents in `yait_aichain` support two execution modes, and the choice between them shapes how the agent behaves.

In waterfall mode, the agent builds a complete plan before executing anything. All steps are laid out upfront, then run in order. The plan structure is fixed, but reflection still happens between steps: the agent can stop early if the task is already done, or retry a failed step. What it can't do is add new steps or rearrange the remaining ones.

This gives you predictability. You can look at the plan and know roughly what the agent will do. It's the right choice when the task has a natural structure ("search, then summarize, then format"), even if you're not sure whether the search will need a retry.

Agile mode is different. After every step, the agent looks at what just happened and can rewrite all remaining steps.
Maybe the first search revealed that the question has two parts, so the agent adds a second search it didn't originally plan. Maybe a step returned exactly what was needed, so the agent skips three planned steps and jumps straight to the final answer.

Here's an adaptive agent with multiple tools:

```python
import os

from yait_aichain.models import Model
from yait_aichain.agent import Agent
from yait_aichain.tools import searchPerplexity, fetchPage, convertToMD

agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    tools=[
        searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY")),
        fetchPage(),
        convertToMD(),
    ],
    max_steps=8,
    mode="agile",
)

result = agent.run(
    "Find the official Qdrant documentation homepage URL, "
    "then fetch that page and tell me what the main sections are."
)

print(result.output)
print(f"steps={result.steps_taken} tokens={result.tokens_used:,}")
```

This agent has three tools. `searchPerplexity` finds the URL. `fetchPage` retrieves the raw page content. `convertToMD` strips the HTML down to Markdown so the model can read the structure cleanly. The agent decides on its own which tool to call at each step, and it can change its plan based on what each tool returns.

That flexibility isn't free. The execution path is less predictable, and the agent may burn more tokens exploring approaches that don't pan out. Use waterfall when the task has a known shape. Use agile when it doesn't.

An agent without `max_steps` is an infinite loop waiting to happen. Without a hard cap, the agent keeps planning and executing until it exhausts the model's context window or burns through your token budget. In development, that's an awkward wait and a surprising bill. In production, it's an outage.

```python
# Missing max_steps. The agent runs until something breaks.
agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    tools=[searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY"))],
    mode="agile",
)
```

Always set `max_steps`.
For a single-tool task like "search and summarize," 5 steps covers one search, a possible retry, and the synthesis pass with room to spare. For multi-tool workflows (search, fetch, convert, analyze), 8 to 10 reflects the realistic step count without giving the agent room to spiral.

You should also check whether the agent hit its ceiling before finishing:

```python
# max_steps here is the same value that was passed to Agent(...).
if result.steps_taken == max_steps:
    # The agent ran out of steps before completing the task.
    # Log, retry with a higher limit, or surface an error to the user.
    print(f"Warning: agent hit max_steps={max_steps}. Output may be incomplete.")
else:
    print(result.output)
```

Don't skip this check. An agent that hit `max_steps` may return a plausible-looking but incomplete answer, and you won't know unless you look.

If you already know the steps, use Chain. It's deterministic, cheaper, and easier to debug. Every time.

Use Agent when the number or nature of steps can't be determined before execution begins: when the task requires reacting to intermediate results, when the path from question to answer isn't a straight line.

My practical heuristic: if you can draw the workflow on a whiteboard before writing code, that's a Chain. If you'd need to draw multiple possible workflows with "it depends" arrows between them, that's an Agent.

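The "retry with a higher limit" option from the `max_steps` check can be wrapped in a small helper. This is a sketch under assumptions: `run_with_step_budget` and `make_agent` are hypothetical names, and the only things taken from the article are that a result exposes `.steps_taken` and `.output`. A `FakeAgent` stands in for a real agent so the example runs without API keys.

```python
from dataclasses import dataclass
from typing import Any, Callable


def run_with_step_budget(
    make_agent: Callable[[int], Any],
    task: str,
    limits: tuple = (5, 10),
):
    """Run the task, retrying with a higher max_steps if the cap was hit.

    make_agent is a hypothetical factory: it takes max_steps and returns an
    object with .run(task) whose result exposes .steps_taken and .output.
    Returns (result, hit_cap).
    """
    result = None
    for max_steps in limits:
        result = make_agent(max_steps).run(task)
        if result.steps_taken < max_steps:
            return result, False  # finished under the cap
    return result, True           # still at the cap after the last attempt


@dataclass
class FakeResult:
    output: str
    steps_taken: int


class FakeAgent:
    """Stand-in for a real agent: this one needs 7 steps to finish the task."""

    def __init__(self, max_steps: int):
        self.max_steps = max_steps

    def run(self, task: str) -> FakeResult:
        steps = min(7, self.max_steps)
        output = "complete answer" if steps == 7 else "partial answer"
        return FakeResult(output=output, steps_taken=steps)


result, hit_cap = run_with_step_budget(FakeAgent, "summarize the docs", limits=(5, 10))
print(result.output, result.steps_taken, hit_cap)
```

With a real agent, `make_agent` would presumably be a small function that builds an `Agent` with the given `max_steps`. Two caveats: each retry starts from scratch and pays for its tokens again, and a run that finishes exactly on its last allowed step is flagged as having hit the cap, the same ambiguity the equality check has.
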
For reference, here's what you can configure:

```
Agent(
    orchestrator: Model,      # required; plans and reflects
    executors: list[Model],   # optional; cheaper models for tool-call steps
    tools: list,              # optional; tools available to the agent
    mode: str,                # "waterfall" | "agile" (default: "waterfall")
    max_steps: int,           # hard cap on execution depth
)
```

The `executors` parameter lets you assign a cheaper or faster model to run individual tool-call steps while keeping a more capable model as the orchestrator:

```python
import os

from yait_aichain.models import Model
from yait_aichain.agent import Agent
from yait_aichain.tools import searchPerplexity, fetchPage, convertToMD

agent = Agent(
    orchestrator=Model("claude-sonnet-4-6", api_key=os.getenv("ANTHROPIC_API_KEY")),
    executors=[Model("gemini-2.5-flash", api_key=os.getenv("GOOGLE_API_KEY"))],
    tools=[
        searchPerplexity(api_key=os.getenv("PERPLEXITY_API_KEY")),
        fetchPage(),
        convertToMD(),
    ],
    max_steps=8,
    mode="agile",
)
```

The orchestrator handles the reasoning-heavy work: deciding what to do next, evaluating results, writing the final answer. The executor handles the mechanical steps in between: calling a tool and passing the output back. A capable model where it matters, a faster cheaper one where it doesn't. That split alone can meaningfully reduce costs on longer workflows.
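A rough way to see why the split helps is to price the two kinds of work separately. The sketch below is back-of-the-envelope arithmetic only: `workflow_cost` is a hypothetical helper, every token count and price is a made-up placeholder rather than a real rate or measurement, and it ignores the usual difference between input and output token pricing.

```python
def workflow_cost(
    reasoning_tokens: int,
    tool_step_tokens: int,
    orchestrator_price: float,
    executor_price=None,
) -> float:
    """Estimated cost in dollars; prices are per million tokens."""
    if executor_price is None:
        # No executors configured: the orchestrator handles every step.
        executor_price = orchestrator_price
    total = reasoning_tokens * orchestrator_price + tool_step_tokens * executor_price
    return total / 1_000_000


# All figures below are made-up placeholders, not real prices or measurements.
single = workflow_cost(20_000, 60_000, orchestrator_price=3.00)
split = workflow_cost(20_000, 60_000, orchestrator_price=3.00, executor_price=0.30)

print(f"single model: ${single:.3f}  split: ${split:.3f}")
```

The saving scales with the share of tokens spent on tool-call steps, so plug in your own provider's rates and the token counts you actually observe in `result.tokens_used` before drawing conclusions.
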