Building Agentic Workflows in Python

A developer outlines best practices for building agentic workflows in Python, defining an agent as a loop where the model decides which tool to call next until completion. The post provides a manual loop implementation with safety controls like iteration caps and validation, and advises using agents only for genuinely multi-step, open-ended tasks.

"Agent" has become the word for any program that calls an LLM more than once, which makes it a word worth being precise about. An agent, in the sense this post uses, is a loop: the model decides which tool to call next, your code executes it, and the result feeds back in — repeating until the model decides it's done. That's a genuinely different and riskier shape than a single request/response call. This post builds on Building Reliable LLM Applications in Python https://pg-blogs.netlify.app/posts/10-building-reliable-llm-apps-in-python/ : everything said there about retries, structured output, and evaluation still applies once you add a loop — it just applies to every iteration , and now the model is also choosing which side effects to trigger. We'll cover when an agent is actually warranted, the loop itself manual and SDK-assisted , and the safety controls that make handing a model the wheel defensible. Reach for an agent only when the task is genuinely multi-step and open-ended: the number and order of actions can't be known ahead of time, so a fixed pipeline can't express it. Most tasks that feel agentic are actually better served by something simpler and more debuggable. There's a ladder, and you should stop climbing it the moment the task is satisfied: Before building step 3, run the task past four checks. If any answer is "no," stay at step 1 or 2: An agent is a deliberate escalation, not a default. Most production LLM features never need one. Once an agent is warranted, the shape is the same regardless of the tools involved: call the model with a list of available tools; if it responds asking to use one stop reason == "tool use" , execute that tool in your own code and send the result back as a tool result ; repeat until the model responds with end turn . Two ways to run that loop in Python — write it by hand for full control, or let the SDK's tool runner drive it for you. 
Writing the loop yourself means every tool call passes through your code before it executes, which is where you validate arguments, log the decision, and gate anything irreversible:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env; never hardcode

MAX_ITERATIONS = 10

# user_input, tools, and execute_validated_tool are assumed to be defined elsewhere
messages = [{"role": "user", "content": user_input}]
iterations = 0

while True:
    iterations += 1
    if iterations > MAX_ITERATIONS:
        raise RuntimeError("Agent exceeded iteration cap; stopping")

    response = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=16000,
        thinking={"type": "adaptive"},
        tools=tools,
        messages=messages,
    )

    if response.stop_reason == "end_turn":
        break

    tool_use_blocks = [b for b in response.content if b.type == "tool_use"]

    # Log the assistant turn (including any tool_use requests) before acting on it
    messages.append({"role": "assistant", "content": response.content})

    tool_results = []
    for tool in tool_use_blocks:
        # Validate BEFORE executing: tool.input is model-provided, untrusted data
        result = execute_validated_tool(tool.name, tool.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": tool.id,
            "content": result,
        })

    messages.append({"role": "user", "content": tool_results})

final_text = next(b.text for b in response.content if b.type == "text")
```

Two things earn their keep here that a convenience runner would hide: the `MAX_ITERATIONS` cap, and the log point right before the `tool_result` round-trip. Both are cheap to add and expensive to retrofit after an agent has looped in production for an hour.

When you don't need to intercept every call (a low-stakes, read-only agent, or a prototype), the beta tool runner drives the same loop for you. Decorate a plain function with `@beta_tool`; its docstring becomes the tool description the model sees:

```python
from anthropic import beta_tool

@beta_tool
def get_weather(location: str) -> str:
    """Get current weather for a location.

    Args:
        location: City and state, e.g. San Francisco, CA.
    """
    return f"Sunny, 72°F in {location}"

runner = client.beta.messages.tool_runner(
    model="claude-opus-4-8",
    max_tokens=16000,
    tools=[get_weather],
    messages=[{"role": "user", "content": "Weather in Paris?"}],
)
for message in runner:
    ...  # each iteration is a BetaMessage; loop ends when Claude is done
```

The trade-off is explicit: the runner is fewer lines, but your validation and approval logic has to live inside the tool function rather than at a single choke point between the model and execution. For anything past a read-only demo, the manual loop's explicit checkpoint is worth the extra code.

The loop's shape (how many iterations are allowed, what counts as done, how a failed tool call is retried) belongs in Python, not in a system prompt asking the model to "keep trying until it works." As covered in Building Reliable LLM Applications in Python (https://pg-blogs.netlify.app/posts/10-building-reliable-llm-apps-in-python/), use the model for judgment (which tool, with what arguments, when to stop) and code for bookkeeping (the loop, the retry policy, the cap, the audit log). An agent that reasons its own way through retry logic in natural language is slower, more expensive, and less predictable than an `except` block that already knows what to do with a transient failure.

Free-text hand-offs between agent steps are where errors compound silently: a slightly malformed field from step two becomes a wrong argument in step three's tool call. Where a step's output needs to be used by the next step (not just displayed to a person), get it back as a validated, typed object instead of prose to re-parse:

```python
from pydantic import BaseModel

class PlanStep(BaseModel):
    action: str
    done: bool

response = client.messages.parse(
    model="claude-opus-4-8",
    max_tokens=16000,
    messages=[{"role": "user", "content": "What is the next step, and are we done?"}],
    output_format=PlanStep,
)
step = response.parsed_output  # a validated PlanStep, not a string to parse
if step.done:
    ...  # stop the loop deterministically; no guessing from prose
```

A validated `PlanStep` either parses or raises; there's no regex trying to guess whether the model meant "done" or "we're basically done."

An agent is a program that decides, at runtime, which of your functions to call and with what arguments, based on text it read. Treat every tool as an attack surface accordingly:

- `tool.input` (or a tool function's arguments) is model-provided data and must be treated as untrusted, exactly like a request body from the network. Whitelist allowed values, bound numeric ranges, and reject anything that doesn't fit the tool's contract.
- Always enforce `MAX_ITERATIONS` or a wall-clock timeout. Without one, a confused model can loop indefinitely, burning tokens and possibly retrying a failing tool call forever.
- Log `response.usage` per turn and alert on runaway loops the same way you'd alert on a runaway retry storm.
- Load `ANTHROPIC_API_KEY` from the environment via `anthropic.Anthropic()`, so that no key ever appears in source, config committed to version control, or logs.
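The input-validation rule can be made concrete with a small dispatcher. What follows is a minimal sketch of one possible `execute_validated_tool`, the name the manual loop calls, not the author's implementation. The `get_weather` tool, its `unit` argument, the length bound, and the choice to return errors as strings are all assumptions made for illustration.

```python
from typing import Any, Callable

ALLOWED_UNITS = {"celsius", "fahrenheit"}  # hypothetical whitelist


def get_weather(location: str, unit: str = "celsius") -> str:
    """Hypothetical read-only tool, used only for illustration."""
    return f"Sunny in {location} ({unit})"


def validate_get_weather(raw: dict[str, Any]) -> dict[str, Any]:
    """Check model-provided input against the tool's contract."""
    unknown = set(raw) - {"location", "unit"}
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    location = raw.get("location")
    if not isinstance(location, str) or not 1 <= len(location) <= 100:
        raise ValueError("location must be a string of 1-100 characters")
    unit = raw.get("unit", "celsius")
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unit must be one of {sorted(ALLOWED_UNITS)}")
    return {"location": location, "unit": unit}


# Only tools registered here can ever run: name -> (validator, implementation).
TOOL_REGISTRY: dict[str, tuple[Callable[..., dict], Callable[..., str]]] = {
    "get_weather": (validate_get_weather, get_weather),
}


def execute_validated_tool(name: str, raw_input: dict[str, Any]) -> str:
    """Validate first, execute second; rejections go back to the model as text."""
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool {name!r}"
    validator, func = TOOL_REGISTRY[name]
    try:
        args = validator(raw_input)
    except ValueError as exc:
        return f"Error: invalid input: {exc}"
    return func(**args)
```

Returning the rejection as the tool result, rather than raising, gives the model a chance to correct its arguments on the next turn, and the iteration cap still bounds how many attempts it gets. For an irreversible tool, raising or requiring human approval may be the better choice.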