{"slug": "building-agentic-workflows-in-python", "title": "Building Agentic Workflows in Python", "summary": "A developer outlines best practices for building agentic workflows in Python, defining an agent as a loop where the model decides which tool to call next until completion. The post provides a manual loop implementation with safety controls like iteration caps and validation, and advises using agents only for genuinely multi-step, open-ended tasks.", "body_md": "\"Agent\" has become the word for any program that calls an LLM more than once, which makes it a word worth being precise about. An agent, in the sense this post uses, is a loop: the model decides which tool to call next, your code executes it, and the result feeds back in — repeating until the model decides it's done. That's a genuinely different (and riskier) shape than a single request/response call.\n\nThis post builds on [Building Reliable LLM Applications in Python](https://pg-blogs.netlify.app/posts/10-building-reliable-llm-apps-in-python/): everything said there about retries, structured output, and evaluation still applies once you add a loop — it just applies to *every iteration*, and now the model is also choosing which side effects to trigger. We'll cover when an agent is actually warranted, the loop itself (manual and SDK-assisted), and the safety controls that make handing a model the wheel defensible.\n\nReach for an agent only when the task is genuinely multi-step and open-ended: the number and order of actions can't be known ahead of time, so a fixed pipeline can't express it. Most tasks that *feel* agentic are actually better served by something simpler and more debuggable. There's a ladder, and you should stop climbing it the moment the task is satisfied:\n\nBefore building step 3, run the task past four checks. If any answer is \"no,\" stay at step 1 or 2:\n\nAn agent is a deliberate escalation, not a default. 
Most production LLM features never need one.\n\nOnce an agent is warranted, the shape is the same regardless of the tools involved: call the model with a list of available tools; if it responds asking to use one (`stop_reason == \"tool_use\"`), execute that tool in your own code and send the result back as a `tool_result`; repeat until the model responds with `end_turn`. There are two ways to run that loop in Python: write it by hand for full control, or let the SDK's tool runner drive it for you.\n\nWriting the loop yourself means every tool call passes through your code before it executes, which is where you validate arguments, log the decision, and gate anything irreversible:\n\n``` python\nimport anthropic\n\nclient = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env; never hardcode\n\nMAX_ITERATIONS = 10\n\n# user_input, tools, and execute_validated_tool are assumed to be defined elsewhere\nmessages = [{\"role\": \"user\", \"content\": user_input}]\niterations = 0\n\nwhile True:\n    iterations += 1\n    if iterations > MAX_ITERATIONS:\n        raise RuntimeError(\"Agent exceeded iteration cap; stopping\")\n\n    response = client.messages.create(\n        model=\"claude-opus-4-8\",\n        max_tokens=16000,\n        thinking={\"type\": \"adaptive\"},\n        tools=tools,\n        messages=messages,\n    )\n\n    # Stop on end_turn, or on any other stop that carries no tool request,\n    # so an empty tool_result message is never sent back\n    if response.stop_reason != \"tool_use\":\n        break\n\n    tool_use_blocks = [b for b in response.content if b.type == \"tool_use\"]\n\n    # Log the assistant turn (including any tool_use requests) before acting on it\n    messages.append({\"role\": \"assistant\", \"content\": response.content})\n\n    tool_results = []\n    for tool in tool_use_blocks:\n        # Validate BEFORE executing: tool.input is model-provided, untrusted data\n        result = execute_validated_tool(tool.name, tool.input)\n        tool_results.append({\n            \"type\": \"tool_result\",\n            \"tool_use_id\": tool.id,\n            \"content\": result,\n        })\n    messages.append({\"role\": \"user\", \"content\": 
tool_results})\n\n# Default to an empty string if the final turn contains no text block\nfinal_text = next((b.text for b in response.content if b.type == \"text\"), \"\")\n```\n\nTwo things earn their keep here that a convenience runner would hide: the `MAX_ITERATIONS` cap, and the log point right before the tool result round-trip. Both are cheap to add and expensive to retrofit after an agent has looped in production for an hour.\n\nWhen you don't need to intercept every call (a low-stakes, read-only agent, or a prototype), the beta tool runner drives the same loop for you. Decorate a plain function with `@beta_tool`; its docstring becomes the tool description the model sees:\n\n``` python\nfrom anthropic import beta_tool\n\n# Reuses the `client` created in the manual-loop example above\n\n@beta_tool\ndef get_weather(location: str) -> str:\n    \"\"\"Get current weather for a location.\n\n    Args:\n        location: City and state, e.g. San Francisco, CA.\n    \"\"\"\n    return f\"Sunny, 72°F in {location}\"\n\nrunner = client.beta.messages.tool_runner(\n    model=\"claude-opus-4-8\",\n    max_tokens=16000,\n    tools=[get_weather],\n    messages=[{\"role\": \"user\", \"content\": \"Weather in Paris?\"}],\n)\nfor message in runner:\n    ...  # each iteration is a BetaMessage; loop ends when Claude is done\n```\n\nThe trade-off is explicit: the runner is fewer lines, but your validation and approval logic has to live *inside* the tool function rather than at a single choke point between the model and execution. For anything past a read-only demo, the manual loop's explicit checkpoint is worth the extra code.\n\nThe loop's *shape* (how many iterations are allowed, what counts as done, how a failed tool call is retried) belongs in Python, not in a system prompt asking the model to \"keep trying until it works.\" As covered in [Building Reliable LLM Applications in Python](https://pg-blogs.netlify.app/posts/10-building-reliable-llm-apps-in-python/), use the model for judgment (which tool, with what arguments, when to stop) and code for bookkeeping (the loop, the retry policy, the cap, the audit log). 
An agent that reasons its own way through retry logic in natural language is slower, more expensive, and less predictable than an `except` block that already knows what to do with a transient failure.\n\nFree-text hand-offs between agent steps are where errors compound silently: a slightly malformed field from step two becomes a wrong argument in step three's tool call. Where a step's output needs to be *used* by the next step (not just displayed to a person), get it back as a validated, typed object instead of prose to re-parse:\n\n``` python\nfrom pydantic import BaseModel\n\nclass PlanStep(BaseModel):\n    action: str\n    done: bool\n\nresponse = client.messages.parse(\n    model=\"claude-opus-4-8\",\n    max_tokens=16000,\n    messages=[{\"role\": \"user\", \"content\": \"What is the next step, and are we done?\"}],\n    output_format=PlanStep,\n)\n\nstep = response.parsed_output   # a validated PlanStep, not a string to parse\nif step.done:\n    ...  # stop the loop deterministically; no guessing from prose\n```\n\nA validated `PlanStep` either parses or raises; there's no regex trying to guess whether the model meant \"done\" or \"we're basically done.\"\n\nAn agent is a program that decides, at runtime, which of your functions to call and with what arguments, based on text it read. Treat every tool as an attack surface accordingly:\n\n- `tool.input` (or a tool function's arguments) is model-provided data and must be treated as untrusted, exactly like a request body from the network. Whitelist allowed values, bound numeric ranges, and reject anything that doesn't fit the tool's contract.\n- Always set `MAX_ITERATIONS` (or a wall-clock timeout). Without one, a confused model can loop indefinitely, burning tokens and possibly retrying a failing tool call forever.\n- Track `response.usage` per turn and alert on runaway loops the same way you'd alert on a runaway retry storm.\n- Read `ANTHROPIC_API_KEY` from the environment via `anthropic.Anthropic()`, so that no key ever appears in source, config committed to version control, or logs.", "url": "https://wpnews.pro/news/building-agentic-workflows-in-python", "canonical_source": "https://dev.to/gpuneet/building-agentic-workflows-in-python-40b8", "published_at": "2026-07-04 15:29:32+00:00", "updated_at": "2026-07-04 15:48:46.556636+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "developer-tools"], "entities": ["Anthropic", "Claude"], "alternates": {"html": "https://wpnews.pro/news/building-agentic-workflows-in-python", "markdown": "https://wpnews.pro/news/building-agentic-workflows-in-python.md", "text": "https://wpnews.pro/news/building-agentic-workflows-in-python.txt", "jsonld": "https://wpnews.pro/news/building-agentic-workflows-in-python.jsonld"}}