{"slug": "101-ai-agents-when-llms-start-taking-actions", "title": "101. AI Agents: When LLMs Start Taking Actions", "summary": "A developer has built a complete AI agent framework using Anthropic's Claude API that implements the ReAct (Reason + Act) pattern, enabling LLMs to autonomously pursue multi-step goals rather than simply responding to individual queries. The agent system, written in Python, maintains memory across steps, dynamically decides which tools to call based on observations, and can loop or backtrack until a specified objective is achieved or determined impossible. This implementation demonstrates the frontier of AI engineering where agents, while brittle and unpredictable, represent a shift from reactive systems to proactive systems that can handle complex goals like researching papers, comparing data, and saving results.", "body_md": "Everything you have built so far is reactive.\n\nUser sends a message. System processes it. System sends a response. Done.\n\nAn agent is different. An agent receives a goal, not a message. It decides what steps to take to achieve that goal. It uses tools. It observes the results. It adjusts its plan. It continues until the goal is achieved or it determines the goal cannot be achieved.\n\n\"Summarize this document\" is a task. One call. One response.\n\n\"Research recent papers on transformer efficiency, write a comparison table, and save it as a CSV\" is a goal. An agent needs to search the web multiple times, decide which papers are relevant, extract data from multiple sources, format it consistently, handle failures, and write to disk. Five to twenty tool calls. Dynamic decisions at each step.\n\nThis is the frontier of AI engineering. Agents are brittle. They fail in surprising ways. 
They are also what makes AI systems feel genuinely useful rather than just responsive.\n\n```\nprint(\"Agent vs Non-Agent:\")\nprint()\nprint(\"NON-AGENT (chain/pipeline):\")\nprint(\"  - Fixed sequence of steps\")\nprint(\"  - Steps determined at design time\")\nprint(\"  - No ability to react to intermediate results\")\nprint(\"  - Predictable, debuggable, less capable\")\nprint()\nprint(\"AGENT:\")\nprint(\"  - LLM decides what to do at each step\")\nprint(\"  - Steps determined at runtime based on observations\")\nprint(\"  - Can loop, backtrack, try alternative approaches\")\nprint(\"  - Powerful, unpredictable, capable of novel solutions\")\nprint()\n\nagent_properties = {\n    \"Perception\":   \"Receives inputs: user goal, tool results, memory\",\n    \"Reasoning\":    \"LLM decides what to do next given current state\",\n    \"Action\":       \"Executes tools, writes files, calls APIs, searches\",\n    \"Memory\":       \"Maintains context across multiple steps\",\n    \"Goal\":         \"Works toward a specified objective, not just responding\",\n}\n\nprint(\"The five properties of an agent:\")\nfor prop, description in agent_properties.items():\n    print(f\"  {prop:<15}: {description}\")\n\nprint()\nprint(\"The ReAct pattern (Reason + Act):\")\nprint(\"  Thought: 'I need to find the population of Tokyo'\")\nprint(\"  Action:  search_web('Tokyo population 2024')\")\nprint(\"  Observation: '13.96 million in city proper, 37.4M metro'\")\nprint(\"  Thought: 'I have the answer, now I can respond'\")\nprint(\"  Answer:  'Tokyo's population is approximately 13.96 million...'\")\n```\n\n```python\nimport json\nimport os\nfrom typing import List, Dict, Callable, Any, Optional\nfrom dataclasses import dataclass, field\nimport anthropic\n\n@dataclass\nclass Tool:\n    name:        str\n    description: str\n    fn:          Callable\n    schema:      Dict\n\n    def to_api_format(self) -> Dict:\n        return {\n            \"name\":         self.name,\n            
\"description\":  self.description,\n            \"input_schema\": self.schema\n        }\n\nclass AgentMemory:\n    def __init__(self, max_steps: int = 20):\n        self.steps:     List[Dict] = []\n        self.max_steps  = max_steps\n\n    def add_step(self, role: str, content: Any):\n        self.steps.append({\"role\": role, \"content\": content})\n\n    def get_messages(self) -> List[Dict]:\n        return self.steps.copy()\n\n    def __len__(self):\n        return len(self.steps)\n\nclass Agent:\n    \"\"\"\n    A simple but complete agent using Claude with tool use.\n    Implements the ReAct (Reason + Act) loop.\n    \"\"\"\n\n    def __init__(self, tools: List[Tool], system_prompt: str = \"\",\n                 model: str = \"claude-3-5-haiku-20241022\",\n                 max_steps: int = 15, verbose: bool = True):\n        self.client      = anthropic.Anthropic(api_key=os.environ.get(\"ANTHROPIC_API_KEY\"))\n        self.tools       = {t.name: t for t in tools}\n        self.system      = system_prompt or self._default_system()\n        self.model       = model\n        self.max_steps   = max_steps\n        self.verbose     = verbose\n\n    def _default_system(self) -> str:\n        return \"\"\"You are a helpful AI agent. You have access to tools to help complete tasks.\nUse tools when needed. Think step by step. 
When you have enough information to answer, respond directly.\nIf a task fails, explain what went wrong and what you tried.\"\"\"\n\n    def _execute_tool(self, tool_name: str, tool_input: Dict) -> str:\n        if tool_name not in self.tools:\n            return json.dumps({\"error\": f\"Tool '{tool_name}' not found\"})\n        try:\n            result = self.tools[tool_name].fn(**tool_input)\n            return json.dumps(result) if not isinstance(result, str) else result\n        except Exception as e:\n            return json.dumps({\"error\": str(e)})\n\n    def run(self, goal: str) -> str:\n        memory   = AgentMemory(self.max_steps)\n        api_tools = [t.to_api_format() for t in self.tools.values()]\n\n        memory.add_step(\"user\", goal)\n\n        if self.verbose:\n            print(f\"\\n{'='*60}\")\n            print(f\"Agent Goal: {goal}\")\n            print(f\"{'='*60}\")\n\n        for step in range(self.max_steps):\n            response = self.client.messages.create(\n                model      = self.model,\n                max_tokens = 1024,\n                system     = self.system,\n                tools      = api_tools,\n                messages   = memory.get_messages()\n            )\n\n            if response.stop_reason == \"end_turn\":\n                answer = next(\n                    (b.text for b in response.content if b.type == \"text\"), \"\")\n                if self.verbose:\n                    print(f\"\\n✓ Final Answer: {answer[:200]}\")\n                return answer\n\n            if response.stop_reason == \"tool_use\":\n                memory.add_step(\"assistant\", response.content)\n                tool_results = []\n\n                for block in response.content:\n                    if block.type == \"tool_use\":\n                        if self.verbose:\n                            print(f\"\\n[Step {step+1}] 🔧 {block.name}({json.dumps(block.input)[:80]})\")\n\n                        result = 
self._execute_tool(block.name, block.input)\n\n                        if self.verbose:\n                            print(f\"         ↳ {result[:120]}\")\n\n                        tool_results.append({\n                            \"type\":        \"tool_result\",\n                            \"tool_use_id\": block.id,\n                            \"content\":     result\n                        })\n\n                memory.add_step(\"user\", tool_results)\n            else:\n                break\n\n        return \"Agent reached maximum steps without completing the task.\"\n\nprint(\"Agent class built. Now we need tools.\")\n```\n\n```python\nimport math\nimport datetime\nimport random\n\ndef calculator(expression: str) -> Dict:\n    \"\"\"Evaluate a mathematical expression safely.\"\"\"\n    try:\n        names = {\"sqrt\": math.sqrt, \"pi\": math.pi, \"e\": math.e}\n        # Remove the whitelisted names before the character check,\n        # otherwise \"sqrt(144)\" is rejected for containing letters.\n        stripped = expression\n        for name in names:\n            stripped = stripped.replace(name, \"\")\n        allowed = set(\"0123456789+-*/()., \")\n        if not all(c in allowed for c in stripped):\n            return {\"error\": \"Invalid characters in expression\"}\n        result = eval(expression, {\"__builtins__\": {}}, names)\n        return {\"result\": round(float(result), 6), \"expression\": expression}\n    except Exception as e:\n        return {\"error\": str(e)}\n\ndef web_search(query: str, max_results: int = 3) -> Dict:\n    \"\"\"Simulated web search (replace with real API in production).\"\"\"\n    mock_results = {\n        \"transformer architecture\": [\n            {\"title\": \"Attention Is All You Need\", \"snippet\": \"Introduces the transformer architecture using self-attention mechanisms.\", \"url\": \"arxiv.org/abs/1706.03762\"},\n            {\"title\": \"BERT paper\", \"snippet\": \"Bidirectional encoder representations from transformers for NLP.\", \"url\": \"arxiv.org/abs/1810.04805\"},\n        ],\n        \"python list comprehension\": [\n            {\"title\": \"Python Docs\", \"snippet\": \"List comprehensions provide a concise way to create lists: [expr for item in 
iterable if condition]\", \"url\": \"docs.python.org\"},\n        ],\n        \"climate change\": [\n            {\"title\": \"IPCC Report 2023\", \"snippet\": \"Global surface temperature increased by 1.1°C above pre-industrial levels.\", \"url\": \"ipcc.ch/report/ar6\"},\n            {\"title\": \"NASA Climate\", \"snippet\": \"CO2 levels reached 421 ppm in 2023, highest in 3 million years.\", \"url\": \"climate.nasa.gov\"},\n        ],\n    }\n    query_lower = query.lower()\n    for key, results in mock_results.items():\n        if any(word in query_lower for word in key.split()):\n            return {\"query\": query, \"results\": results[:max_results]}\n    return {\"query\": query, \"results\": [\n        {\"title\": f\"Result for '{query}'\",\n         \"snippet\": f\"Information about {query}. This is a simulated search result.\",\n         \"url\": f\"example.com/search?q={query.replace(' ', '+')}\"}\n    ]}\n\ndef get_current_time(timezone: str = \"UTC\") -> Dict:\n    # Always reports UTC; the timezone argument is echoed back, not applied.\n    now = datetime.datetime.now(datetime.timezone.utc)\n    return {\n        \"datetime\": now.strftime(\"%Y-%m-%d %H:%M:%S\"),\n        \"timezone\": timezone,\n        \"date\":     now.strftime(\"%B %d, %Y\"),\n        \"day\":      now.strftime(\"%A\")\n    }\n\ndef write_file(filename: str, content: str) -> Dict:\n    try:\n        with open(filename, \"w\") as f:\n            f.write(content)\n        return {\"status\": \"success\", \"filename\": filename,\n                \"bytes_written\": len(content)}\n    except Exception as e:\n        return {\"error\": str(e)}\n\ndef read_file(filename: str) -> Dict:\n    try:\n        with open(filename, \"r\") as f:\n            content = f.read()\n        return {\"filename\": filename, \"content\": content,\n                \"lines\": content.count(\"\\n\") + 1}\n    except FileNotFoundError:\n        return {\"error\": f\"File '{filename}' not found\"}\n\ndef python_repl(code: str) -> Dict:\n    \"\"\"Execute Python code and return output.\"\"\"\n    import 
io, contextlib\n    output = io.StringIO()\n    try:\n        with contextlib.redirect_stdout(output):\n            exec(code, {\"__builtins__\": __builtins__})\n        return {\"output\": output.getvalue(), \"error\": None}\n    except Exception as e:\n        return {\"output\": output.getvalue(), \"error\": str(e)}\n\nTOOLS = [\n    Tool(\n        name=\"calculator\",\n        description=\"Evaluate mathematical expressions. Supports +,-,*,/,(,),sqrt,pi,e\",\n        fn=calculator,\n        schema={\n            \"type\": \"object\",\n            \"properties\": {\n                \"expression\": {\"type\": \"string\", \"description\": \"Math expression to evaluate\"}\n            },\n            \"required\": [\"expression\"]\n        }\n    ),\n    Tool(\n        name=\"web_search\",\n        description=\"Search the web for current information on any topic\",\n        fn=web_search,\n        schema={\n            \"type\": \"object\",\n            \"properties\": {\n                \"query\":       {\"type\": \"string\",  \"description\": \"Search query\"},\n                \"max_results\": {\"type\": \"integer\", \"description\": \"Number of results\", \"default\": 3}\n            },\n            \"required\": [\"query\"]\n        }\n    ),\n    Tool(\n        name=\"get_current_time\",\n        description=\"Get the current date and time\",\n        fn=get_current_time,\n        schema={\n            \"type\": \"object\",\n            \"properties\": {\n                \"timezone\": {\"type\": \"string\", \"description\": \"Timezone name\", \"default\": \"UTC\"}\n            }\n        }\n    ),\n    Tool(\n        name=\"write_file\",\n        description=\"Write text content to a file\",\n        fn=write_file,\n        schema={\n            \"type\": \"object\",\n            \"properties\": {\n                \"filename\": {\"type\": \"string\", \"description\": \"File name to write\"},\n                \"content\":  {\"type\": \"string\", 
\"description\": \"Content to write\"}\n            },\n            \"required\": [\"filename\", \"content\"]\n        }\n    ),\n    Tool(\n        name=\"read_file\",\n        description=\"Read content from a file\",\n        fn=read_file,\n        schema={\n            \"type\": \"object\",\n            \"properties\": {\n                \"filename\": {\"type\": \"string\", \"description\": \"File name to read\"}\n            },\n            \"required\": [\"filename\"]\n        }\n    ),\n    Tool(\n        name=\"python_repl\",\n        description=\"Execute Python code and return the output\",\n        fn=python_repl,\n        schema={\n            \"type\": \"object\",\n            \"properties\": {\n                \"code\": {\"type\": \"string\", \"description\": \"Python code to execute\"}\n            },\n            \"required\": [\"code\"]\n        }\n    ),\n]\n\nprint(f\"Tool library ready: {len(TOOLS)} tools\")\nfor tool in TOOLS:\n    print(f\"  • {tool.name}: {tool.description[:50]}\")\nagent = Agent(tools=TOOLS, max_steps=10, verbose=True)\n\ntasks = [\n    \"What is 15% of 847 plus the square root of 144?\",\n    \"Search for information about the transformer architecture, then write a 3-sentence summary to a file called 'transformer_summary.txt'\",\n    \"What day of the week is it today? 
Then calculate how many days until the next New Year's Day.\",\n]\n\nfor task in tasks[:1]:\n    print(f\"\\n{'#'*60}\")\n    result = agent.run(task)\n    print(f\"\\nResult: {result}\")\n```\n\nOutput:\n\n```\n============================================================\nAgent Goal: What is 15% of 847 plus the square root of 144?\n============================================================\n\n[Step 1] 🔧 calculator({\"expression\": \"847 * 0.15 + sqrt(144)\"})\n         ↳ {\"result\": 139.05, \"expression\": \"847 * 0.15 + sqrt(144)\"}\n\n✓ Final Answer: 15% of 847 is 127.05, and the square root of 144 is 12.\nThe sum is 127.05 + 12 = 139.05\n```\n\n```python\nresearch_task = \"\"\"\nSearch for information about BERT and GPT transformer models.\nCompare them by searching for both separately.\nThen write a markdown comparison table to a file called 'llm_comparison.md'\nwith columns: Model, Type, Pretraining Objective, Best Use Case.\n\"\"\"\n\nprint(\"Running multi-step research agent:\")\nresult = agent.run(research_task)\nprint(f\"\\nFinal result: {result}\")\n\ntry:\n    result = read_file(\"llm_comparison.md\")\n    if \"error\" not in result:\n        print(f\"\\nFile created successfully:\")\n        print(result[\"content\"])\nexcept OSError:\n    pass\nprint(\"\\nAgent Failure Modes You Will Encounter:\")\nprint()\n\nfailure_modes = {\n    \"Infinite loops\": {\n        \"description\": \"Agent keeps calling the same tool expecting different results\",\n        \"example\":     \"Search fails → search again → search again → max steps\",\n        \"fix\":         \"Add step counter, detect repeated tool calls, add termination conditions\"\n    },\n    \"Tool hallucination\": {\n        \"description\": \"Agent invents tool parameters that do not match the schema\",\n        \"example\":     \"Calls calculator({'math': '2+2'}) instead of {'expression': '2+2'}\",\n        \"fix\":         \"Validate inputs against schema before execution, strict schema definitions\"\n    },\n    
\"Goal drift\": {\n        \"description\": \"Agent pursues a sub-goal and forgets the original goal\",\n        \"example\":     \"Asked to 'find a restaurant', agent spends all steps on dietary research\",\n        \"fix\":         \"Include original goal in every message, add goal-check in system prompt\"\n    },\n    \"Over-tool-use\": {\n        \"description\": \"Agent calls tools for things it already knows\",\n        \"example\":     \"Uses calculator to compute 2+2, searches web for 'what is Python'\",\n        \"fix\":         \"Better system prompt guidance, cost-awareness in tool descriptions\"\n    },\n    \"Cascading errors\": {\n        \"description\": \"Early tool failure propagates through all subsequent steps\",\n        \"example\":     \"File read fails → all downstream processing fails silently\",\n        \"fix\":         \"Error handling in tool functions, check for error keys in results\"\n    },\n    \"Context window overflow\": {\n        \"description\": \"Many tool calls accumulate and exceed context limit\",\n        \"example\":     \"20+ tool calls with large results → API error\",\n        \"fix\":         \"Summarize tool results, limit result size, truncate old history\"\n    },\n}\n\nfor mode, info in failure_modes.items():\n    print(f\"  {mode}:\")\n    print(f\"    What: {info['description']}\")\n    print(f\"    Example: {info['example']}\")\n    print(f\"    Fix: {info['fix']}\")\n    print()\nPLANNING_SYSTEM = \"\"\"You are a planning agent. For complex tasks:\n1. First create a plan as numbered steps\n2. Execute each step using available tools\n3. Verify each step succeeded before proceeding\n4. 
If a step fails, adjust the plan\n\nAlways show your reasoning before calling tools.\nFormat thoughts as: 'Thought: [your reasoning]'\"\"\"\n\nplanning_agent = Agent(\n    tools         = TOOLS,\n    system_prompt = PLANNING_SYSTEM,\n    max_steps     = 15,\n    verbose       = True\n)\n\nprint(\"Planning agent for complex multi-step task:\")\ncomplex_task = \"\"\"\nCalculate the compound interest on $10,000 at 7% annual rate for 10 years.\nThen generate a Python script that prints a table showing the balance at the end of each year.\nSave the script as 'compound_interest.py'.\n\"\"\"\nresult = planning_agent.run(complex_task)\n```\n\n```python\nimport time\n\ndef evaluate_agent(agent, test_cases):\n    \"\"\"Evaluate agent on a set of test cases.\"\"\"\n    results = []\n    for case in test_cases:\n        start = time.perf_counter()\n        try:\n            answer = agent.run(case[\"goal\"])\n            success = case[\"check\"](answer)\n        except Exception as e:\n            answer  = str(e)\n            success = False\n        elapsed = time.perf_counter() - start\n\n        results.append({\n            \"goal\":    case[\"goal\"][:40],\n            \"success\": success,\n            \"time\":    elapsed,\n            \"steps\":   \"N/A\",\n        })\n\n    print(\"\\nAgent Evaluation:\")\n    print(f\"{'Goal':<42} {'Success':>8} {'Time':>8}\")\n    print(\"=\" * 62)\n    for r in results:\n        print(f\"{r['goal']:<42} {'✓' if r['success'] else '✗':>8} {r['time']:>7.1f}s\")\n\n    accuracy = sum(r[\"success\"] for r in results) / len(results)\n    avg_time = sum(r[\"time\"] for r in results) / len(results)\n    print(f\"\\nAccuracy: {accuracy:.0%}  |  Avg time: {avg_time:.1f}s\")\n\ntest_suite = [\n    {\n        \"goal\":  \"Calculate 17 * 23 + 144\",\n        \"check\": lambda a: \"535\" in a\n    },\n    {\n        \"goal\":  \"Search for Python list comprehension syntax\",\n        \"check\": lambda a: any(w in a.lower() for w in [\"for\", \"if\", 
\"[\", \"list\"])\n    },\n    {\n        \"goal\":  \"Write 'Hello World' to hello.txt then read it back\",\n        \"check\": lambda a: \"hello\" in a.lower() or \"world\" in a.lower()\n    },\n]\n\nevaluate_agent(agent, test_suite)\nprint(\"\\nEssential Agent Reference Links:\")\nprint()\n\nrefs = {\n    \"Papers\": [\n        (\"ReAct: Reason + Act in LLMs\",        \"arxiv.org/abs/2210.03629\"),\n        (\"Toolformer: Teaching LLMs to use tools\", \"arxiv.org/abs/2302.04761\"),\n        (\"AutoGPT: Autonomous agents\",          \"github.com/Significant-Gravitas/AutoGPT\"),\n        (\"AgentBench: Evaluating agents\",       \"arxiv.org/abs/2308.03688\"),\n        (\"Chain-of-Thought Prompting\",          \"arxiv.org/abs/2201.11903\"),\n    ],\n    \"Frameworks\": [\n        (\"LangChain Agents\",        \"python.langchain.com/docs/modules/agents\"),\n        (\"LlamaIndex Agents\",       \"docs.llamaindex.ai/en/stable/use_cases/agents\"),\n        (\"Anthropic Tool Use\",      \"docs.anthropic.com/en/docs/build-with-claude/tool-use\"),\n        (\"OpenAI Assistants API\",   \"platform.openai.com/docs/assistants/overview\"),\n        (\"CrewAI (multi-agent)\",    \"crewai.com\"),\n        (\"AutoGen (Microsoft)\",     \"github.com/microsoft/autogen\"),\n    ],\n    \"Tutorials\": [\n        (\"Build an AI Agent from Scratch\", \"towardsdatascience.com/ai-agents-from-scratch\"),\n        (\"Anthropic Cookbook: Agents\",     \"github.com/anthropics/anthropic-cookbook/tree/main/tool_use\"),\n        (\"DeepLearning.AI Agent Courses\",  \"learn.deeplearning.ai\"),\n        (\"LangGraph (stateful agents)\",    \"langchain-ai.github.io/langgraph\"),\n    ],\n    \"Cheat Sheets\": [\n        (\"Agent design patterns\",           \"lilianweng.github.io/posts/2023-06-23-agent\"),\n        (\"Tool use best practices\",         \"docs.anthropic.com/en/docs/build-with-claude/tool-use\"),\n        (\"Prompt engineering for agents\",   
\"learnprompting.org/docs/advanced/agents\"),\n    ],\n}\n\nfor category, links in refs.items():\n    print(f\"  {category}:\")\n    for name, url in links:\n        print(f\"    • {name:<42} {url}\")\n    print()\n```\n\nCreate `agent_practice.py`.\n\nPart 1: tool library. Implement at least five tools: calculator, web search (mock), time/date, file read/write, and one domain-specific tool of your choice (weather lookup, stock prices, unit converter). Test each tool function directly before plugging it into the agent.\n\nPart 2: single-step tasks. Run the agent on five tasks that require exactly one tool call. Verify it calls the right tool with the right arguments each time.\n\nPart 3: multi-step tasks. Run the agent on three tasks requiring 3-5 tool calls each. Examples: \"Search for X, compute a calculation on the result, save to file.\" Track how many steps each task takes. Does the agent complete them correctly?\n\nPart 4: failure injection. Modify one tool to randomly fail 30% of the time. Run a task that depends on that tool 10 times. Does the agent handle failures gracefully? Does it retry? Adjust the system prompt to make it more resilient.\n\nSingle agents work alone. Multi-agent systems divide complex tasks between specialized agents: a researcher agent, a writer agent, a code reviewer agent, each doing what it does best, coordinated by an orchestrator. 
That is the next post.", "url": "https://wpnews.pro/news/101-ai-agents-when-llms-start-taking-actions", "canonical_source": "https://dev.to/yakhilesh/101-ai-agents-when-llms-start-taking-actions-3pk6", "published_at": "2026-05-29 11:03:32+00:00", "updated_at": "2026-05-29 11:11:38.720700+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "artificial-intelligence", "machine-learning", "ai-research"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/101-ai-agents-when-llms-start-taking-actions", "markdown": "https://wpnews.pro/news/101-ai-agents-when-llms-start-taking-actions.md", "text": "https://wpnews.pro/news/101-ai-agents-when-llms-start-taking-actions.txt", "jsonld": "https://wpnews.pro/news/101-ai-agents-when-llms-start-taking-actions.jsonld"}}