{"slug": "demystifying-ai-agents-building-an-agentic-pipeline-from-scratch-in-pure-python", "title": "Demystifying AI Agents: Building an Agentic Pipeline From Scratch in Pure Python", "summary": "This tutorial builds an AI agent pipeline from scratch in pure Python, stripping away the abstractions of popular frameworks like LangChain and CrewAI. It describes the core mechanics of an agent as a continuous execution cycle of thinking, acting, and observing, rather than a single LLM transaction. The tutorial provides a modular code structure with components like a configuration file, HTTP client, memory manager, and orchestration engine to demonstrate the underlying runtime architecture.", "body_md": "Most AI demos look impressive until you ask a simple question: **What is actually happening under the hood?**\n\nFrameworks like LangChain, CrewAI, and Microsoft AutoGen make it incredibly easy to spin up an “AI agent” in a few lines of code. But abstractions come with a cost.\n\nMany developers can build agents using frameworks without fully understanding the runtime architecture behind them. But what happens when something goes wrong and, because of the layers of abstraction, you don’t know exactly how to fix it? Or when you can’t even use those libraries and frameworks in your workplace and need to build an agentic application from scratch?\n\nAt their core, most agent frameworks are built around surprisingly simple primitives:\n\n- Prompt orchestration\n- Stateful memory\n- Tool execution\n- Control loops\n- Structured outputs\n\nThis week I was talking with a friend who wanted to understand how AI agents actually work under the hood. During that conversation, I realized something: **most tutorials make AI agents feel far more mysterious than they really are.**\n\nFrameworks are great for moving fast, but they also hide many of the core mechanics behind layers of abstraction. 
You import a library, initialize an “agent,” attach a tool, and suddenly everything looks autonomous and intelligent. But underneath those abstractions, most agent systems are built on a surprisingly small set of concepts:\n\n- Prompts\n- Memory\n- Tool execution\n- Structured outputs\n- Control loops\n\nSo I decided to write the article I wish I had found when I first started exploring agentic systems. No heavy frameworks. No orchestration libraries. No hidden runtime magic. Just the core ideas, built step-by-step from scratch in pure Python.\n\nIn this article, we will strip away the abstractions and build a production-inspired agentic pipeline entirely from scratch using:\n\n- Pure Python\n- The standard library only\n- Native HTTP requests\n- No SDKs\n- No orchestration frameworks\n\nBy the end, you will understand the core mechanics behind modern AI agents and why most frameworks are essentially layered convenience abstractions over a deterministic execution loop.\n\n## What Is an Agentic Pipeline?\n\nA standard LLM interaction is usually a single-shot transaction:\n\n```\nUser Prompt ──> Model Response\n```\n\nThe model receives context once and generates a static response. An agent, however, behaves differently. Instead of generating a single response, it operates inside a continuous execution cycle:\n\n```\n       ┌───────────────────────────────────────┐\n       │                                       │\n       ▼                                       │\n[ THINK ] ───> (Decision) ───> [ ACT ] ───> [ OBSERVE ]\n                               (Tool Call)   (Tool Result)\n```\n\n### Think\n\nThe model evaluates the user objective, available tools, prior observations, and current memory state. It then decides what to do next.\n\n### Act\n\nThe agent executes an action. 
This could be calling a function, querying a database, searching the web, reading files, or returning a final answer.\n\n### Observe\n\nThe system captures the result of the action and feeds it back into the context window. The cycle repeats until the objective is complete.\n\n## A Helpful Mental Model\n\nThink of an agent like a developer debugging a production issue:\n\n```\nObserve error logs\n        │\n        ▼\nForm a hypothesis\n        │\n        ▼\n  Run a command\n        │\n        ▼\n Inspect output\n        │\n        ▼\n    Repeat\n```\n\nThat iterative feedback loop is exactly how agentic systems operate.\n\n## Project Structure\n\nWe will organize the codebase into small, focused modules.\n\n```\nagentic-pipeline/\n├── config.json       # Runtime configuration\n├── llm_client.py     # Low-level HTTP client\n├── memory.py         # Context/state manager\n├── agent.py          # Agent orchestration engine\n└── main.py           # Runtime execution loop\n```\n\nThis separation mirrors how production systems are commonly structured.\n\n## Step 1 — Configuration Management\n\nAvoid hardcoding runtime variables directly in code. For this demo, we will create a `config.json` file:\n\n```\n{\n  \"llm\": {\n    \"provider\": \"openai\",\n    \"model\": \"gpt-4o\",\n    \"api_key\": \"sk-your-api-key\",\n    \"temperature\": 0.2,\n    \"max_tokens\": 1024\n  }\n}\n```\n\n⚠️ Note: In production systems, credentials should come from environment variables or a secrets manager rather than static configuration files.\n\n## Step 2 — Building the Infrastructure Layer\n\nMost SDKs hide the reality that every LLM interaction is just an HTTP request. 
Underneath the abstraction, the process is straightforward:\n\n```\nSerialize payload ──> Send HTTPS POST request ──> Receive JSON response ──> Parse output\n```\n\nLet’s implement that manually in `llm_client.py`.\n\n``` python\nimport json\nimport urllib.request\nimport urllib.error\nfrom typing import Dict, List, Optional\n\nclass LLMClient:\n    def __init__(self, config: Dict):\n        self.config = config[\"llm\"]\n        self.api_key = self.config[\"api_key\"]\n\n    def chat_completion(\n        self,\n        messages: List[Dict],\n        temperature: Optional[float] = None\n    ) -> str:\n        # Fall back to the configured temperature only when none is passed,\n        # so that an explicit 0.0 is respected.\n        if temperature is None:\n            temperature = self.config.get(\"temperature\", 0.2)\n\n        payload = {\n            \"model\": self.config[\"model\"],\n            \"messages\": messages,\n            \"temperature\": temperature,\n            \"max_tokens\": self.config.get(\"max_tokens\", 1024)\n        }\n\n        data = json.dumps(payload).encode(\"utf-8\")\n\n        req = urllib.request.Request(\n            \"https://api.openai.com/v1/chat/completions\",\n            data=data,\n            method=\"POST\"\n        )\n\n        req.add_header(\"Content-Type\", \"application/json\")\n        req.add_header(\"Authorization\", f\"Bearer {self.api_key}\")\n\n        try:\n            with urllib.request.urlopen(req) as response:\n                result = json.loads(response.read().decode())\n                return result[\"choices\"][0][\"message\"][\"content\"].strip()\n        except urllib.error.HTTPError as e:\n            error_body = e.read().decode()\n            raise RuntimeError(f\"LLM API error: {e.code} - {error_body}\") from e\n```\n\nTo understand what `LLMClient` is doing here, it helps to think of it like an old-school telegraph operator. This layer has no concept of reasoning, planning, or executing tools. It doesn't even manage memory. Its only job is to package up a stack of text, send it down the wire to the model, and hand you back the raw response. 
It moves the messages back and forth reliably without needing to understand a single word written inside them.\n\n## Step 3 — Managing Agent Memory\n\nLLMs are stateless. They do not remember previous interactions unless the entire history is resent with every request. As the execution loop progresses, the context window continuously grows. We therefore need a lightweight memory manager in `memory.py`.\n\n``` python\nfrom typing import List, Dict\n\nclass AgentMemory:\n    def __init__(self, max_messages: int = 20):\n        self.messages: List[Dict] = []\n        self.max_messages = max_messages\n\n    def add(self, role: str, content: str):\n        self.messages.append({\n            \"role\": role,\n            \"content\": content\n        })\n\n        if len(self.messages) > self.max_messages:\n            # Preserve system prompt\n            system_prompt = self.messages[0]\n\n            # Slide conversation window\n            active_history = self.messages[1:]\n            self.messages = (\n                [system_prompt] + \n                active_history[-(self.max_messages - 1):]\n            )\n\n    def get_messages(self) -> List[Dict]:\n        return self.messages.copy()\n\n    def clear(self):\n        self.messages.clear()\n```\n\nIf the LLM client is our telegraph operator, you can picture this memory manager like a detective's notebook. As the agent investigates a task, every tiny detail gets written down: the original user request, internal reasoning, tool choices, and the clues discovered along the way. Because the notebook can't hold infinite pages, the detective eventually has to archive old details while keeping the core investigation context front and center. That sliding window logic is exactly how we keep the context manageable.\n\n## Step 4 — Building the Agent Engine\n\nThis is where the orchestration logic lives. 
The agent must understand available tools, decide when to use them, parse structured outputs, execute functions, and feed observations back into memory. Let's write `agent.py`:\n\n``` python\nfrom llm_client import LLMClient\nfrom memory import AgentMemory\nfrom typing import Dict, Callable\nimport json\n\nclass Agent:\n    def __init__(self, system_prompt: str, config_path: str = \"config.json\"):\n        with open(config_path) as f:\n            self.config = json.load(f)\n        self.llm = LLMClient(self.config)\n        self.memory = AgentMemory()\n        self.system_prompt = system_prompt\n        self.tools: Dict[str, dict] = {}\n\n        self.memory.add(\"system\", system_prompt)\n\n    def register_tool(self, name: str, func: Callable, description: str):\n        self.tools[name] = {\n            \"func\": func,\n            \"description\": description\n        }\n\n    def _get_tool_descriptions(self) -> str:\n        if not self.tools:\n            return \"No tools available.\"\n        return \"\\n\".join([\n            f\"- {name}: {info['description']}\"\n            for name, info in self.tools.items()\n        ])\n\n    def think(self, user_input: str) -> str:\n        self.memory.add(\"user\", user_input)\n        messages = self.memory.get_messages()\n        tool_info = self._get_tool_descriptions()\n\n        if self.tools:\n            enhanced_content = (\n                f\"{user_input}\\n\\n\"\n                f\"AVAILABLE TOOLS:\\n\"\n                f\"{tool_info}\\n\\n\"\n                f\"If you need a tool, respond ONLY with JSON:\\n\"\n                f'{{\"tool\":\"tool_name\",\"args\":{{}}}}\\n\\n'\n                f\"If the task is complete, respond naturally and include 'FINAL ANSWER'.\"\n            )\n            # Swap in a new dict for the last message so the tool instructions\n            # reach the model without being written into stored memory.\n            messages[-1] = {**messages[-1], \"content\": enhanced_content}\n\n        response = self.llm.chat_completion(messages)\n        self.memory.add(\"assistant\", response)\n        return 
response\n\n    def act(self, response: str):\n        if \"{\" in response and \"}\" in response:\n            try:\n                start = response.find(\"{\")\n                end = response.rfind(\"}\") + 1\n                tool_json = json.loads(response[start:end])\n\n                tool_name = tool_json.get(\"tool\")\n                args = tool_json.get(\"args\", {})\n\n                if tool_name in self.tools:\n                    result = self.tools[tool_name][\"func\"](**args)\n                    self.memory.add(\n                        \"system\", \n                        f\"Observation from '{tool_name}': {result}\"\n                    )\n                    return result\n            except Exception as e:\n                error_msg = f\"Tool execution failed: {str(e)}\"\n                self.memory.add(\"system\", error_msg)\n                return error_msg\n        return None\n```\n\nThis structural handoff brings up one of the most misunderstood parts of modern AI agents: **the model does not execute your Python functions directly.**\n\nInstead, you are providing plain text descriptions of your local code inside the prompt. When the model reads these descriptions and decides it needs help, it simply formats its text output into a raw JSON block specifying a tool name and parameters. Your host application then catches that JSON, reads it, runs the native Python code locally, and passes the results back into the text history. The LLM itself remains entirely isolated; your local application serves as the actual execution environment.\n\n## Step 5 — The Runtime Control Loop\n\nWithout a runtime loop, the agent cannot perform multi-step reasoning. The host application must continuously drive execution forward. 
Let's look at `main.py`:\n\n``` python\nfrom agent import Agent\nimport time\n\ndef web_search(query: str) -> str:\n    print(f\"🔍 Searching index for: '{query}'\")\n    time.sleep(1)\n    if \"agentic ai\" in query.lower():\n        return (\n            \"Found: Modern agentic systems are moving away from rigid chains \"\n            \"toward lightweight control loops and modular tools.\"\n        )\n    return (\n        \"Found: Building agents from scratch reveals implementation details \"\n        \"often hidden by frameworks.\"\n    )\n\nif __name__ == \"__main__\":\n    system_prompt = (\n        \"You are an autonomous operations assistant. \"\n        \"Reason step-by-step. \"\n        \"Use tools when necessary. \"\n        \"When the task is fully complete, include the phrase FINAL ANSWER.\"\n    )\n\n    agent = Agent(system_prompt)\n    agent.register_tool(\n        name=\"search\",\n        func=web_search,\n        description=\"Queries an index database. Input schema: {'query': str}\"\n    )\n\n    task = \"Research trends in agentic AI and explain why building from scratch is valuable.\"\n    print(f\"🎯 Objective: {task}\")\n\n    max_steps = 5\n    for step in range(max_steps):\n        print(f\"\\n[Cycle {step + 1}]\")\n        prompt = task if step == 0 else \"Analyze previous observations and continue.\"\n\n        response = agent.think(prompt)\n        print(f\"\\n🤖 Agent:\\n{response}\")\n\n        tool_output = agent.act(response)\n        if tool_output:\n            print(f\"\\n🛠 Observation:\\n{tool_output}\")\n\n        if \"final answer\" in response.lower():\n            print(\"\\n✅ Objective completed.\")\n            break\n```\n\n## Tracing the Runtime Execution\n\nHere is a look at what happens internally during execution over two separate cycles:\n\n### Cycle 1\n\n- **Think:** The model receives the task, tool descriptions, and the initial system memory state. It realizes it lacks direct information about current trends. 
\n- **Act:** The model emits structured JSON:\n\n```\n{\n  \"tool\": \"search\",\n  \"args\": {\n    \"query\": \"latest trends in agentic AI\"\n  }\n}\n```\n\nThe runtime parses this block and executes the local Python `web_search` function.\n\n- **Observe:** The tool output gets appended back into memory. The model now has additional context to continue reasoning.\n\n### Cycle 2\n\nThe model reviews the original objective, prior observations, and tool outputs. It synthesizes a complete response and emits:\n\n```\nFINAL ANSWER\n```\n\nThe control loop detects this completion keyword and exits gracefully.\n\n## What You Actually Built\n\nUnderneath all the abstractions, you implemented a fully working pipeline:\n\n- Stateful memory\n- Tool registration\n- Structured tool calling\n- Runtime orchestration\n- Multi-step execution\n- Context management\n- Deterministic control flow\n\nThat is the foundation of nearly every modern agent framework.\n\n## Production Considerations\n\nThis implementation is intentionally minimal. Real production systems typically add:\n\n| Domain | Operational Mechanics |\n|---|---|\n| Resilience & Tracking | Retry policies, Token accounting, Observability & tracing |\n| Data & Run Management | Parallel tool execution, Sandboxed runtimes, Rate limiting |\n| Architecture Scaling | Distributed orchestration, Long-term memory persistence layers |\n| Security & Safety | Guardrails and validation, Human approval checkpoints |\n\nFrameworks become valuable once these operational concerns grow large enough. But understanding the core loop first changes how you design AI systems.\n\n## Final Thoughts\n\nAI agents can appear magical when hidden behind high-level abstractions. But once you strip away the layers, most systems reduce to a small set of deterministic building blocks: prompts, memory, tools, parsing, and loops.\n\nUnderstanding those primitives gives you far more architectural control than blindly composing frameworks. 
Before introducing another dependency into your stack, it is worth asking:\n\n“Do I actually need a framework here, or do I just need a well-designed control loop?”\n\nIf you can answer that question confidently, you already understand more about agentic systems than most developers using them today.", "url": "https://wpnews.pro/news/demystifying-ai-agents-building-an-agentic-pipeline-from-scratch-in-pure-python", "canonical_source": "https://dev.to/rafaeltedesco/demystifying-ai-agents-building-an-agentic-pipeline-from-scratch-in-pure-python-3iio", "published_at": "2026-05-21 02:58:20+00:00", "updated_at": "2026-05-21 03:34:19.196559+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "developer-tools", "open-source"], "entities": ["LangChain", "CrewAI", "Microsoft AutoGen", "Python"], "alternates": {"html": "https://wpnews.pro/news/demystifying-ai-agents-building-an-agentic-pipeline-from-scratch-in-pure-python", "markdown": "https://wpnews.pro/news/demystifying-ai-agents-building-an-agentic-pipeline-from-scratch-in-pure-python.md", "text": "https://wpnews.pro/news/demystifying-ai-agents-building-an-agentic-pipeline-from-scratch-in-pure-python.txt", "jsonld": "https://wpnews.pro/news/demystifying-ai-agents-building-an-agentic-pipeline-from-scratch-in-pure-python.jsonld"}}