{"slug": "agent-engineering-smolagents", "title": "Agent engineering: smolagents", "summary": "HuggingFace's smolagents framework introduces CodeAgent, a code-writing agent that generates and executes Python in a sandboxed Jupyter kernel environment, replacing traditional tool-calling approaches. The agent uses a ReAct loop to write, run, and observe code, with remote executors providing VM-level isolation and state persistence across steps.", "body_md": "# Agent engineering: smolagents\n\n[Agent Engineering](/series/agent-engineering/) series\n\nThis is the sixth take on the same news reader. Previous versions used tool-calling agents. This time I used [smolagents](https://huggingface.co/docs/smolagents/) from HuggingFace, specifically its `CodeAgent`. Instead of calling predefined tools, the agent writes Python code and executes it in a sandbox.\n\n## How CodeAgent works\n\nA CodeAgent follows a [ReAct loop](https://huggingface.co/docs/smolagents/conceptual_guides/react): think, act, observe, repeat. At each step, the LLM generates a Python snippet, the framework executes it, and stdout plus the return value feed back as the next observation. To finish, the agent calls `final_answer(value)`.\n\n``` python\nfrom smolagents import CodeAgent, LiteLLMModel\n\nagent = CodeAgent(\n    tools=[],\n    model=LiteLLMModel(\"anthropic/claude-haiku-4-5\"),\n    executor_type=\"blaxel\",\n    executor_kwargs={\"sandbox_name\": \"news-reader\"},\n    additional_authorized_imports=[\"httpx\", \"bs4\", \"pydantic\"],\n    max_steps=15,\n    verbosity_level=2,\n)\n\nresult = agent.run(task_prompt)\n```\n\nsmolagents has no native Anthropic client. `LiteLLMModel` wraps the [LiteLLM](https://docs.litellm.ai/) Python library, which is better known as a proxy server but here works as a local library that sends requests directly to `api.anthropic.com`. 
The `anthropic/` prefix in the model ID tells LiteLLM which provider to use.\n\n## Sandboxes: Jupyter repurposed\n\nsmolagents has [several executors](https://huggingface.co/docs/smolagents/reference/python_executors) that determine where the generated code runs.\n\nThe default `LocalPythonExecutor` is a restricted AST interpreter running in your process. It walks the generated code’s syntax tree and evaluates it node by node, blocking modules like `os` and `subprocess`, and functions like `eval` and `exec`. Only a small whitelist of safe imports is available. You can expand it with `additional_authorized_imports`, but the local executor still can’t make HTTP requests or touch the filesystem unless you explicitly opt in.\n\nThe remote executors take a different approach. They spin up an isolated environment and run a Jupyter kernel inside it. smolagents connects to the kernel over WebSocket and sends code the same way a Jupyter notebook sends cells to its kernel. This is the Jupyter ecosystem repurposed. The kernel manages state between steps, so variables defined in one step are available in the next.\n\nWith `executor_type=\"blaxel\"`, the sandbox is a remote VM that boots in under 25ms from hibernation. The `additional_authorized_imports` parameter tells smolagents to `pip install` those packages into the VM before the first step. 
The isolation is at the VM level rather than the interpreter level.\n\n## What the agent actually generated\n\nHere’s what the agent generated with `verbosity_level=2`.\n\n**Step 1** — imports, Pydantic models, and fetch both pages:\n\n``` python\nimport httpx\nfrom bs4 import BeautifulSoup\nfrom pydantic import BaseModel, Field\n\nclass NewsItem(BaseModel):\n    title: str\n    url: str\n    source: str\n    tags: list[str]\n    summary: str\n    discussion_url: str | None = None\n\nclass ScraperResult(BaseModel):\n    items: list[NewsItem]\n    report: str\n\nhn_response = httpx.get(\"https://news.ycombinator.com\", timeout=10)\nhn_html = hn_response.text\nlobsters_response = httpx.get(\"https://lobste.rs\", timeout=10)\nlobsters_html = lobsters_response.text\n```\n\n**Step 2** — parse Hacker News with keyword filtering:\n\n``` python\nrelevant_keywords = ['python', 'ai', 'ml', 'machine learning', 'tool',\n    'developer', 'architecture', 'software', 'programming', 'database',\n    'framework', 'library', 'algorithm', 'performance', 'deployment',\n    'devops', 'rust', 'golang', 'javascript', 'typescript', 'web',\n    'backend', 'frontend', 'distributed', 'system']\nskip_keywords = ['business', 'funding', 'vc', 'startup funding',\n    'social media', 'twitter', 'elon', 'drama', 'crypto', 'bitcoin',\n    'nft', 'politics', 'hiring', 'jobs']\n\nhn_soup = BeautifulSoup(hn_html, 'html.parser')\nhn_rows = hn_soup.find_all('tr', class_='athing')\n\nfor idx, row in enumerate(hn_rows[:30]):\n    title_cell = row.find('span', class_='titleline')\n    a_tag = title_cell.find('a')\n    title = a_tag.get_text(strip=True)\n    url = a_tag.get('href', '')\n\n    title_lower = title.lower()\n    is_relevant = any(kw in title_lower for kw in relevant_keywords)\n    is_skip = any(kw in title_lower for kw in skip_keywords)\n    if is_relevant and not is_skip:\n        hn_items.append(...)\n```\n\nOutput: “Found 7 relevant items on Hacker News”\n\n**Step 3** — parse 
Lobsters:\n\n``` python\nlobsters_soup = BeautifulSoup(lobsters_html, 'html.parser')\nlobsters_rows = lobsters_soup.find_all('li', class_='story')\n\nfor idx, row in enumerate(lobsters_rows[:40]):\n    title_elem = row.find('a', class_='u-url')\n    title = title_elem.get_text(strip=True)\n    url = title_elem.get('href', '')\n\n    tags_container = row.find('ul', class_='tags')\n    # ...\n    comments_link = row.find('a', class_='comments_label')\n    # ...\n```\n\nOutput: “Found 6 relevant items on Lobsters”\n\nThe Lobsters selectors (`li.story`, `a.u-url`) were correct for titles and URLs, but the tag and discussion URL selectors returned nothing.\n\n**Steps 4 and 5** generated tags from a keyword map and validated with Pydantic:\n\n``` python\nvalidated = ScraperResult(**result)\nfinal_answer(validated.model_dump())\n```\n\nOutput: “Validation successful! 13 items validated”\n\n## Structured output through self-validation\n\nUnlike Pydantic AI’s `output_type=MyModel`, smolagents has no built-in schema enforcement for the final answer. `final_answer()` is a built-in tool that smolagents injects into every agent. The system prompt tells the agent to call it when the job is done, and the framework stops the loop. The agent can pass any value to it.\n\nI included Pydantic models in the prompt and told the agent to validate before returning. The agent could ignore the instruction, but in practice it doesn’t. If validation fails, the exception becomes the next observation and the agent has remaining steps to fix the data.\n\nFor a harder guarantee, smolagents has `final_answer_checks`: functions that run on the host before accepting the result. 
If a check returns False, the agent continues:\n\n``` python\ndef validate_result(answer, *args, **kwargs):\n    # smolagents may pass extra arguments (such as the agent memory); accept and ignore them.\n    try:\n        ScraperResult(**answer)\n        return True\n    except Exception:\n        return False\n\nagent = CodeAgent(..., final_answer_checks=[validate_result])\n```\n\nThis runs on your machine, outside the sandbox. The agent can’t bypass it.\n\n## The tradeoff: code vs. judgment\n\nA tool-calling agent would decide “this article about perceptrons is relevant to AI and Python” because it reads the content. A CodeAgent writes keyword-matching code at generation time.\n\nIn my run, “Trusted Computing Frequently Asked Questions” got tagged as `['ml', 'web', 'rust']` and “How to fix a laptop that reboots randomly” got tagged as `['ml', 'web']` because the keyword filter matched on substrings. The summaries were just the article titles repeated verbatim.\n\nThe Perplexity team recently published [research](https://research.perplexity.ai/articles/rethinking-search-as-code-generation) arguing that code-generating agents outperform tool-calling agents for search tasks. Their claim is that code expresses complex retrieval logic more naturally than a sequence of tool calls. The news reader task is too simple to test this, but the approach is gaining traction beyond HuggingFace.\n\n## What would make this better\n\nI intentionally kept the implementation naive to see what a zero-tool CodeAgent produces out of the box.\n\n**The agent guessed page structure.** The Hacker News selectors (`tr.athing`, `span.titleline`) were correct. The Lobsters selectors for titles (`li.story`, `a.u-url`) worked, but the selectors for tags and discussion URLs didn’t match anything. The agent has no way to verify its guesses against the actual HTML. A tool-calling agent with a `web_fetch` tool would have read the markup and adapted. For a CodeAgent, the fix is to give it a deterministic parsing tool. 
You’d write `parse_hn()` and `parse_lobsters()` tools with tested selectors, and let the agent call them from its code. The [smolagents docs](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) recommend exactly this: “Whenever possible, logic should be based on deterministic functions rather than agentic decisions.”\n\n**Keyword matching replaced LLM judgment.** The agent wrote a keyword filter instead of evaluating each article. A better architecture would split scraping from judgment: one agent (or deterministic code) fetches and parses the raw data, and a second agent reads the titles and summaries to filter and tag them. smolagents has [`managed_agents`](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) for this. You pass one agent as a managed agent to another, and the manager calls it like a function from its generated code.\n\n**Pydantic validation felt bolted on.** I told the agent to validate with Pydantic in the prompt, and it did, but defining models in generated code to validate generated data is circular. If the agent controls both the schema and the data, validation catches typos but not structural problems. A more natural approach for a CodeAgent would be to write results to a structured store (SQLite, for example) where the schema is enforced externally. The sandbox can run `sqlite3` or any Python library. The agent writes INSERT statements, and the database rejects malformed data.\n\n**Everything runs sequentially.** The agent uses one kernel and parses one site after another. A more natural architecture would run two CodeAgents in parallel, one per site, each with its own sandbox. A third agent would collect their results, filter and summarize them. 
smolagents supports this with `managed_agents`, where one agent calls others as functions from its generated code.\n\n## Comparing the approaches\n\n| | Tool-calling agents | CodeAgent |\n|---|---|---|\n| What the LLM produces | Tool name + arguments | Python code |\n| Execution | Framework calls the function | Jupyter kernel runs the code |\n| Available capabilities | Only registered tools | Anything Python can do |\n| Safety model | Tool allowlist + argument validation | Sandbox isolation (VM/container) |\n| Structured output | Schema validation with retry | Self-validation in generated code |\n| Best for | Content requiring LLM judgment | Procedural tasks with clear logic |\n\nThe full project is on [GitHub](https://github.com/imankulov/news-reader).", "url": "https://wpnews.pro/news/agent-engineering-smolagents", "canonical_source": "https://roman.pt/posts/smolagents-version/", "published_at": "2026-06-08 09:00:00+00:00", "updated_at": "2026-06-15 20:41:34.064729+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-tools"], "entities": ["HuggingFace", "smolagents", "CodeAgent", "LiteLLM", "Jupyter", "Blaxel"], "alternates": {"html": "https://wpnews.pro/news/agent-engineering-smolagents", "markdown": "https://wpnews.pro/news/agent-engineering-smolagents.md", "text": "https://wpnews.pro/news/agent-engineering-smolagents.txt", "jsonld": "https://wpnews.pro/news/agent-engineering-smolagents.jsonld"}}