{"slug": "agentic-web-browsing-workflows-with-python-and-playwright", "title": "Agentic Web Browsing Workflows with Python and Playwright", "summary": "A developer has built an agentic web browsing workflow that combines Playwright's headless browser automation with large language models to extract data from dynamic websites without relying on hardcoded CSS selectors. The system operates in a continuous loop, using a sanitized version of the rendered DOM passed to an LLM, which navigates pages, interacts with elements, and returns structured JSON in real time. By replacing brittle, rule-based scraping with a semantic model that uses function-calling capabilities, the approach adapts to site changes and avoids the token limits and hallucinations that occur when feeding raw HTML into LLMs.", "body_md": "Agentic web browsing combines Playwright's headless browser automation with large language models to extract data from dynamic sites without relying on hardcoded CSS selectors. By passing a sanitized version of the rendered DOM to an LLM, the model can navigate pages, interact with elements, and return structured JSON in real time.\n\nModern web applications do not serve static HTML. Content is fetched asynchronously via API calls, rendered on the client side, and obfuscated behind complex CSS modules. Traditional web scraping relies on identifying specific DOM elements using XPath or CSS selectors. When a site deploys a new build, class names change, and standard scrapers break.\n\nLLMs change this paradigm. Instead of defining exactly where data lives, developers can define what data they want. The LLM acts as the routing layer, analyzing the current state of the page and deciding how to extract the target information. This shifts scraping from a brittle, rule-based approach to an adaptable, semantic model.\n\nImplementing this requires a bridge between the LLM's reasoning engine and the actual web page. Playwright provides the execution environment. 
Python orchestrates the logic.\n\nAn agentic scraper operates in a continuous loop. It observes the environment, plans an action, executes that action, and repeats until the objective is complete.\n\nThe observation phase is critical. LLMs have strict context window limits. Feeding raw HTML from a modern single-page application into an LLM can exhaust the token budget and invites hallucinations, so the DOM must be minimized first.\n\nThe planning phase uses the LLM's function-calling capabilities. You define a set of available tools, such as `click_element(id)`, `type_text(id, text)`, and `extract_data(json_schema)`. The model reviews the sanitized DOM and selects the appropriate tool.\n\nThe execution phase runs the selected tool within the Playwright context. If the model chooses to click a button, Python triggers the Playwright click event, waits for the DOM to settle, and restarts the loop.\n\nThe first component is the browser controller. Playwright needs to be configured to handle dynamic content, manage timeouts, and intercept unnecessary network requests to save bandwidth.\n\n``` python title=\"browser_controller.py\" {11-13}\nfrom playwright.async_api import async_playwright\n\nasync def setup_browser():\n    playwright = await async_playwright().start()\n    browser = await playwright.chromium.launch(headless=True)\n\n    context = await browser.new_context(\n        viewport={'width': 1280, 'height': 800},\n        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'\n    )\n\n    # Block images, media, and fonts to speed up rendering\n    await context.route(\"**/*\", lambda route:\n        route.abort() if route.request.resource_type in [\"image\", \"media\", \"font\"]\n        else route.continue_()\n    )\n\n    page = await context.new_page()\n    return playwright, browser, page\n\nasync def fetch_page(page, url):\n    # Wait for network activity to go quiet before reading the rendered HTML\n    await page.goto(url, wait_until=\"networkidle\")\n    return await page.content()\n```\n\nThis controller sets up a clean environment. 
Blocking images and fonts accelerates page load times, which is essential for real-time extraction tasks. The `networkidle` state waits until the network has been quiet for at least 500 ms, which usually means asynchronous JavaScript has finished rendering before we pass the HTML to the next step.\n\n## DOM Sanitization for Context Windows\n\nRaw HTML can contain megabytes of data irrelevant to data extraction. Inline styles, SVG paths, tracking scripts, and deeply nested divs add token overhead.\n\nWe use Python libraries like BeautifulSoup to strip out noise before sending the content to the LLM. Furthermore, we must map actionable elements to unique IDs so the LLM can reference them in its function calls.\n\n``` python title=\"dom_sanitizer.py\" {16-18}\nimport re\nfrom bs4 import BeautifulSoup\n\ndef sanitize_html(raw_html):\n    soup = BeautifulSoup(raw_html, \"html.parser\")\n\n    # Remove non-content tags\n    for tag in soup([\"script\", \"style\", \"noscript\", \"svg\", \"img\", \"video\"]):\n        tag.decompose()\n\n    # Keep only href (plus any interact ID already present), and assign interactive IDs\n    element_counter = 0\n    interactive_tags = ['a', 'button', 'input', 'select']\n\n    for tag in soup.find_all(True):\n        tag.attrs = {k: v for k, v in tag.attrs.items() if k in ['href', 'data-interact-id']}\n\n        if tag.name in interactive_tags and 'data-interact-id' not in tag.attrs:\n            tag_id = f\"el_{element_counter}\"\n            tag['data-interact-id'] = tag_id\n            element_counter += 1\n\n    # Collapse runs of blank lines left behind by the removed tags\n    text_content = str(soup)\n    text_content = re.sub(r'\\n\\s*\\n', '\\n', text_content)\n\n    return text_content\n```\n\nThis sanitization substantially reduces token count. By injecting `data-interact-id` attributes into buttons and links, we give the LLM a precise coordinate system for interacting with the page. These IDs live only in the sanitized copy, so any selector that targets them needs the same attribute on the live page; the sanitizer therefore keeps a `data-interact-id` that is already present instead of renumbering it.\n\nThe LLM needs a strict schema to interact with our Playwright script. 
Using OpenAI's API or open-source equivalents, we define the tools available to the model.\n\n``` python title=\"agent_logic.py\" {10-14}\nimport openai\n\nclient = openai.AsyncOpenAI(api_key=\"YOUR_KEY\")\n\nasync def get_agent_decision(sanitized_html, objective):\n    tools = [\n        {\n            \"type\": \"function\",\n            \"function\": {\n                \"name\": \"extract_data\",\n                \"description\": \"Extract structured data when the objective is met\",\n                \"parameters\": {\n                    \"type\": \"object\",\n                    \"properties\": {\n                        \"items\": {\n                            \"type\": \"array\",\n                            \"items\": {\"type\": \"object\"}\n                        }\n                    },\n                    \"required\": [\"items\"]\n                }\n            }\n        },\n        {\n            \"type\": \"function\",\n            \"function\": {\n                \"name\": \"click_element\",\n                \"description\": \"Click an element to load more data or navigate\",\n                \"parameters\": {\n                    \"type\": \"object\",\n                    \"properties\": {\n                        \"element_id\": {\"type\": \"string\"}\n                    },\n                    \"required\": [\"element_id\"]\n                }\n            }\n        }\n    ]\n\n    response = await client.chat.completions.create(\n        model=\"gpt-4-turbo-preview\",\n        messages=[\n            {\"role\": \"system\", \"content\": \"You are a web automation agent. Analyze the HTML and decide the next action.\"},\n            {\"role\": \"user\", \"content\": f\"Objective: {objective}\\n\\nHTML:\\n{sanitized_html}\"}\n        ],\n        tools=tools,\n        tool_choice=\"auto\"\n    )\n\n    # The returned message carries the model's tool call, if it made one\n    return response.choices[0].message\n```\n\nThe system prompts the model with the objective and the sanitized HTML. The model responds with a tool call: either `click_element` to interact with the page, or `extract_data` carrying the extracted data as JSON arguments.\n\n## Executing the Agentic Loop\n\nWith the components built, we tie them together into the main loop. 
The Python script evaluates the LLM's response, maps the function call back to a Playwright action, and executes it.\n\n``` python title=\"main.py\" {35-37}\nimport asyncio\nimport json\n\nfrom agent_logic import get_agent_decision\nfrom browser_controller import setup_browser\nfrom dom_sanitizer import sanitize_html\n\n# Stamp interactive elements on the live page with el_N IDs in document order,\n# matching the sanitizer's numbering, so the click selector has something to match.\n# Elements inside tags the sanitizer drops (svg, video) are skipped.\nTAG_INTERACTIVE_JS = \"\"\"() => {\n    Array.from(document.querySelectorAll('a, button, input, select'))\n        .filter(el => !el.closest('svg, video'))\n        .forEach((el, i) => el.setAttribute('data-interact-id', 'el_' + i));\n}\"\"\"\n\nasync def run_agent(url, objective):\n    playwright, browser, page = await setup_browser()\n    await page.goto(url, wait_until=\"networkidle\")\n\n    max_steps = 5\n    for step in range(max_steps):\n        await page.evaluate(TAG_INTERACTIVE_JS)\n        raw_html = await page.content()\n        clean_html = sanitize_html(raw_html)\n\n        message = await get_agent_decision(clean_html, objective)\n\n        if not message.tool_calls:\n            print(\"Agent failed to decide.\")\n            break\n\n        tool_call = message.tool_calls[0]\n\n        if tool_call.function.name == \"extract_data\":\n            data = json.loads(tool_call.function.arguments)\n            print(\"Extraction complete:\", json.dumps(data, indent=2))\n            break\n\n        elif tool_call.function.name == \"click_element\":\n            args = json.loads(tool_call.function.arguments)\n            element_id = args[\"element_id\"]\n\n            # Find the element by our injected ID and click it\n            selector = f\"[data-interact-id='{element_id}']\"\n            await page.click(selector)\n            await page.wait_for_load_state(\"networkidle\")\n\n    await browser.close()\n    await playwright.stop()\n\n# asyncio.run(run_agent(\"https://example.com/catalog\", \"Extract product names and prices\"))\n```\n\nThis architecture handles multi-step scenarios. If data is hidden behind a \"Load More\" button or requires expanding a dropdown, the agent can parse the layout, click the specific element, wait for the new HTML to render, and proceed with extraction.\n\nRunning a local Playwright script works for small tasks. Scaling agentic web browsing presents significant infrastructure challenges.\n\nE-commerce sites, travel aggregators, and social platforms deploy aggressive fingerprinting and behavioral analysis to detect automated browsers. 
Running raw Playwright instances from cloud servers often results in quick IP bans and CAPTCHA challenges.\n\nInstead of managing proxy rotations, header spoofing, and browser fingerprints manually, developers can route traffic through managed infrastructure. AlterLab handles the complexity of headless browser execution at scale.\n\nPassing requests through a [smart rendering API](https://alterlab.io/smart-rendering-api) abstracts away the anti-bot bypass logic. The API handles the browser lifecycle, solves required challenges, and returns the clean HTML payload for your LLM pipeline.\n\nHere is how you execute a request using the [Python SDK](https://alterlab.io/web-scraping-api-python).\n\n``` python title=\"alterlab_scraper.py\" {4-6}\nimport alterlab\n\nclient = alterlab.Client(\"YOUR_API_KEY\")\n\nresponse = client.scrape(\n    \"https://example.com/catalog\",\n    render_js=True,\n    wait_for=\"networkidle\"\n)\n\nprint(response.text)\n```\n\nThe equivalent operation using cURL is straightforward. This is useful for testing or integrating into non-Python environments.\n\n``` bash title=\"Terminal\"\ncurl -X POST https://api.alterlab.io/v1/scrape \\\n  -H \"X-API-Key: YOUR_API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"url\": \"https://example.com/catalog\",\n    \"render_js\": true,\n    \"wait_for\": \"networkidle\"\n  }'\n```\n\nBoth examples return the fully rendered HTML payload, ready for processing by your agentic pipeline. For deeper integration patterns, consult the [API docs](https://alterlab.io/docs).\n\nAs your pipelines grow more sophisticated, maintaining state across the agentic loop becomes vital. The standard loop processes single pages. Complex extraction might require logging into a portal, navigating through a multi-step form, and polling for asynchronous job completions.\n\nTo manage this, persist the Playwright browser context between runs. Store cookies and local storage tokens locally. 
When the agent restarts, inject the stored state to bypass login walls.\n\nFurthermore, streaming the LLM responses can reduce latency. Instead of waiting for the entire JSON payload to generate, stream the tokens, parse the function calls on the fly, and begin executing Playwright actions milliseconds after the model makes a decision. This optimization drastically cuts down the total execution time for deeply nested scraping tasks.\n\nAgentic web scraping replaces brittle CSS selectors with semantic, resilient data extraction. By pairing Playwright's browser automation with Python and function-calling LLMs, engineers can build pipelines that adapt to UI changes automatically. While scaling these systems requires managing complex browser fingerprints, offloading infrastructure concerns allows teams to focus entirely on writing robust agent logic and maximizing data quality.", "url": "https://wpnews.pro/news/agentic-web-browsing-workflows-with-python-and-playwright", "canonical_source": "https://dev.to/alterlab/agentic-web-browsing-workflows-with-python-and-playwright-3nd9", "published_at": "2026-05-30 01:02:34+00:00", "updated_at": "2026-05-30 01:11:32.936421+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "ai-tools", "artificial-intelligence", "natural-language-processing"], "entities": ["Playwright", "Python", "LLM"], "alternates": {"html": "https://wpnews.pro/news/agentic-web-browsing-workflows-with-python-and-playwright", "markdown": "https://wpnews.pro/news/agentic-web-browsing-workflows-with-python-and-playwright.md", "text": "https://wpnews.pro/news/agentic-web-browsing-workflows-with-python-and-playwright.txt", "jsonld": "https://wpnews.pro/news/agentic-web-browsing-workflows-with-python-and-playwright.jsonld"}}