{"slug": "how-to-use-pyppeteer-stealth-for-web-scraping", "title": "How to use pyppeteer_stealth for web scraping", "summary": "Pyppeteer_stealth patches automation signals in Pyppeteer to evade anti-bot detection systems. The plugin hides the WebDriver flag, replaces the HeadlessChrome user agent string, and populates empty navigator properties that expose headless browsers. Developers can apply the stealth patches by importing the library and calling it on a page instance before navigating to any target website.", "body_md": "Pyppeteer on its own is straightforward to detect. It leaks the WebDriver flag, uses the HeadlessChrome user agent string, returns an empty `navigator.vendor`, and fails other fingerprinting checks that anti-bot systems run on every request. Any serious bot detection will catch a bare Pyppeteer session before your scraping logic even runs.\n\n[pyppeteer_stealth](https://pypi.org/project/pyppeteer-stealth/) is a plugin that patches those leaks. It applies a set of evasion techniques directly to the Pyppeteer page instance to make headless Chrome look more like a real browser session.\n\nIn this tutorial you will learn what pyppeteer_stealth patches, how to use it, how to fix the most common setup issue you will run into, and where it falls short against modern anti-bot systems.\n\n## What is pyppeteer_stealth?\n\npyppeteer_stealth is the Python implementation of the Puppeteer Stealth plugin, built as an add-on for Pyppeteer, Python's unofficial port of Puppeteer. Where base Pyppeteer exposes clear automation signals, pyppeteer_stealth patches the most obvious ones.\n\nHere is what it patches:\n\n- **User Agent.** Replaces the `HeadlessChrome` token in the user agent with a standard `Chrome` string so the browser does not announce that it is running headless.\n- **WebDriver.** Sets `navigator.webdriver` to `false`. 
This is the most commonly checked automation flag and one of the first things anti-bot systems look for.\n- **Chrome Runtime.** Modifies the Chrome runtime object (`window.chrome.runtime`) so headless Chrome looks like it is running in standard GUI mode.\n- **Hardware Concurrency.** Overrides the CPU core count (`navigator.hardwareConcurrency`) to match a realistic machine rather than the default value headless environments often return.\n- **Plugins.** Populates `navigator.plugins` with real browser plugin data. An empty plugin list is a strong automation signal.\n- **Vendor.** Overrides `navigator.vendor` with a real vendor string. Headless Chrome returns an empty string here by default.\n- **WebGL.** Spoofs GPU properties to return realistic hardware values rather than the generic software renderer that headless environments use.\n- **Media Codecs.** Replaces bot-like codec values with realistic MIME types that match a real browser installation.\n\n## How to use pyppeteer_stealth\n\n### Install the libraries\n\n```\npip3 install pyppeteer_stealth pyppeteer\n```\n\n### Step 1: Run base Pyppeteer as a baseline\n\nBefore adding the stealth plugin, run a fingerprinting test to see what base Pyppeteer exposes. This gives you a clear before-and-after comparison.\n\n``` python\n# pip3 install pyppeteer\nimport asyncio\nfrom pyppeteer import launch\n\nasync def scraper():\n    browser = await launch(headless=True)\n    page = await browser.newPage()\n\n    await page.goto(\"https://bot.sannysoft.com/\")\n    await page.screenshot({\"path\": \"baseline.png\"})\n\n    await browser.close()\n\nasyncio.run(scraper())\n```\n\nThe screenshot shows multiple red flags on the fingerprinting test: the WebDriver flag is exposed, the user agent contains HeadlessChrome, and the plugin list is empty. Base Pyppeteer fails a significant share of the checks that anti-bot systems run.\n\n### Step 2: Add pyppeteer_stealth\n\nImport the `stealth` function and call it on the page instance after creating it but before navigating anywhere. 
The plugin patches the page context before any page code runs:\n\n``` python\n# pip3 install pyppeteer-stealth pyppeteer\nimport asyncio\nfrom pyppeteer import launch\nfrom pyppeteer_stealth import stealth\n\nasync def scraper():\n    browser = await launch(headless=True)\n    page = await browser.newPage()\n\n    # apply stealth patches to the page before navigating\n    await stealth(page)\n\n    await page.goto(\"https://bot.sannysoft.com/\")\n    await page.screenshot({\"path\": \"stealth.png\"})\n\n    await browser.close()\n\nasyncio.run(scraper())\n```\n\nWith the plugin applied, the fingerprinting test passes. The WebDriver flag is hidden, the user agent looks normal, plugins are populated, and the other patched properties return realistic values.\n\nTwo added lines, the import and the `stealth(page)` call, are enough to make the surface-level fingerprint look clean.\n\n### Step 3: Scrape real data\n\nHere is a complete example that uses pyppeteer_stealth to extract product data from an e-commerce page:\n\n``` python\nimport asyncio\nfrom pyppeteer import launch\nfrom pyppeteer_stealth import stealth\n\nasync def scrape_products(url: str) -> list:\n    browser = await launch(headless=True)\n    try:\n        page = await browser.newPage()\n\n        await stealth(page)\n        await page.goto(url, {\"waitUntil\": \"networkidle0\"})\n\n        products = await page.querySelectorAll(\".product\")\n        results = []\n\n        for product in products:\n            name = await product.querySelectorEval(\n                \".product-name\", \"el => el.innerText\"\n            )\n            price = await product.querySelectorEval(\n                \".price\", \"el => el.innerText\"\n            )\n            results.append({\"name\": name, \"price\": price})\n\n        return results\n    finally:\n        # close the browser even if navigation or a selector fails\n        await browser.close()\n\ndata = asyncio.run(\n    scrape_products(\"https://www.scrapingcourse.com/ecommerce/\")\n)\nprint(data)\n# Output:\n# [\n#     {\"name\": \"Abominable Hoodie\", \"price\": \"$69.00\"},\n#     {\"name\": \"Adrienne Trek Jacket\", \"price\": \"$57.00\"},\n#     ...\n#     
{\"name\": \"Artemis Running Short\", \"price\": \"$45.00\"},\n# ]\n```\n\nThat works on an open page. Before pointing it at a protected one, there is a setup issue you are likely to hit first.\n\n## The common setup issue: Chromium not found\n\nWhen you run Pyppeteer for the first time, you may hit this error:\n\n```\nOSError: Chromium downloadable not found at\nhttps://storage.googleapis.com/chromium-browser-snapshots/Win_x64/1181205/chrome-win.zip\n```\n\nThis happens because Pyppeteer targets a specific Chromium revision that is no longer available at that URL. The fix is to override the revision number before Pyppeteer tries to download it.\n\nFind the `chromium_downloader.py` file in your Pyppeteer installation. In a virtual environment on Windows it is at `venv/Lib/site-packages/pyppeteer/chromium_downloader.py`; on Linux and macOS it is under `venv/lib/pythonX.Y/site-packages/pyppeteer/`. Add the `os.environ` line just above the existing `REVISION` assignment:\n\n``` python\nimport os  # already imported at the top of chromium_downloader.py\n\n# added line: pin a Chromium revision that is still available\nos.environ[\"PYPPETEER_CHROMIUM_REVISION\"] = \"1181217\"\n\n# existing line: now picks up the pinned revision\nREVISION = os.environ.get(\"PYPPETEER_CHROMIUM_REVISION\", __chromium_revision__)\n```\n\nYou can also set the environment variable before running your script without touching the library files:\n\n```\n# Linux / macOS\nexport PYPPETEER_CHROMIUM_REVISION=1181217\n\n# Windows (Command Prompt)\nset PYPPETEER_CHROMIUM_REVISION=1181217\n\n# Windows (PowerShell)\n$env:PYPPETEER_CHROMIUM_REVISION = \"1181217\"\n```\n\n## The limitations of pyppeteer_stealth\n\npyppeteer_stealth passes surface-level fingerprinting tests like the one above. That is a meaningful improvement over base Pyppeteer. The limits become clear when you point it at a page with real anti-bot protection.\n\n### 1. 
It fails against modern anti-bot systems\n\n``` python\nimport asyncio\nfrom pyppeteer import launch\nfrom pyppeteer_stealth import stealth\n\nasync def scraper():\n    browser = await launch(headless=True)\n    page = await browser.newPage()\n\n    await stealth(page)\n    await page.goto(\"https://www.scrapingcourse.com/antibot-challenge\")\n    await page.screenshot({\"path\": \"blocked.png\"})\n\n    await browser.close()\n\nasyncio.run(scraper())\n```\n\nThe screenshot shows the block page. pyppeteer_stealth patches surface fingerprints, but it does not address the deeper JavaScript challenges, timing analysis, and behavioral checks that modern systems like Cloudflare run in the background.\n\n### 2. It has not been updated since 2021\n\nThis is the most significant limitation. Anti-bot systems update frequently, so a stealth library that has not changed in years is tuned against detection techniques that are years out of date. Its evasions are also public and well documented in the open-source code, which has given anti-bot vendors years to study them and add detection aimed at its patches specifically.\n\n### 3. Navigation patterns are still predictable\n\nEven with fingerprints patched, behavioral signals such as the way Pyppeteer navigates pages, the timing between requests, and the absence of natural reading pauses can still look automated to systems that analyze traffic patterns rather than just browser properties.\n\n### 4. No proxy infrastructure\n\npyppeteer_stealth has no built-in proxy rotation or geo-targeting. Handling IP bans, rate limits, and geo-restrictions is left entirely to you.\n\n## Going beyond pyppeteer_stealth\n\nWhen the plugin is not enough, you have two paths. You can add proxy rotation, switch to a more recently maintained stealth library, and keep tuning the setup manually. 
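\n\nFor the first path, here is a minimal sketch of proxy rotation layered on top of pyppeteer_stealth. It assumes HTTP proxies written as `http://user:pass@host:port` URLs. The `PROXIES` list, its example.com hosts, and the `proxy_settings` and `scrape_with_proxy` helpers are hypothetical names for illustration and are not part of pyppeteer or pyppeteer_stealth. The proxy reaches Chromium through its standard `--proxy-server` launch flag, and the credentials go to Pyppeteer's `page.authenticate`:\n\n``` python\nimport itertools\nfrom urllib.parse import urlsplit\n\nfrom pyppeteer import launch\nfrom pyppeteer_stealth import stealth\n\n# Hypothetical proxy endpoints: replace with URLs from your provider.\nPROXIES = [\n    \"http://user:pass@proxy1.example.com:8000\",\n    \"http://user:pass@proxy2.example.com:8000\",\n]\nproxy_pool = itertools.cycle(PROXIES)  # simple round-robin rotation\n\n\ndef proxy_settings(proxy_url):\n    \"\"\"Split a proxy URL into a --proxy-server flag and optional credentials.\"\"\"\n    parts = urlsplit(proxy_url)\n    # Chromium ignores credentials inside --proxy-server, so keep only scheme, host and port\n    flag = f\"--proxy-server={parts.scheme}://{parts.hostname}:{parts.port}\"\n    creds = None\n    if parts.username:\n        creds = {\"username\": parts.username, \"password\": parts.password or \"\"}\n    return flag, creds\n\n\nasync def scrape_with_proxy(url):\n    flag, creds = proxy_settings(next(proxy_pool))\n    # the proxy is fixed at launch, so each call starts a browser on the next proxy\n    browser = await launch(headless=True, args=[flag])\n    try:\n        page = await browser.newPage()\n        if creds:\n            # answer the proxy's authentication challenge with the credentials\n            await page.authenticate(creds)\n        await stealth(page)\n        await page.goto(url)\n        return await page.content()\n    finally:\n        await browser.close()\n```\n\nEach `asyncio.run(scrape_with_proxy(url))` call uses the next proxy in the list. Because Chromium reads `--proxy-server` only at launch, the sketch starts a fresh browser per call; that is slower than reusing one browser, but it keeps each request's IP and browser state separate. It only covers the IP side of the problem, so the fingerprint and behavioral limits above still apply.\n\n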
Or you can move the anti-bot handling out of your code entirely.\n\nSpidra handles the full stack at the [API level](https://spidra.io/products/spidra-api). Every request runs through a real browser with residential proxy rotation across 50 countries, CAPTCHA solving, and fingerprinting maintained against current detection techniques. It also replaces the HTML parsing step entirely: instead of returning raw HTML you still need to parse, it extracts exactly what you describe and returns clean structured JSON.\n\nHere is the same anti-bot challenge page that blocked pyppeteer_stealth, using [Spidra's Python SDK](https://docs.spidra.io/sdks/python):\n\n```\npip install spidra\n```\n\n``` python\nfrom spidra import SpidraClient, ScrapeParams, ScrapeUrl\nimport os\n\nspidra = SpidraClient(api_key=os.environ[\"SPIDRA_API_KEY\"])\n\njob = spidra.scrape.run_sync(ScrapeParams(\n    urls=[ScrapeUrl(url=\"https://www.scrapingcourse.com/antibot-challenge/\")],\n    prompt=\"Extract the main heading\",\n    use_proxy=True,\n    proxy_country=\"us\",\n))\n\nprint(job.result.content)\n# { \"heading\": \"You bypassed the Antibot challenge! :D\" }\n```\n\nNo browser to launch. No plugin to apply. No revision number to fix. 
The same request works on open pages and protected ones without any changes.\n\nHere is the same e-commerce scraping task without any selectors or parsing:\n\n``` python\njob = spidra.scrape.run_sync(ScrapeParams(\n    urls=[ScrapeUrl(url=\"https://www.scrapingcourse.com/ecommerce/\")],\n    prompt=\"Extract all product names and prices\",\n    output=\"json\",\n))\n\nprint(job.result.content)\n# [\n#     {\"name\": \"Abominable Hoodie\", \"price\": \"$69.00\"},\n#     {\"name\": \"Adrienne Trek Jacket\", \"price\": \"$57.00\"},\n#     {\"name\": \"Artemis Running Short\", \"price\": \"$45.00\"}\n# ]\n```\n\nIf you want a guaranteed output shape for downstream pipelines, add a schema:\n\n``` python\njob = spidra.scrape.run_sync(ScrapeParams(\n    urls=[ScrapeUrl(url=\"https://www.scrapingcourse.com/ecommerce/\")],\n    prompt=\"Extract all products\",\n    output=\"json\",\n    schema={\n        \"type\": \"array\",\n        \"items\": {\n            \"type\": \"object\",\n            \"required\": [\"name\", \"price\"],\n            \"properties\": {\n                \"name\":  {\"type\": \"string\"},\n                \"price\": {\"type\": \"string\"},\n                \"image\": {\"type\": [\"string\", \"null\"]},\n            }\n        }\n    }\n))\n```\n\nRequired fields always appear in every record, as `null` if the page does not have that value.\n\n## pyppeteer_stealth vs. 
Spidra\n\n| Feature | pyppeteer_stealth | Spidra |\n|---|---|---|\n| Fingerprint patching | Yes, 8 properties patched | Handled at infrastructure level |\n| Last updated | 2021 | Actively maintained |\n| Cloudflare bypass | Fails on JS challenges | Built in, automatic |\n| DataDome / PerimeterX | Not reliable | Built in, automatic |\n| Proxy rotation | Not included | Built in, 50 countries |\n| Structured output | Raw HTML, you parse it | AI extraction, optional schema |\n| Chromium setup issues | Yes, revision fix required | Not applicable |\n| Maintenance as anti-bots evolve | Manual | Handled by Spidra |\n| Language | Python | Python, Node.js, Go, PHP, Ruby, and 5 more |\n| Best for | Light scraping, basic fingerprint patching | Protected sites, production pipelines |\n\n## Conclusion\n\npyppeteer_stealth does what it says. It patches the most visible automation signals in Pyppeteer, and a fingerprinting test looks much cleaner with it applied. For light scraping on sites without serious bot protection, it is a simple and low-effort improvement over base Pyppeteer.\n\nThe limitation is age. It has not been updated since 2021, and modern anti-bot systems have had years to study and specifically detect its patches. Against Cloudflare, DataDome, and similar systems, it is not reliable, and the behavioral patterns Pyppeteer produces beyond the browser fingerprint are still detectable.\n\nIf you need to scrape sites that are actively trying to stop you, maintaining a patched Pyppeteer setup is ongoing work. Spidra handles the full anti-detection stack automatically so you can focus on the data rather than the browser setup.\n\nGet started free at [spidra.io](https://spidra.io/). 
No credit card required.", "url": "https://wpnews.pro/news/how-to-use-pyppeteer-stealth-for-web-scraping", "canonical_source": "https://spidra.io/blog/pyppeteer-stealth", "published_at": "2026-06-02 00:00:00+00:00", "updated_at": "2026-06-03 09:40:49.380836+00:00", "lang": "en", "topics": ["ai-tools"], "entities": ["pyppeteer_stealth", "Pyppeteer", "Puppeteer Stealth", "Chrome"], "alternates": {"html": "https://wpnews.pro/news/how-to-use-pyppeteer-stealth-for-web-scraping", "markdown": "https://wpnews.pro/news/how-to-use-pyppeteer-stealth-for-web-scraping.md", "text": "https://wpnews.pro/news/how-to-use-pyppeteer-stealth-for-web-scraping.txt", "jsonld": "https://wpnews.pro/news/how-to-use-pyppeteer-stealth-for-web-scraping.jsonld"}}