[ARTICLE · art-18305] src=dev.to pub= topic=ai-tools verified=true sentiment=↑ positive

I Tested Every Web Scraping Tool Against Lazada — Here's What Actually Works (May 2026)

A developer tested the open-source Python framework Scrapling against Lazada Singapore, a production site protected by Google reCAPTCHA and custom slider verification, using only a single 4GB VPS with no residential proxies. The framework's three-tier fetcher system—HTTP, browser, and stealth—allowed the developer to bypass anti-bot measures while keeping memory usage under control, though the initial decision to skip the Camoufox browser due to RAM concerns later proved to be a mistake. The built-in MCP server provided 14 tools for AI agent integration, with session management being critical to avoid out-of-memory crashes during concurrent scraping operations.

8 min read · published May 30, 2026

I came across Scrapling through a recommendation on X and decided to put it through its paces — not against a demo page, but against Lazada Singapore, a production site with Google reCAPTCHA and a custom slider verification. The setup: a single 4GB VPS, no residential proxies, no credits, just open-source tools.

Here's the full journey: installation pitfalls, wiring it into an AI agent, choosing the right browser for the job, and the real-world benchmarks that followed.

Scrapling is an adaptive web scraping framework for Python (BSD-3, v0.4.8). It handles everything from single HTTP requests to full-scale concurrent crawls. What sets it apart from the BeautifulSoup/Scrapy world:

Three fetcher tiers: HTTP (Fetcher, curl_cffi), browser (DynamicFetcher, Playwright Chromium), and stealth (StealthyFetcher, Chromium + anti-bot patches). Swap with one line.

Built-in MCP server: an AI agent can call mcp_scrapling_get, mcp_scrapling_fetch, and mcp_scrapling_stealthy_fetch directly.

It's open source, pip-installable, and designed to be the backbone of a scraping stack, not just another tool in the toolbox.

This is where the real story starts. The VPS has 4GB RAM, 2 vCPUs, 77GB disk, and runs an AI agent gateway (615MB baseline). Every browser installation decision matters.

pip install "scrapling[fetchers,ai]" # HTTP + Chromium + MCP server (quoted so zsh does not expand the brackets)
scrapling install                     # Downloads Playwright browsers

This pulls in Playwright Chromium, Firefox, and WebKit (~1.3GB disk), plus curl_cffi for HTTP requests and patchright (a Playwright fork) for browser automation.

Camoufox. Every discussion about Scrapling mentions a GitHub thread where someone's VPS hit 1.4GB of RAM running Camoufox. That was enough to scare me off — on a 4GB machine, 1.4GB for one browser is a non-starter. So we skipped it and let Scrapling's StealthyFetcher fall back to Chromium.

Turns out this was the wrong call. More on that later.

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/', timeout=15)
quotes = page.css('.quote .text::text').getall()

Clean. Fast. No browser needed. The HTTP fetcher uses curl_cffi with TLS fingerprint impersonation: it looks like Chrome to the server but costs nothing in RAM.

Scrapling ships with a built-in MCP (Model Context Protocol) server. Start it with scrapling mcp and your AI coding agent gets 14 native tools:

Tool                                          What it does
get / bulk_get                                HTTP fetch with CSS selector extraction
fetch / bulk_fetch                            Browser fetch with JS rendering
stealthy_fetch / bulk_stealthy_fetch          Anti-bot browser fetch
open_session / close_session / list_sessions  Persistent browser management
screenshot                                    Full-page PNG/JPEG capture

The key advantage: CSS selector support means the agent extracts only relevant elements instead of dumping entire pages into context. Token savings compound fast.

The MCP server's session tools aren't optional — they're the difference between stable and catastrophic:

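from scrapling.fetchers import StealthyFetcher

# Without a session: every call launches a new browser
for url in urls:
    page = StealthyFetcher.fetch(url)  # New browser every time

# With a session (MCP tool calls, shown as pseudocode): one browser, reused
session_id = open_session(type="dynamic")
for url in urls:
    page = fetch(url, session_id=session_id)  # Reuses same browser
close_session(session_id)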

One browser, reused. Without sessions, each one-shot fetch spawns a new Chromium process. After 5+ calls, memory pressure spikes. After 20+, you're in OOM territory.
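The difference can be illustrated without a real browser. The sketch below uses a hypothetical FakeBrowser class (a stand-in, not part of Scrapling) that only counts launches, so the two patterns can be compared side by side. With a real fetcher, each launch would be a Chromium process.

```python
from contextlib import contextmanager


class FakeBrowser:
    """Hypothetical stand-in for a real browser process; it only counts launches."""
    launches = 0   # total browsers started
    open_now = 0   # browsers still alive

    def __init__(self):
        FakeBrowser.launches += 1
        FakeBrowser.open_now += 1

    def fetch(self, url):
        return f"<html>{url}</html>"

    def close(self):
        FakeBrowser.open_now -= 1


@contextmanager
def browser_session():
    """Open one browser, hand it out, and always close it afterwards."""
    browser = FakeBrowser()
    try:
        yield browser
    finally:
        browser.close()


def fetch_one_shot(urls):
    """Anti-pattern: a new browser per URL, never closed."""
    return [FakeBrowser().fetch(u) for u in urls]


def fetch_with_session(urls):
    """One browser reused for every URL."""
    with browser_session() as browser:
        return [browser.fetch(u) for u in urls]


urls = [f"https://example.com/page/{i}" for i in range(20)]

fetch_one_shot(urls)
one_shot_launches, one_shot_open = FakeBrowser.launches, FakeBrowser.open_now

FakeBrowser.launches = FakeBrowser.open_now = 0  # reset counters
fetch_with_session(urls)
session_launches, session_open = FakeBrowser.launches, FakeBrowser.open_now

print(one_shot_launches, one_shot_open)  # 20 20
print(session_launches, session_open)    # 1 0
```

The context manager is the important part: the browser is closed even if a fetch raises, so nothing is left running between batches.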

Scrapling's three fetchers form a natural escalation ladder:

Tier  Fetcher          Engine                       Best for
1     Fetcher          curl_cffi (HTTP)             Static pages, APIs
2     DynamicFetcher   Playwright Chromium          JS-rendered SPAs
3     StealthyFetcher  Chromium + anti-bot patches  Cloudflare, bot detection

Same API across all three. Same CSS selectors. Same response object. You're not choosing between different libraries — you're choosing how much overhead to pay.

But the real question is: do you need a browser at all? Let's benchmark.

Fetcher                    Avg Speed  vs Fastest
Fetcher (HTTP)             0.77s      1× (baseline)
DynamicFetcher (Chromium)  3.66s      4.8×
StealthyFetcher            ~4s        5.2×

The HTTP fetcher is absurdly fast. Browser-based fetchers add roughly 3 seconds of overhead per page. That gap compounds: 10 pages is 7.7s vs roughly 37-40s. 100 pages is 77s vs 6 to 7 minutes.
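The arithmetic is easy to reproduce. This is a small sketch that projects total crawl time from the measured per-page averages, assuming pages are fetched sequentially and that the averages hold at larger page counts (neither is guaranteed). The function and dictionary names are illustrative.

```python
# Average seconds per page, taken from the benchmark table (~4s treated as 4.0).
AVG_SECONDS_PER_PAGE = {
    "Fetcher (HTTP)": 0.77,
    "DynamicFetcher (Chromium)": 3.66,
    "StealthyFetcher": 4.0,
}


def projected_seconds(pages: int) -> dict:
    """Sequential crawl time per fetcher, rounded to 0.1s."""
    return {name: round(pages * avg, 1) for name, avg in AVG_SECONDS_PER_PAGE.items()}


for pages in (10, 100):
    print(pages, projected_seconds(pages))
# 10 {'Fetcher (HTTP)': 7.7, 'DynamicFetcher (Chromium)': 36.6, 'StealthyFetcher': 40.0}
# 100 {'Fetcher (HTTP)': 77.0, 'DynamicFetcher (Chromium)': 366.0, 'StealthyFetcher': 400.0}
```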

Fetcher          RAM Delta
Fetcher (HTTP)   ~0 MB
StealthyFetcher  +120 MB
DynamicFetcher   +180 MB

The rule is simple: start at tier 1 and only escalate when proven necessary. If the page is static, you don't need a browser. If it's JS-rendered, you don't need stealth. If it has anti-bot, you don't need a different IP. Prove each escalation before taking it.

Remember how I skipped Camoufox because of that 1.4GB horror story? After getting the stack running, I decided to test it properly.

pip install camoufox
python -m camoufox fetch  # Downloads the browser binary (~713MB)

Camoufox is actually the lightest browser. Measured on our VPS:

Browser                               RAM (headless)  Stealth Level
Camoufox (Firefox)                    81 MB           C++-level
Scrapling StealthyFetcher (Chromium)  120 MB          JS-patched
Scrapling DynamicFetcher (Chromium)   180 MB          None

The 1.4GB from that GitHub thread was user error: spawning a fresh browser per request without closing old ones. Same thing happens with any browser. Camoufox is a debloated Firefox fork: telemetry stripped, Mozilla services removed, navigator.webdriver genuinely absent at the C++ level.

But there's a catch: Scrapling's StealthyFetcher uses patchright (a Playwright Chromium fork) and does NOT auto-detect Camoufox. They don't integrate at the browser level because Playwright's Firefox protocol differs from Chromium's.

The workaround is straightforward:

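from camoufox import Camoufox
from scrapling import Selector

# Fetch the page with Camoufox (Firefox) through its Playwright-style page API
with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto('https://target.com')
    html = page.content()

# Hand the raw HTML to Scrapling's Selector for parsing
sel = Selector(html)
data = sel.css('.product::text').getall()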

Camoufox fetches undetected. Scrapling parses with adaptive resilience. Best of both worlds — but it's slow. More on that next.

Browser                              Avg Page Load
Scrapling DynamicFetcher (Chromium)  3.66s
Camoufox (Firefox)                   8.84s

Camoufox is about 11× slower than the HTTP fetcher and 2.4× slower than Chromium. Firefox on Linux pays a cold-start tax. Camoufox earns its place at tier 5 in the ladder: not a replacement for Chromium, but a fallback when Chromium's fingerprint is the problem.

All of this — the speed data, the memory measurements, the Camoufox discovery — points to one design:

Priority 1:  Fetcher (HTTP)              0.77s   ~0 MB    Static pages
   ↓ page is empty / JS-rendered?
Priority 3:  DynamicFetcher (Chromium)    3.66s   180 MB   JS-rendered SPAs
   ↓ blocked by anti-bot?
Priority 4:  StealthyFetcher (Chromium)   ~4s     120 MB   Cloudflare, basic WAF
   ↓ Chromium itself blocked?
Priority 5:  Camoufox (Firefox)           8.84s    81 MB   Firefox fingerprint
   ↓ CAPTCHA / aggressive WAF?
Priority 6:  Firecrawl enhanced proxy     ~3-5s    credits Hard targets

Each tier costs more — time or money. Only escalate when proven necessary. The ladder is encoded as an agent skill, so every scraping task automatically starts at tier 1 and escalates on failure.
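The article does not show the skill itself, so the following is only a minimal sketch of what "start cheap, escalate on failure" could look like in Python. The Tier class, run_ladder, the looks_usable check, and the stub fetchers are all hypothetical names introduced for illustration. In a real setup the fetch callables would wrap Fetcher, DynamicFetcher, and so on, and the success check would be site-specific.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tier:
    priority: int                          # lower number = cheaper, tried first
    name: str
    fetch: Callable[[str], Optional[str]]  # returns HTML, or None when blocked


def looks_usable(html: Optional[str], marker: str) -> bool:
    """Hypothetical success check: the page is non-empty and contains a marker."""
    return bool(html) and marker in html


def run_ladder(url: str, tiers: List[Tier], marker: str) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Try tiers from cheapest to most expensive; stop at the first usable page."""
    attempts: List[str] = []
    for tier in sorted(tiers, key=lambda t: t.priority):
        attempts.append(tier.name)
        try:
            html = tier.fetch(url)
        except Exception:
            html = None  # treat a crash like a block and escalate
        if looks_usable(html, marker):
            return tier.name, html, attempts
    return None, None, attempts


# Stub fetchers standing in for real ones (assumptions for illustration only).
def http_stub(url: str) -> Optional[str]:
    return "<html><body></body></html>"  # JS-rendered site: shell without items


def chromium_stub(url: str) -> Optional[str]:
    return '<html><body><div class="product">Item</div></body></html>'


def camoufox_stub(url: str) -> Optional[str]:
    return '<html><body><div class="product">Item</div></body></html>'


tiers = [
    Tier(5, "Camoufox", camoufox_stub),
    Tier(1, "Fetcher (HTTP)", http_stub),
    Tier(3, "DynamicFetcher", chromium_stub),
]

winner, html, attempts = run_ladder("https://example.com/catalog", tiers, 'class="product"')
print(winner)    # DynamicFetcher
print(attempts)  # ['Fetcher (HTTP)', 'DynamicFetcher']
```

Passing the fetchers in as callables keeps the escalation logic independent of any one library, so adding or reordering a tier is a one-line change to the list.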

Lazada SG was the proving ground. Two-layer defense: Google reCAPTCHA → custom slider verification. In a previous test (early May 2026), only Lightpanda's Zig-based browser survived. Every Chromium tool got blocked.

Running the ladder:

Priority  Tool            Page 1       Page 2       Page 3       Time
1         HTTP Fetcher    ❌ Empty     -            -            0.77s
3         DynamicFetcher  ✅ 41 items  ✅ 41 items  ✅ 41 items  ~3s/page
5         Camoufox        ✅ 40 items  -            -            42s/page

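The ladder worked exactly as designed: the HTTP fetcher came back empty, so the task escalated to DynamicFetcher, which returned 41 items on each of the three pages.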

The ladder saved us from jumping straight to Camoufox or paying Firecrawl credits when a simple Chromium browser handled everything.

Priority 1:  Scrapling Fetcher (HTTP)      0.77s   $0
Priority 3:  Scrapling DynamicFetcher       3.66s   $0
Priority 4:  Scrapling StealthyFetcher      ~4s     $0
Priority 5:  Camoufox + Scrapling Selector  8.84s   $0
Priority 6:  Firecrawl enhanced proxy       ~3-5s   credits

Everything runs on a single 4GB VPS. Peak memory with one browser session: ~800MB including the AI agent gateway. 39GB free disk after cleaning stale caches and old kernels. Total scraping cost: $0.
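The memory figures can be sanity-checked with a quick budget calculation. This is a rough sketch only: it assumes a 4GB VPS exposes 4096 MB, that the measured per-browser deltas stay constant across pages (heavier pages will use more), and it reserves a hypothetical 1024 MB of headroom for the OS. The function names are illustrative.

```python
TOTAL_RAM_MB = 4096   # assumption: 4GB VPS exposes 4096 MB
BASELINE_MB = 615     # AI agent gateway baseline, from the article
HEADROOM_MB = 1024    # hypothetical safety margin for the OS and page growth

# Measured headless RAM deltas from the article's tables.
RAM_DELTA_MB = {"DynamicFetcher": 180, "StealthyFetcher": 120, "Camoufox": 81}


def peak_mb(browser: str, browsers_alive: int = 1) -> int:
    """Estimated memory with the given number of browsers alive at once."""
    return BASELINE_MB + browsers_alive * RAM_DELTA_MB[browser]


def max_sessions(browser: str) -> int:
    """How many concurrent browsers fit while keeping the headroom free."""
    return (TOTAL_RAM_MB - BASELINE_MB - HEADROOM_MB) // RAM_DELTA_MB[browser]


print(peak_mb("DynamicFetcher"))                      # 795
print(peak_mb("DynamicFetcher", 20) > TOTAL_RAM_MB)   # True
print({name: max_sessions(name) for name in RAM_DELTA_MB})
# {'DynamicFetcher': 13, 'StealthyFetcher': 20, 'Camoufox': 30}
```

Under these assumptions, one DynamicFetcher session lands at about 795 MB, in line with the ~800MB peak, while 20 unclosed Chromium browsers would need about 4215 MB, more than the machine has.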

Installation is the first test. Read the docs before pip install. Know what each dependency costs in RAM. Skip what you don't need; you can always add it later.

The 1.4GB Camoufox story was user error. Spawning browsers in a loop without sessions will eat any machine. With persistent sessions, Camoufox is the lightest browser in the stack at 81MB. Don't believe benchmark threads — run your own.

Speed differences compound silently. 0.77s vs 8.84s is nothing for one page. For 100 pages, it's 77 seconds vs nearly 15 minutes. The cost grows with every page, so choosing the right tier pays off more the larger the crawl.

Fingerprint diversity is a superpower. Having both Chromium and Firefox in your arsenal means you can bypass sites that target either. Camoufox is slow but it's a different shape entirely — and sometimes that's all you need.

Wire the ladder, not the tools. Individual tools leave you guessing. A priority ladder gives you a protocol: start cheap, escalate on failure. Encode it as an agent skill and you never have to think about it again.

Scrapling is the platform, not just a fetcher. Adaptive element tracking, three-tier architecture, spider framework with /resume, MCP server for AI agents — it's the foundation everything else plugs into. The benchmarks measure its fetchers, but the framework is what makes them interchangeable.

Questions? Find me on X @mariatanbobo
