I Tested Every Web Scraping Tool Against Lazada — Here's What Actually Works (May 2026)

I came across Scrapling through a recommendation on X and decided to put it through its paces — not against a demo page, but against Lazada Singapore, a production site with Google reCAPTCHA and a custom slider verification. The setup: a single 4GB VPS, no residential proxies, no credits, just open-source tools.

Here's the full journey: installation pitfalls, wiring it into an AI agent, choosing the right browser for the job, and the real-world benchmarks that followed.

Scrapling is an adaptive web scraping framework for Python (BSD-3, v0.4.8). It handles everything from single HTTP requests to full-scale concurrent crawls. What sets it apart from the BeautifulSoup/Scrapy world:

Three fetchers: HTTP (Fetcher, curl_cffi), browser (DynamicFetcher, Playwright Chromium), and stealth (StealthyFetcher, Chromium + anti-bot patches). Swap with one line.

MCP tools: an AI agent can call mcp_scrapling_get, mcp_scrapling_fetch, and mcp_scrapling_stealthy_fetch directly.

It's open source, pip-installable, and designed to be the backbone of a scraping stack, not just another tool in the toolbox.

This is where the real story starts. The VPS has 4GB RAM, 2 vCPUs, 77GB disk, and runs an AI agent gateway (615MB baseline). Every browser installation decision matters.

pip install scrapling[fetchers,ai]   # HTTP + Chromium + MCP server
scrapling install                     # Downloads Playwright browsers

This pulls in Playwright Chromium, Firefox, and WebKit (~1.3GB disk), plus curl_cffi for HTTP requests and patchright (a Playwright fork) for browser automation.

Camoufox. Every discussion about Scrapling mentions a GitHub thread where someone's VPS hit 1.4GB of RAM running Camoufox. That was enough to scare me off — on a 4GB machine, 1.4GB for one browser is a non-starter. So we skipped it and let Scrapling's StealthyFetcher fall back to Chromium.

Turns out this was the wrong call. More on that later.

from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/', timeout=15)
quotes = page.css('.quote .text::text').getall()

Clean. Fast. No browser needed. The HTTP fetcher uses curl_cffi with TLS fingerprint impersonation: it looks like Chrome to the server but costs nothing in RAM.

Scrapling ships with a built-in MCP (Model Context Protocol) server. Start it with scrapling mcp and your AI coding agent gets 14 native tools:

Tool	What it does
`get` / `bulk_get`	HTTP fetch with CSS selector extraction
`fetch` / `bulk_fetch`	Browser fetch with JS rendering
`stealthy_fetch` / `bulk_stealthy_fetch`	Anti-bot browser fetch
`open_session` / `close_session` / `list_sessions`	Persistent browser management
`screenshot`	Full-page PNG/JPEG capture

The key advantage: CSS selector support means the agent extracts only relevant elements instead of dumping entire pages into context. Token savings compound fast.

The MCP server's session tools aren't optional — they're the difference between stable and catastrophic:

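# Without a session: every call launches a new browser
from scrapling.fetchers import StealthyFetcher

for url in urls:
    page = StealthyFetcher.fetch(url)  # New browser every time

# With a session (MCP tool calls, written as pseudocode)
session_id = open_session(type="dynamic")
for url in urls:
    page = fetch(url, session_id=session_id)  # Reuses same browser
close_session(session_id)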

One browser, reused. Without sessions, each one-shot fetch spawns a new Chromium process. After 5+ calls, memory pressure spikes. After 20+, you're in OOM territory.
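The open-once, fetch-many, always-close shape can be sketched in plain Python with a stand-in browser object. Everything here (`FakeBrowser`, `browser_session`, `fetch_all`) is a hypothetical name for illustration, not a Scrapling or MCP API; the sketch only shows why a session keeps the process count at one.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class FakeBrowser:
    """Hypothetical stand-in for a real browser process (not a Scrapling class)."""

    launched = 0  # counts how many browser processes were started

    def __init__(self) -> None:
        FakeBrowser.launched += 1
        self.closed = False

    def fetch(self, url: str) -> str:
        if self.closed:
            raise RuntimeError("browser already closed")
        return f"<html>{url}</html>"

    def close(self) -> None:
        self.closed = True


@contextmanager
def browser_session(open_browser: Callable[[], FakeBrowser]) -> Iterator[FakeBrowser]:
    """Open one browser and guarantee it is closed, even if a fetch raises."""
    browser = open_browser()
    try:
        yield browser
    finally:
        browser.close()


def fetch_all(urls: List[str], open_browser: Callable[[], FakeBrowser]) -> List[str]:
    """Fetch every URL through a single shared browser instance."""
    with browser_session(open_browser) as browser:
        return [browser.fetch(url) for url in urls]
```

Calling `fetch_all(urls, FakeBrowser)` launches one browser no matter how long `urls` is, and the `finally` clause closes it even when a fetch fails partway through.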

Scrapling's three fetchers form a natural escalation ladder:

Tier	Fetcher	Engine	Best for
1	`Fetcher`	curl_cffi (HTTP)	Static pages, APIs
2	`DynamicFetcher`	Playwright Chromium	JS-rendered SPAs
3	`StealthyFetcher`	Chromium + anti-bot patches	Cloudflare, bot detection

Same API across all three. Same CSS selectors. Same response object. You're not choosing between different libraries — you're choosing how much overhead to pay.

But the real question is: do you need a browser at all? Let's benchmark.

Fetcher	Avg Speed	vs Fastest
`Fetcher` (HTTP)	0.77s	1×
`DynamicFetcher` (Chromium)	3.66s	4.8×
`StealthyFetcher`	~4s	5.2×

The HTTP fetcher is absurdly fast. Browser-based fetchers add roughly 3 seconds of overhead per page, and that gap compounds: at about 4 seconds per page, 10 pages take 7.7s vs 40s, and 100 pages take 77s vs roughly 6.7 minutes.
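The compounding is plain multiplication, which a few lines make easy to rerun with your own numbers. This is a minimal sketch that assumes pages are fetched one after another with no concurrency; the per-page averages are the ones measured in this article, and `total_seconds` is just an illustrative helper.

```python
def total_seconds(per_page_seconds: float, pages: int) -> float:
    """Sequential crawl time: per-page latency times page count."""
    return per_page_seconds * pages


# Per-page averages from the benchmark in this article.
HTTP_FETCHER = 0.77
DYNAMIC_FETCHER = 3.66

for pages in (10, 100):
    http = total_seconds(HTTP_FETCHER, pages)
    browser = total_seconds(DYNAMIC_FETCHER, pages)
    print(f"{pages} pages: HTTP {http:.1f}s, browser {browser:.1f}s")
```

The ratio between tiers stays the same at any page count; what grows is the absolute time you wait.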

Fetcher	RAM Delta
`Fetcher` (HTTP)	~0 MB
`StealthyFetcher`	+120 MB
`DynamicFetcher`	+180 MB
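One way to reproduce numbers like these on a Linux host is to read `MemAvailable` from `/proc/meminfo` before and after launching the browser. This is an assumption about method, not necessarily how the table above was produced, and the function names are illustrative.

```python
import re
from pathlib import Path


def mem_available_mb(meminfo_text: str) -> float:
    """Parse the MemAvailable line of /proc/meminfo text and return MiB."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found")
    return int(match.group(1)) / 1024


def read_mem_available_mb(path: str = "/proc/meminfo") -> float:
    """Read the current MemAvailable value on a Linux host."""
    return mem_available_mb(Path(path).read_text())


def ram_delta_mb(before_mb: float, after_mb: float) -> float:
    """RAM consumed between two readings (positive means memory was used)."""
    return before_mb - after_mb
```

Take one reading with `read_mem_available_mb()` before starting the browser, a second once a page has loaded, and pass both to `ram_delta_mb`. Other processes on the machine will add noise, so repeat the measurement a few times.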

The rule is simple: start at tier 1 and only escalate when proven necessary. If the page is static, you don't need a browser. If it's JS-rendered, you don't need stealth. If it has anti-bot, you don't need a different IP. Prove each escalation before taking it.

Remember how I skipped Camoufox because of that 1.4GB horror story? After getting the stack running, I decided to test it properly.

pip install camoufox
python -m camoufox fetch  # Downloads the browser binary (~713MB)

Camoufox is actually the lightest browser. Measured on our VPS:

Browser	RAM (headless)	Stealth Level
Camoufox (Firefox)	81 MB	C++-level
Scrapling StealthyFetcher (Chromium)	120 MB	JS-patched
Scrapling DynamicFetcher (Chromium)	180 MB	None

The 1.4GB from that GitHub thread was user error: spawning a fresh browser per request without closing old ones. The same thing happens with any browser. Camoufox is a debloated Firefox fork: telemetry stripped, Mozilla services removed, navigator.webdriver genuinely absent at the C++ level.

But there's a catch: Scrapling's StealthyFetcher uses patchright (a Playwright Chromium fork) and does NOT auto-detect Camoufox. They don't integrate at the browser level because Playwright's Firefox protocol differs from Chromium's.

The workaround is straightforward:

from camoufox import Camoufox
from scrapling import Selector

with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto('https://target.com')
    html = page.content()

sel = Selector(html)
data = sel.css('.product::text').getall()

Camoufox fetches undetected. Scrapling parses with adaptive resilience. Best of both worlds — but it's slow. More on that next.

Browser	Avg Page Load
Scrapling DynamicFetcher (Chromium)	3.66s
Camoufox (Firefox)	8.84s

Camoufox is about 11.5× slower than the HTTP fetcher and 2.4× slower than Chromium. Firefox on Linux pays a cold-start tax. Camoufox earns its place at tier 5 in the ladder: not a replacement for Chromium, but a fallback when Chromium's fingerprint is the problem.

All of this — the speed data, the memory measurements, the Camoufox discovery — points to one design:

Priority 1:  Fetcher (HTTP)              0.77s   ~0 MB    Static pages
   ↓ page is empty / JS-rendered?
Priority 3:  DynamicFetcher (Chromium)    3.66s   180 MB   JS-rendered SPAs
   ↓ blocked by anti-bot?
Priority 4:  StealthyFetcher (Chromium)   ~4s     120 MB   Cloudflare, basic WAF
   ↓ Chromium itself blocked?
Priority 5:  Camoufox (Firefox)           8.84s    81 MB   Firefox fingerprint
   ↓ CAPTCHA / aggressive WAF?
Priority 6:  Firecrawl enhanced proxy     ~3-5s    credits Hard targets

Each tier costs more — time or money. Only escalate when proven necessary. The ladder is encoded as an agent skill, so every scraping task automatically starts at tier 1 and escalates on failure.
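The escalation logic itself is small. Here is a minimal sketch, assuming each tier is wrapped as a function that takes a URL and returns HTML, and that you supply your own check for "this page is usable". `escalate`, `Fetch`, and `Tier` are hypothetical names for illustration, not part of Scrapling or the agent skill.

```python
from typing import Callable, List, Optional, Tuple

Fetch = Callable[[str], str]  # hypothetical signature: URL in, HTML out
Tier = Tuple[str, Fetch]      # (tier name, fetch function)


def escalate(
    url: str,
    tiers: List[Tier],
    is_usable: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Try tiers cheapest-first; return (winning tier, html, tiers tried)."""
    tried: List[str] = []
    for name, fetch in tiers:
        tried.append(name)
        try:
            html = fetch(url)
        except Exception:
            continue  # a failed or blocked fetch means "move up a tier"
        if is_usable(html):
            return name, html, tried
    return None, None, tried  # every tier failed


if __name__ == "__main__":
    # Stand-in fetchers; real ones would wrap Fetcher, DynamicFetcher, and so on.
    def http_fetch(url: str) -> str:
        return "<html></html>"  # empty shell, like a JS-rendered page over HTTP

    def browser_fetch(url: str) -> str:
        return '<div class="product">Item</div>'

    ladder = [("Fetcher", http_fetch), ("DynamicFetcher", browser_fetch)]
    winner, _, tried = escalate("https://example.com", ladder, lambda h: "product" in h)
    print(winner, tried)
```

Keeping the `tried` list around makes each escalation auditable: you can see which cheaper tiers were attempted before the one that worked.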

Lazada SG was the proving ground. Two-layer defense: Google reCAPTCHA → custom slider verification. In a previous test (early May 2026), only Lightpanda's Zig-based browser survived. Every Chromium tool got blocked.

Running the ladder:

Priority	Tool	Page 1	Page 2	Page 3	Time
1	HTTP Fetcher	❌ Empty	—	—	0.77s
3	DynamicFetcher	✅ 41 items	✅ 41 items	✅ 41 items	~3s/page
5	Camoufox	✅ 40 items	—	—	42s/page

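The ladder worked exactly as designed: the HTTP fetcher returned an empty page, so the run escalated to DynamicFetcher, which returned 41 items on each of three pages at about 3 seconds per page. Camoufox also got through (40 items on page 1) but took 42 seconds per page.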

The ladder saved us from jumping straight to Camoufox or paying Firecrawl credits when a simple Chromium browser handled everything.

Priority 1:  Scrapling Fetcher (HTTP)      0.77s   $0
Priority 3:  Scrapling DynamicFetcher       3.66s   $0
Priority 4:  Scrapling StealthyFetcher      ~4s     $0
Priority 5:  Camoufox + Scrapling Selector  8.84s   $0
Priority 6:  Firecrawl enhanced proxy       ~3-5s   credits

Everything runs on a single 4GB VPS. Peak memory with one browser session: ~800MB including the AI agent gateway. 39GB free disk after cleaning stale caches and old kernels. Total scraping cost: $0.
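The ~800MB peak lines up with the earlier measurements: gateway baseline plus one browser. A quick sketch of that budget follows, assuming memory grows linearly with the number of open sessions (an assumption, not something measured here); `peak_memory_mb` is an illustrative helper.

```python
def peak_memory_mb(baseline_mb: float, browser_mb: float, sessions: int = 1) -> float:
    """Estimated peak: fixed baseline plus per-session browser cost."""
    return baseline_mb + browser_mb * sessions


GATEWAY_BASELINE_MB = 615  # AI agent gateway baseline from this article
BROWSER_DELTA_MB = {"Camoufox": 81, "StealthyFetcher": 120, "DynamicFetcher": 180}

for name, delta in BROWSER_DELTA_MB.items():
    print(f"{name}: {peak_memory_mb(GATEWAY_BASELINE_MB, delta):.0f} MB")
```

With DynamicFetcher the estimate is 795 MB, which matches the observed peak of roughly 800MB.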

Installation is the first test. Read the docs before pip install. Know what each dependency costs in RAM. Skip what you don't need; you can always add it later.

The 1.4GB Camoufox story was user error. Spawning browsers in a loop without sessions will eat any machine. With persistent sessions, Camoufox is the lightest browser in the stack at 81MB. Don't believe benchmark threads — run your own.

Speed differences compound silently. 0.77s vs 8.84s is nothing for one page. For 100 pages, it's 77 seconds vs nearly 15 minutes. The cost of picking the wrong tier grows with every page you crawl.

Fingerprint diversity is a superpower. Having both Chromium and Firefox in your arsenal means you can bypass sites that target either. Camoufox is slow but it's a different shape entirely — and sometimes that's all you need.

Wire the ladder, not the tools. Individual tools leave you guessing. A priority ladder gives you a protocol: start cheap, escalate on failure. Encode it as an agent skill and you never have to think about it again.

Scrapling is the platform, not just a fetcher. Adaptive element tracking, three-tier architecture, spider framework with /resume, MCP server for AI agents — it's the foundation everything else plugs into. The benchmarks measure its fetchers, but the framework is what makes them interchangeable.

Questions? Find me on X @mariatanbobo

source & further reading

dev.to: original article
