# I Tested Every Web Scraping Tool Against Lazada — Here's What Actually Works (May 2026)

> Source: <https://dev.to/mariatanbobo/i-tested-every-web-scraping-tool-against-lazada-heres-what-actually-works-may-2026-16pg>
> Published: 2026-05-30 03:18:10+00:00

I came across [Scrapling](https://github.com/D4Vinci/Scrapling) through a recommendation on X and decided to put it through its paces — not against a demo page, but against Lazada Singapore, a production site with Google reCAPTCHA and a custom slider verification. The setup: a single 4GB VPS, no residential proxies, no credits, just open-source tools.

Here's the full journey: installation pitfalls, wiring it into an AI agent, choosing the right browser for the job, and the real-world benchmarks that followed.

Scrapling is an adaptive web scraping framework for Python (BSD-3, v0.4.8). It handles everything from single HTTP requests to full-scale concurrent crawls. What sets it apart from the BeautifulSoup/Scrapy world:

- Three fetchers behind one API: HTTP (`Fetcher`, curl_cffi), browser (`DynamicFetcher`, Playwright Chromium), and stealth (`StealthyFetcher`, Chromium + anti-bot patches). Swap with one line.
- A built-in MCP server, so an AI agent can call `mcp_scrapling_get`, `mcp_scrapling_fetch`, and `mcp_scrapling_stealthy_fetch` directly.

It's open source, pip-installable, and designed to be the backbone of a scraping stack, not just another tool in the toolbox.

This is where the real story starts. The VPS has 4GB RAM, 2 vCPUs, 77GB disk, and runs an AI agent gateway (615MB baseline). Every browser installation decision matters.

```
pip install "scrapling[fetchers,ai]" # HTTP + Chromium + MCP server (quoted so zsh does not expand the brackets)
scrapling install                     # Downloads Playwright browsers
```

This pulls in Playwright Chromium, Firefox, and WebKit (~1.3GB disk), plus `curl_cffi` for HTTP requests and `patchright` (Playwright fork) for browser automation.

**Camoufox.** Every discussion about Scrapling mentions a GitHub thread where someone's VPS hit 1.4GB of RAM running Camoufox. That was enough to scare me off — on a 4GB machine, 1.4GB for one browser is a non-starter. So we skipped it and let Scrapling's StealthyFetcher fall back to Chromium.

Turns out this was the wrong call. More on that later.

``` python
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/', timeout=15)
quotes = page.css('.quote .text::text').getall()
# 0.88s, 200 OK, 10 quotes parsed
# Memory: 56MB RSS
```

Clean. Fast. No browser needed. The HTTP fetcher uses `curl_cffi` with TLS fingerprint impersonation, so it looks like Chrome to the server but costs nothing in RAM.

Scrapling ships with a built-in MCP (Model Context Protocol) server. Start it with `scrapling mcp` and your AI coding agent gets 14 native tools:

| Tool | What it does |
|---|---|
| `get` / `bulk_get` | HTTP fetch with CSS selector extraction |
| `fetch` / `bulk_fetch` | Browser fetch with JS rendering |
| `stealthy_fetch` / `bulk_stealthy_fetch` | Anti-bot browser fetch |
| `open_session` / `close_session` / `list_sessions` | Persistent browser management |
| `screenshot` | Full-page PNG/JPEG capture |

The key advantage: CSS selector support means the agent extracts only relevant elements instead of dumping entire pages into context. Token savings compound fast.
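To make the token argument concrete, here is a minimal sketch that uses only Python's standard library as a stand-in for selector-based extraction. The `ClassTextExtractor` helper, `extract_text`, and the sample HTML are hypothetical and are not part of Scrapling or its MCP server; the sketch only compares the size of a full page with the size of the selected text.

```python
from html.parser import HTMLParser

# Hypothetical sample page; real pages carry far more markup.
SAMPLE_HTML = """
<html><head><title>Quotes</title><style>.quote{margin:1em}</style></head>
<body>
  <nav><a href="/">Home</a><a href="/login">Login</a></nav>
  <div class="quote"><span class="text">First quote.</span><small>Author A</small></div>
  <div class="quote"><span class="text">Second quote.</span><small>Author B</small></div>
  <footer>Footer links and boilerplate</footer>
</body></html>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given class name.

    Simplification: void tags such as <br> inside a match are not handled.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0          # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1     # nested tag inside a match
        elif self.class_name in classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def extract_text(html, class_name):
    """Return the stripped text of all elements with the given class."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    return [text.strip() for text in parser.results]


quotes = extract_text(SAMPLE_HTML, "text")
full_size = len(SAMPLE_HTML)                    # what a raw page dump would cost
selected_size = sum(len(q) for q in quotes)     # what the selected text costs
print(quotes, full_size, selected_size)
```

Character counts are only a rough proxy for tokens, but the ratio between `full_size` and `selected_size` shows why selecting before returning matters.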

The MCP server's session tools aren't optional — they're the difference between stable and catastrophic:

```
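# Pseudocode: the first loop uses the Python fetcher; the second uses the
# MCP session tools (open_session / fetch / close_session), not Python functions.

# ❌ Don't do this in a loop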
for url in urls:
    page = StealthyFetcher.fetch(url)  # New browser every time

# ✅ Do this instead
session_id = open_session(type="dynamic")
for url in urls:
    page = fetch(url, session_id=session_id)  # Reuses same browser
close_session(session_id)
```

One browser, reused. Without sessions, each one-shot fetch spawns a new Chromium process. After 5+ calls, memory pressure spikes. After 20+, you're in OOM territory.
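The same open-once, reuse, always-close pattern can be sketched in plain Python. Everything here is a toy model: `FakeBrowser` and `browser_session` are hypothetical names, not Scrapling classes or MCP tools, and the one-shot loop deliberately models the worst case where no process is ever closed.

```python
from contextlib import contextmanager


class FakeBrowser:
    """Hypothetical stand-in for a real browser process (not a Scrapling class)."""

    launched = 0   # total processes started
    alive = 0      # processes currently running

    def __init__(self):
        FakeBrowser.launched += 1
        FakeBrowser.alive += 1

    def fetch(self, url):
        return f"<html>{url}</html>"

    def close(self):
        FakeBrowser.alive -= 1


@contextmanager
def browser_session(launch=FakeBrowser):
    """Open one browser, hand it out, and always close it afterwards."""
    browser = launch()
    try:
        yield browser
    finally:
        browser.close()   # runs even if a fetch raises


urls = [f"https://example.com/page/{n}" for n in range(1, 21)]

# One-shot style: a new process per URL; in this toy model none are closed.
leaked = [FakeBrowser().fetch(url) for url in urls]
one_shot_launched, one_shot_alive = FakeBrowser.launched, FakeBrowser.alive

# Session style: one process, reused for every URL, closed at the end.
FakeBrowser.launched = FakeBrowser.alive = 0
with browser_session() as browser:
    pages = [browser.fetch(url) for url in urls]
session_launched, session_alive = FakeBrowser.launched, FakeBrowser.alive

print(one_shot_launched, one_shot_alive, session_launched, session_alive)
```

Putting the `close()` call in a `finally` block is the design choice that matters: a failed fetch in the middle of a batch should not leave a browser process behind.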

Scrapling's three fetchers form a natural escalation ladder:

| Tier | Fetcher | Engine | Best for |
|---|---|---|---|
| 1 | `Fetcher` | curl_cffi (HTTP) | Static pages, APIs |
| 2 | `DynamicFetcher` | Playwright Chromium | JS-rendered SPAs |
| 3 | `StealthyFetcher` | Chromium + anti-bot patches | Cloudflare, bot detection |

Same API across all three. Same CSS selectors. Same response object. You're not choosing between different libraries — you're choosing how much overhead to pay.

But the real question is: **do you need a browser at all?** Let's benchmark.

| Fetcher | Avg Speed | vs Fastest |
|---|---|---|
| `Fetcher` (HTTP) | 0.77s | 1× |
| `DynamicFetcher` (Chromium) | 3.66s | 4.8× |
| `StealthyFetcher` | ~4s | 5.2× |

The HTTP fetcher is absurdly fast. Browser-based tools add roughly 3 seconds of overhead *per page*. That gap compounds: 10 pages is 7.7s vs 40s. 100 pages is 77s vs about 6.7 minutes.
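The compounding can be checked with a few lines of arithmetic. This is a sketch that assumes pages are fetched one after another at the average speeds reported above; the function names are illustrative, and real runs with concurrency or warm sessions will differ.

```python
# Average seconds per page, taken from the benchmark figures above.
AVG_SECONDS = {
    "Fetcher": 0.77,
    "DynamicFetcher": 3.66,
    "StealthyFetcher": 4.0,   # reported as "~4s"
}


def total_seconds(fetcher, pages):
    """Estimated wall-clock time for sequential fetches."""
    return AVG_SECONDS[fetcher] * pages


def slowdown(fetcher, baseline="Fetcher"):
    """How many times slower a fetcher is than the baseline."""
    return AVG_SECONDS[fetcher] / AVG_SECONDS[baseline]


for pages in (10, 100):
    for name in AVG_SECONDS:
        print(f"{pages:>3} pages  {name:<16} {total_seconds(name, pages):7.1f}s")
```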

| Fetcher | RAM Delta |
|---|---|
| `Fetcher` (HTTP) | ~0 MB |
| `StealthyFetcher` | +120 MB |
| `DynamicFetcher` | +180 MB |

The rule is simple: **start at tier 1 and only escalate when proven necessary.** If the page is static, you don't need a browser. If it's JS-rendered, you don't need stealth. If it has anti-bot, you don't need a different IP. Prove each escalation before taking it.

Remember how I skipped Camoufox because of that 1.4GB horror story? After getting the stack running, I decided to test it properly.

```
pip install camoufox
python -m camoufox fetch  # Downloads the browser binary (~713MB)
```

**Camoufox is actually the lightest browser.** Measured on our VPS:

| Browser | RAM (headless) | Stealth Level |
|---|---|---|
| Camoufox (Firefox) | 81 MB | C++-level |
| Scrapling StealthyFetcher (Chromium) | 120 MB | JS-patched |
| Scrapling DynamicFetcher (Chromium) | 180 MB | None |

The 1.4GB from that GitHub thread was user error: spawning a fresh browser per request without closing old ones. The same thing happens with any browser. Camoufox is a debloated Firefox fork: telemetry stripped, Mozilla services removed, `navigator.webdriver` genuinely absent at the C++ level.
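A rough back-of-the-envelope check shows how few leaked processes it takes to reach that figure. This sketch assumes memory grows linearly with each unclosed instance at the 81 MB measured here, treats 4GB as 4096 MB, and ignores OS overhead; the helper names are illustrative, and the thread's real per-process numbers are not known.

```python
import math

# Figures reported in this article (MB).
CAMOUFOX_MB = 81
REPORTED_LEAK_MB = 1400      # the "1.4GB" from the GitHub thread
VPS_RAM_MB = 4096            # assumption: 4GB counted as 4096 MB
GATEWAY_BASELINE_MB = 615


def browsers_to_reach(target_mb, per_browser_mb):
    """Smallest number of unclosed browsers whose combined RAM reaches target_mb."""
    return math.ceil(target_mb / per_browser_mb)


def browsers_that_fit(total_mb, baseline_mb, per_browser_mb):
    """How many browsers fit in RAM after the baseline, ignoring OS overhead."""
    return (total_mb - baseline_mb) // per_browser_mb


leaked_instances = browsers_to_reach(REPORTED_LEAK_MB, CAMOUFOX_MB)
headroom = browsers_that_fit(VPS_RAM_MB, GATEWAY_BASELINE_MB, CAMOUFOX_MB)
print(leaked_instances, headroom)
```

Under these assumptions, fewer than twenty unclosed instances are enough to account for 1.4GB, which is an easy number to hit in a scraping loop.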

**But there's a catch:** Scrapling's StealthyFetcher uses `patchright` (a Playwright Chromium fork) and does NOT auto-detect Camoufox. They don't integrate at the browser level because Playwright's Firefox protocol differs from Chromium's.

The workaround is straightforward:

``` python
from camoufox import Camoufox
from scrapling import Selector

# Camoufox: stealth browsing with Firefox fingerprint (81MB)
with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto('https://target.com')
    html = page.content()

# Scrapling: adaptive parsing with CSS/XPath
sel = Selector(html)
data = sel.css('.product::text').getall()
```

Camoufox fetches undetected. Scrapling parses with adaptive resilience. Best of both worlds — but it's slow. More on that next.

| Browser | Avg Page Load |
|---|---|
| Scrapling DynamicFetcher (Chromium) | 3.66s |
| Camoufox (Firefox) | 8.84s |

11× slower than the HTTP fetcher, 2.4× slower than Chromium. Firefox on Linux pays a cold-start tax. Camoufox earns its place at tier 5 in the ladder — not a replacement for Chromium, but a fallback when Chromium's fingerprint is the problem.

All of this — the speed data, the memory measurements, the Camoufox discovery — points to one design:

```
Priority 1:  Fetcher (HTTP)              0.77s   ~0 MB    Static pages
   ↓ page is empty / JS-rendered?
Priority 3:  DynamicFetcher (Chromium)    3.66s   180 MB   JS-rendered SPAs
   ↓ blocked by anti-bot?
Priority 4:  StealthyFetcher (Chromium)   ~4s     120 MB   Cloudflare, basic WAF
   ↓ Chromium itself blocked?
Priority 5:  Camoufox (Firefox)           8.84s    81 MB   Firefox fingerprint
   ↓ CAPTCHA / aggressive WAF?
Priority 6:  Firecrawl enhanced proxy     ~3-5s    credits Hard targets
```

Each tier costs more — time or money. Only escalate when proven necessary. The ladder is encoded as an agent skill, so every scraping task automatically starts at tier 1 and escalates on failure.
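The escalation logic itself is small. Below is a minimal sketch of a priority ladder with stub fetchers, not the actual agent skill: `Tier`, `looks_usable`, `fetch_with_escalation`, and the marker-based success check are all hypothetical, and a real version would plug Scrapling fetchers, Camoufox, or Firecrawl into the `fetch` slots.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tier:
    priority: int
    name: str
    fetch: Callable[[str], Optional[str]]   # returns HTML, or None when blocked


def looks_usable(html, marker):
    """Hypothetical success check: the page must contain an expected marker."""
    return bool(html) and marker in html


def fetch_with_escalation(url, tiers, marker):
    """Try tiers in priority order.

    Returns (tier name, html, failed attempts) for the first usable page.
    """
    attempts = []
    for tier in sorted(tiers, key=lambda t: t.priority):
        try:
            html = tier.fetch(url)
        except Exception as exc:            # a crashed tier counts as a failure
            attempts.append((tier.name, f"error: {exc}"))
            continue
        if looks_usable(html, marker):
            return tier.name, html, attempts
        attempts.append((tier.name, "empty or blocked"))
    raise RuntimeError(f"all tiers failed for {url}: {attempts}")


# Stub fetchers standing in for the real tools.
def http_stub(url):
    return "<html><div id='root'></div></html>"      # JS shell, no products


def browser_stub(url):
    return "<html><div class='product'>Item</div></html>"


tiers = [
    Tier(3, "DynamicFetcher", browser_stub),
    Tier(1, "Fetcher", http_stub),
]
winner, html, attempts = fetch_with_escalation(
    "https://example.com/search", tiers, "class='product'"
)
print(winner, attempts)
```

Keeping the list of failed attempts alongside the result gives the agent a record of why each escalation happened, which is the "prove it first" rule in code form.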

Lazada SG was the proving ground. Two-layer defense: Google reCAPTCHA → custom slider verification. In a previous test (early May 2026), only Lightpanda's Zig-based browser survived. Every Chromium tool got blocked.

Running the ladder:

| Priority | Tool | Page 1 | Page 2 | Page 3 | Time |
|---|---|---|---|---|---|
| 1 | HTTP Fetcher | ❌ Empty | — | — | 0.77s |
| 3 | DynamicFetcher | ✅ 41 items | ✅ 41 items | ✅ 41 items | ~3s/page |
| 5 | Camoufox | ✅ 40 items | — | — | 42s/page |

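The ladder worked exactly as designed: the HTTP fetcher came back empty, so the run escalated to DynamicFetcher, which returned 41 items on each of the three pages at about 3s per page. Camoufox also got through on page 1 (40 items) but needed 42s per page.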

The ladder saved us from jumping straight to Camoufox or paying Firecrawl credits when a simple Chromium browser handled everything.

```
Priority 1:  Scrapling Fetcher (HTTP)      0.77s   $0
Priority 3:  Scrapling DynamicFetcher       3.66s   $0
Priority 4:  Scrapling StealthyFetcher      ~4s     $0
Priority 5:  Camoufox + Scrapling Selector  8.84s   $0
Priority 6:  Firecrawl enhanced proxy       ~3-5s   credits
```

Everything runs on a single 4GB VPS. Peak memory with one browser session: ~800MB including the AI agent gateway. 39GB free disk after cleaning stale caches and old kernels. Total scraping cost: $0.

**Installation is the first test.** Read the docs before `pip install`. Know what each dependency costs in RAM. Skip what you don't need; you can always add it later.

**The 1.4GB Camoufox story was user error.** Spawning browsers in a loop without sessions will eat any machine. With persistent sessions, Camoufox is the lightest browser in the stack at 81MB. Don't believe benchmark threads — run your own.

**Speed differences compound silently.** 0.77s vs 8.84s is nothing for one page. For 100 pages, it's 77 seconds vs nearly 15 minutes. The gap grows with every page, so choosing the right tier matters more the larger the job.

**Fingerprint diversity is a superpower.** Having both Chromium and Firefox in your arsenal means you can bypass sites that target either. Camoufox is slow but it's a different shape entirely — and sometimes that's all you need.

**Wire the ladder, not the tools.** Individual tools leave you guessing. A priority ladder gives you a protocol: start cheap, escalate on failure. Encode it as an agent skill and you never have to think about it again.

**Scrapling is the platform, not just a fetcher.** Adaptive element tracking, three-tier architecture, spider framework with pause/resume, MCP server for AI agents — it's the foundation everything else plugs into. The benchmarks measure its fetchers, but the framework is what makes them interchangeable.

*Questions? Find me on X @mariatanbobo*
