I Tested Every Web Scraping Tool Against Lazada — Here's What Actually Works (May 2026)

A developer tested the open-source Python framework Scrapling against Lazada Singapore, a production site protected by Google reCAPTCHA and custom slider verification, using only a single 4GB VPS with no residential proxies. The framework's three-tier fetcher system—HTTP, browser, and stealth—allowed the developer to bypass anti-bot measures while keeping memory usage under control, though the initial decision to skip the Camoufox browser due to RAM concerns later proved to be a mistake. The built-in MCP server provided 14 tools for AI agent integration, with session management being critical to avoid out-of-memory crashes during concurrent scraping operations.

I came across [Scrapling](https://github.com/D4Vinci/Scrapling) through a recommendation on X and decided to put it through its paces: not against a demo page, but against Lazada Singapore, a production site with Google reCAPTCHA and a custom slider verification. The setup: a single 4GB VPS, no residential proxies, no credits, just open-source tools. Here's the full journey: installation pitfalls, wiring it into an AI agent, choosing the right browser for the job, and the real-world benchmarks that followed.

Scrapling is an adaptive web scraping framework for Python (BSD-3, v0.4.8). It handles everything from single HTTP requests to full-scale concurrent crawls. What sets it apart from the BeautifulSoup/Scrapy world is its three fetcher tiers: HTTP (`Fetcher`, curl_cffi), browser (`DynamicFetcher`, Playwright Chromium), and stealth (`StealthyFetcher`, Chromium + anti-bot patches). Swap with one line. An AI agent can also call `mcp_scrapling_get`, `mcp_scrapling_fetch`, and `mcp_scrapling_stealthy_fetch` directly. It's open source, pip-installable, and designed to be the backbone of a scraping stack, not just another tool in the toolbox.

This is where the real story starts. The VPS has 4GB RAM, 2 vCPUs, 77GB disk, and runs an AI agent gateway (615MB baseline). Every browser installation decision matters.

```bash
pip install "scrapling[fetchers,ai]"  # HTTP + Chromium + MCP server
scrapling install                     # Downloads Playwright browsers
```

This pulls in Playwright Chromium, Firefox, and WebKit (~1.3GB disk), plus curl_cffi for HTTP requests and patchright (a Playwright fork) for browser automation.

One thing I left out: Camoufox. Every discussion about Scrapling mentions a GitHub thread where someone's VPS hit 1.4GB of RAM running Camoufox. That was enough to scare me off. On a 4GB machine, 1.4GB for one browser is a non-starter. So I skipped it and let Scrapling's `StealthyFetcher` fall back to Chromium. Turns out this was the wrong call. More on that later.
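On a box this small it pays to measure rather than guess. As a minimal sketch (not the script used for the numbers in this post), timing and peak RSS for a single call can be collected with the standard library alone. The helper names `measure` and `peak_rss_mb` are hypothetical, and the `resource` module is Unix-only.

```python
import resource
import sys
import time
from typing import Any, Callable, Tuple


def peak_rss_mb() -> float:
    """Peak resident set size of this Python process, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure(fn: Callable[[], Any]) -> Tuple[Any, float, float]:
    """Run fn once and return (result, elapsed seconds, peak RSS in MB)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, peak_rss_mb()


if __name__ == "__main__":
    # Stand-in workload; in practice fn would wrap a single fetch call.
    _, seconds, rss = measure(lambda: sum(range(1_000_000)))
    print(f"{seconds:.2f}s, peak RSS {rss:.0f}MB")
```

This only sees the Python process itself. A browser launched as a separate process has to be measured on its own, for example with `ps` or `top`.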
```python
from scrapling.fetchers import Fetcher

page = Fetcher.get('https://quotes.toscrape.com/', timeout=15)
quotes = page.css('.quote .text::text').getall()
# 0.88s, 200 OK, 10 quotes parsed
# Memory: 56MB RSS
```

Clean. Fast. No browser needed. The HTTP fetcher uses curl_cffi with TLS fingerprint impersonation: it looks like Chrome to the server but costs almost nothing in RAM.

Scrapling ships with a built-in MCP (Model Context Protocol) server. Start it with `scrapling mcp` and your AI coding agent gets 14 native tools:

| Tool | What it does |
|---|---|
| `get` / `bulk_get` | HTTP fetch with CSS selector extraction |
| `fetch` / `bulk_fetch` | Browser fetch with JS rendering |
| `stealthy_fetch` / `bulk_stealthy_fetch` | Anti-bot browser fetch |
| `open_session` / `close_session` / `list_sessions` | Persistent browser management |
| `screenshot` | Full-page PNG/JPEG capture |

The key advantage: CSS selector support means the agent extracts only relevant elements instead of dumping entire pages into context. Token savings compound fast.

The MCP server's session tools aren't optional. They're the difference between stable and catastrophic:

```python
# Pseudocode: open_session, fetch, and close_session are the MCP tools listed above.

# ❌ Don't do this in a loop
for url in urls:
    page = StealthyFetcher.fetch(url)  # New browser every time

# ✅ Do this instead
session_id = open_session(type="dynamic")
for url in urls:
    page = fetch(url, session_id=session_id)  # Reuses same browser
close_session(session_id)
```

One browser, reused. Without sessions, each one-shot fetch spawns a new Chromium process. After 5+ calls, memory pressure spikes. After 20+, you're in OOM territory.

Scrapling's three fetchers form a natural escalation ladder:

| Tier | Fetcher | Engine | Best for |
|---|---|---|---|
| 1 | `Fetcher` | curl_cffi HTTP | Static pages, APIs |
| 2 | `DynamicFetcher` | Playwright Chromium | JS-rendered SPAs |
| 3 | `StealthyFetcher` | Chromium + anti-bot patches | Cloudflare, bot detection |

Same API across all three. Same CSS selectors. Same response object.
You're not choosing between different libraries. You're choosing how much overhead to pay.

But the real question is: do you need a browser at all? Let's benchmark.

| Fetcher | Avg Speed | vs Fastest |
|---|---|---|
| `Fetcher` (HTTP) | 0.77s | 1× |
| `DynamicFetcher` (Chromium) | 3.66s | 4.8× |
| `StealthyFetcher` | ~4s | 5.2× |

The HTTP fetcher is absurdly fast. Browser-based tools add about 3 seconds of overhead per page. That gap compounds: at ~4s per page, 10 pages is 7.7s vs 40s, and 100 pages is 77s vs 400s (about 6.7 minutes).

| Fetcher | RAM Delta |
|---|---|
| `Fetcher` (HTTP) | ~0 MB |
| `StealthyFetcher` | +120 MB |
| `DynamicFetcher` | +180 MB |

The rule is simple: start at tier 1 and only escalate when proven necessary. A static page doesn't need a browser. A JS-rendered page doesn't automatically need stealth. A page with anti-bot protection doesn't automatically need a different IP. Prove each escalation before taking it.

Remember how I skipped Camoufox because of that 1.4GB horror story? After getting the stack running, I decided to test it properly.

```bash
pip install camoufox
python -m camoufox fetch  # Downloads the browser binary (~713MB)
```

Camoufox is actually the lightest browser. Measured on the VPS:

| Browser | RAM (headless) | Stealth Level |
|---|---|---|
| Camoufox (Firefox) | 81 MB | C++-level |
| Scrapling `StealthyFetcher` (Chromium) | 120 MB | JS-patched |
| Scrapling `DynamicFetcher` (Chromium) | 180 MB | None |

The 1.4GB from that GitHub thread was user error: spawning a fresh browser per request without closing old ones. The same thing happens with any browser.

Camoufox is a debloated Firefox fork: telemetry stripped, Mozilla services removed, `navigator.webdriver` genuinely absent at the C++ level.

But there's a catch: Scrapling's `StealthyFetcher` uses patchright (a Playwright Chromium fork) and does NOT auto-detect Camoufox. They don't integrate at the browser level because Playwright's Firefox protocol differs from Chromium's.
The workaround is straightforward:

```python
from camoufox import Camoufox
from scrapling import Selector

# Camoufox: stealth browsing with Firefox fingerprint (81MB)
with Camoufox(headless=True) as browser:
    page = browser.new_page()
    page.goto('https://target.com')
    html = page.content()

# Scrapling: adaptive parsing with CSS/XPath
sel = Selector(html)
data = sel.css('.product::text').getall()
```

Camoufox fetches undetected. Scrapling parses with adaptive resilience. Best of both worlds, but it's slow. More on that next.

| Browser | Avg Page Load |
|---|---|
| Scrapling `DynamicFetcher` (Chromium) | 3.66s |
| Camoufox (Firefox) | 8.84s |

Camoufox is 11× slower than the HTTP fetcher and 2.4× slower than Chromium. Firefox on Linux pays a cold-start tax. Camoufox earns its place at tier 5 in the ladder: not a replacement for Chromium, but a fallback when Chromium's fingerprint is the problem.

All of this (the speed data, the memory measurements, the Camoufox discovery) points to one design:

```
Priority 1: Fetcher (HTTP)               0.77s   ~0 MB    Static pages
    ↓ page is empty / JS-rendered?
Priority 3: DynamicFetcher (Chromium)    3.66s   180 MB   JS-rendered SPAs
    ↓ blocked by anti-bot?
Priority 4: StealthyFetcher (Chromium)   ~4s     120 MB   Cloudflare, basic WAF
    ↓ Chromium itself blocked?
Priority 5: Camoufox (Firefox)           8.84s   81 MB    Firefox fingerprint
    ↓ CAPTCHA / aggressive WAF?
Priority 6: Firecrawl (enhanced proxy)   ~3-5s   credits  Hard targets
```

Each tier costs more, in time or money. Only escalate when proven necessary. The ladder is encoded as an agent skill, so every scraping task automatically starts at tier 1 and escalates on failure.

Lazada SG was the proving ground. It has a two-layer defense: Google reCAPTCHA, then custom slider verification. In a previous test (early May 2026), only Lightpanda's Zig-based browser survived. Every Chromium tool got blocked.
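Before the results, here is a minimal sketch of the escalation logic behind the ladder. It is an illustration under stated assumptions, not the actual agent skill: `Tier`, `looks_usable`, and `run_ladder` are hypothetical names, the "marker string present" success check is a stand-in for whatever signal a real run would use, and the fetchers in the demo are stubs rather than real Scrapling or Camoufox calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Tier:
    priority: int                           # lower number = cheaper, tried first
    name: str
    fetch: Callable[[str], Optional[str]]   # returns HTML, or None on failure


def looks_usable(html: Optional[str], marker: str) -> bool:
    """Hypothetical success check: a page came back and contains the marker."""
    return bool(html) and marker in html


def run_ladder(
    url: str, tiers: Sequence[Tier], marker: str
) -> Tuple[Optional[Tier], Optional[str], List[str]]:
    """Try tiers from cheapest to most expensive; stop at the first usable page."""
    tried: List[str] = []
    for tier in sorted(tiers, key=lambda t: t.priority):
        tried.append(tier.name)
        try:
            html = tier.fetch(url)
        except Exception:
            html = None  # treat any fetch error as "escalate to the next tier"
        if looks_usable(html, marker):
            return tier, html, tried
    return None, None, tried


if __name__ == "__main__":
    # Stub fetchers standing in for the real tiers (assumption, no network calls).
    tiers = [
        Tier(1, "http", lambda url: ""),  # empty shell, as with a JS-rendered page
        Tier(3, "dynamic", lambda url: "<div class='product'>Item</div>"),
    ]
    winner, _, tried = run_ladder("https://example.com", tiers, "product")
    print(winner.name if winner else "all tiers failed", "after trying", tried)
```

In practice each `fetch` callable would wrap one of the fetchers from this post, and the usability check would look for the elements the scrape actually needs.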
Running the ladder:

| Priority | Tool | Page 1 | Page 2 | Page 3 | Time |
|---|---|---|---|---|---|
| 1 | HTTP Fetcher | ❌ Empty | — | — | 0.77s |
| 3 | DynamicFetcher | ✅ 41 items | ✅ 41 items | ✅ 41 items | ~3s/page |
| 5 | Camoufox | ✅ 40 items | — | — | 42s/page |

The ladder worked exactly as designed: the HTTP fetcher came back empty, so the run escalated to `DynamicFetcher`, which returned 41 items on each of three pages. That saved me from jumping straight to Camoufox or paying Firecrawl credits when a simple Chromium browser handled everything.

```
Priority 1: Scrapling Fetcher (HTTP)        0.77s   $0
Priority 3: Scrapling DynamicFetcher        3.66s   $0
Priority 4: Scrapling StealthyFetcher       ~4s     $0
Priority 5: Camoufox + Scrapling Selector   8.84s   $0
Priority 6: Firecrawl (enhanced proxy)      ~3-5s   credits
```

Everything runs on a single 4GB VPS. Peak memory with one browser session: ~800MB including the AI agent gateway. 39GB free disk after cleaning stale caches and old kernels. Total scraping cost: $0.

Installation is the first test. Read the docs before `pip install`. Know what each dependency costs in RAM. Skip what you don't need; you can always add it later.

The 1.4GB Camoufox story was user error. Spawning browsers in a loop without sessions will eat any machine. With persistent sessions, Camoufox is the lightest browser in the stack at 81MB. Don't take benchmark threads on faith; run your own.

Speed differences compound silently. 0.77s vs 8.84s is nothing for one page. For 100 pages, it's 77 seconds vs nearly 15 minutes. Choosing the right tier pays off more with every page you add.

Fingerprint diversity is a superpower. Having both Chromium and Firefox in your arsenal means you can get past sites that target either. Camoufox is slow, but it's a different shape entirely, and sometimes that's all you need.

Wire the ladder, not the tools. Individual tools leave you guessing. A priority ladder gives you a protocol: start cheap, escalate on failure. Encode it as an agent skill and you never have to think about it again.

Scrapling is the platform, not just a fetcher.
Adaptive element tracking, three-tier architecture, spider framework with pause/resume, MCP server for AI agents: it's the foundation everything else plugs into. The benchmarks measure its fetchers, but the framework is what makes them interchangeable.

Questions? Find me on X @mariatanbobo