Using Lightpanda with agent-browser

Vercel's agent-browser CLI now supports Lightpanda as an alternative browser engine, offering 9x faster performance and 16x less memory usage than Chrome for AI agent workflows. By using Lightpanda's headless engine, agents achieve higher accuracy on benchmarks like AssistantBench and GAIA Level 1 while reducing token consumption per snapshot.

Katie Brown, Cofounder & COO

TL;DR

agent-browser (https://github.com/vercel-labs/agent-browser) is Vercel’s browser automation CLI for AI agents. Point it at Lightpanda with --engine lightpanda and your existing workflow runs on an engine that starts instantly, runs faster and uses far less memory than Chrome. This post covers best practices and includes a demo to try yourself.

Why use agent-browser with Lightpanda

agent-browser gives an agent a clean way to act on the web. You navigate to a page, take a snapshot, and get back a compact accessibility tree where every element has a short ref like @e1. The agent reads that, picks a ref, and clicks or fills it. The text output runs around 200 to 400 tokens per snapshot instead of the 3,000 to 5,000 you would spend dumping raw DOM.

Underneath, agent-browser talks to a browser over the Chrome DevTools Protocol (CDP). By default that browser is Chromium. Lightpanda speaks CDP, so agent-browser manages it the same way: it spawns the process, connects over CDP, and shuts it down. Every downstream command works through the same path.

Lightpanda’s engine is the reason to swap. It skips the graphical rendering that Chrome carries, so it runs 9x faster and uses 16x less memory (https://lightpanda.io/blog/posts/from-local-to-real-world-benchmarks) on equivalent workloads. An agent that loops through hundreds of pages per task, reading structure and acting on it rather than looking at pixels, doesn’t need Chrome’s rendering pipeline. An agent that only handles what it needs is faster and more accurate, because there’s less in front of it to get wrong.

We measured this by running agent-browser on both engines through AssistantBench and GAIA Level 1, with Claude Sonnet 4.6 as the model held constant. agent-browser wrapping Lightpanda hit 0.606 strict accuracy on AssistantBench and 0.887 on GAIA Level 1, ahead of the same tool surface on Chrome.
You can read the full benchmark writeup (https://lightpanda.io/blog/posts/benchmarking-lightpanda-for-agents) and reproduce it from the open harness (https://github.com/lightpanda-io/agent-benchmarks).

Set the engine once, not on every command

You can pass --engine lightpanda on every call. If it’s your default, set it once instead. The environment variable is the cleanest option for a shell session or a CI job:

    export AGENT_BROWSER_ENGINE=lightpanda
    agent-browser open example.com
    agent-browser snapshot -i

For a project that should always use Lightpanda, put it in agent-browser.json so every contributor gets the same behaviour.

If the lightpanda binary is not on your PATH, point at it explicitly with --executable-path /path/to/lightpanda. Otherwise, agent-browser finds it by name.

The snapshot, ref, act, re-snapshot loop

This is the core workflow, and it’s the same on Lightpanda as on Chrome. Open a page, snapshot it, act on a ref, then snapshot again before the next action.

    agent-browser open https://example.com
    agent-browser snapshot -i
    - heading "Example Domain" [ref=e1]
    - link "Learn more" [ref=e2]
    agent-browser click @e2

Refs are tied to a single snapshot. The moment the page changes, those refs are stale. If you click a link that navigates, the old @e2 no longer means anything. Take a fresh snapshot to get new refs before you act again.

    agent-browser click @e1      # this navigates
    agent-browser snapshot -i    # get fresh refs for the new page
    agent-browser click @e3      # safe to act now

The browser persists between commands through a background daemon, so chaining is cheap. You can join a sequence with && in one shell call and it stays fast:

    agent-browser open example.com && agent-browser snapshot -i

Reach for snapshots, not screenshots

This is where the engine difference matters. Because Lightpanda has no graphical rendering engine, if you ask it for a screenshot you’ll get a placeholder image. That’s by design.
Plenty of workflows rely on screenshots: capture the page, overlay numbered boxes, send the image to a vision model, and ask it where to click. That works, and for some tasks it’s the right call. But it’s worth testing the snapshot path first, because skipping vision could cut latency and token cost. The snapshot -i accessibility tree already tells the model what’s on the page and gives it a ref to act on.

If you’re building on Lightpanda, ground your agent in the snapshot and the get commands rather than the camera:

    agent-browser get title
    agent-browser get url
    agent-browser get text " main"

If your task genuinely needs a rendered image, that is your signal to fall back to Chrome for that step. More on that below.

Run wide in CI and at scale

Instant startup and a small memory footprint change what you can do with concurrency. A fleet of headless Chrome instances is the part of an automation or testing stack that leaks memory and needs scheduled restarts. Lightpanda was built for sustained automation, so you can run more sessions at once.

CI runners are resource-constrained. agent-browser’s own docs (https://agent-browser.dev/engines/lightpanda) call out Lightpanda as a good fit for fast scraping, agent workflows where speed and low memory matter, CI with constrained resources, and high-volume parallel automation.

Fallback to Chrome

Lightpanda is purpose-built and headless-only, so the following Chrome features are not supported:

- Extensions (--extension)
- Persistent profiles (--profile)
- Saved storage state (--state)
- File access (--allow-file-access)

agent-browser gives you a clear error if you combine --engine lightpanda with a flag it cannot honour, so you find out immediately rather than halfway through a run.

The good news is that the fallback costs you almost nothing. The commands are identical across engines. If a task needs full browser fidelity, a real screenshot, a persistent login profile, or an extension, you switch back to Chrome by changing one flag.
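As a rough sketch of that one-flag switch, the helper below chooses an engine per step and prints the command it would run. The names engine_for and ab_cmd are hypothetical and not part of agent-browser. It also assumes that the Chrome engine is selected with --engine chrome, that the flag may precede the subcommand, and that screenshots are taken with a screenshot subcommand; check the agent-browser docs before relying on any of these.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions (not part of agent-browser).
# Dry run only: the commands are printed, not executed.

# Pick an engine for a step. In this sketch only "screenshot"
# is treated as needing a rendered page.
engine_for() {
  case "$1" in
    screenshot) printf '%s\n' "chrome" ;;      # assumed engine name
    *)          printf '%s\n' "lightpanda" ;;
  esac
}

# Build the command line for a step, keeping the rest of the
# arguments unchanged so the only difference is the engine flag.
ab_cmd() {
  local step="$1"
  shift
  printf '%s\n' "agent-browser --engine $(engine_for "$step") $step $*"
}

ab_cmd open https://example.com
ab_cmd snapshot -i
ab_cmd screenshot page.png
```

Printing rather than executing keeps the sketch safe to paste anywhere. Whether a running session can change engines between commands is not covered here, so treat each engine as its own session until you have tested it.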
A common pattern is to run the bulk of your workload on Lightpanda for speed and cost, then route the few steps that truly need rendering to Chrome.

Try it yourself

Here is a complete session you can paste into a terminal. It installs both tools, runs the snapshot-act loop against a real site, and extracts data.

Install Lightpanda from the latest nightly (https://github.com/lightpanda-io/browser/releases/tag/nightly) with the one-liner:

    curl -fsSL https://pkg.lightpanda.io/install.sh | bash

Install agent-browser and select Lightpanda as the engine:

    npm install -g agent-browser
    export AGENT_BROWSER_ENGINE=lightpanda

Now the demo. Open Hacker News, snapshot it to see the refs, then pull the top headlines straight from the rendered DOM:

    agent-browser open https://news.ycombinator.com
    agent-browser snapshot -i
    agent-browser eval "[...document.querySelectorAll('.titleline a')].slice(0,10).map((a,i)=>(i+1)+'. '+a.textContent).join('\n')" | node -pe "JSON.parse(require('fs').readFileSync(0))"
    agent-browser close

snapshot -i returns a compact tree of every link on the page, each with a ref the agent can act on. The output looks like this:

    - link "Hacker News" [ref=e2]
    - link "new" [ref=e3]
    - link "Om Malik has died" [ref=e12]
    - link "om.co" [ref=e13]
    - link "68 comments" [ref=e17]
    - link "An entire Herculaneum scroll has been read for the first time" [ref=e19]
    ...

The snapshot gives the agent the page structure. The eval line reads the rendered DOM after JavaScript has run and returns the top headlines. agent-browser hands the result back as a JSON string, so node prints it as clean lines:

    1. Om Malik has died
    2. We All Depend on Open Source. We Will Defend It Together
    3. An entire Herculaneum scroll has been read for the first time
    4. Framework's 10G Ethernet module exposes USB-C's complexity
    ...

The page stays loaded between commands through a background daemon, and close ends the session.

FAQ

What is agent-browser?
agent-browser is an open-source browser automation CLI from Vercel, built for AI agents. It drives a browser over CDP and returns pages as compact accessibility-tree snapshots with short element refs, so an agent can act on a page using a few hundred tokens instead of parsing raw HTML. It works with Claude Code, Cursor, Codex, and any agent that can run shell commands.

How do I use Lightpanda as the engine?

Pass --engine lightpanda on a command, set AGENT_BROWSER_ENGINE=lightpanda in your environment, or add "engine": "lightpanda" to agent-browser.json. agent-browser then spawns Lightpanda and connects over CDP.

Why use Lightpanda instead of Chrome with agent-browser?

Lightpanda runs 9x faster and uses 16x less memory than headless Chrome because it has no rendering pipeline. Under the same agent-browser tool surface in our benchmarks, the Lightpanda engine cut GAIA Level 1 wall time per task by 29% and quartered the timeout rate. For agent loops, scraping, and CI, that is lower cost and fewer failed runs.

Do screenshots work with Lightpanda?

No. Lightpanda has no graphical rendering engine, so a screenshot returns a placeholder image. Ground your agent in snapshot -i and the get commands instead. If a task truly needs a rendered image, switch that step to the Chrome engine.

When should I fall back to Chrome?

Use Chrome when you need full browser fidelity: real screenshots, browser extensions, persistent login profiles, or saved storage state. The commands are identical across engines, so switching is a one-flag change. Many teams run the bulk of a workload on Lightpanda and route only the rendering-dependent steps to Chrome.

Katie Brown
Cofounder & COO

Katie led the commercial team at BlueBoard, where she met Pierre and Francis. She rejoined them on the Lightpanda adventure to lead GTM and to keep the product closely aligned with what developers actually need. She also drives community efforts and, by popular vote, serves as chief sticker officer.