I Tried BrowserAct: A Browser Runtime Built for AI Agents

A developer tested BrowserAct, a browser automation CLI built for AI agents, and found it addresses the gap between simple remote control and the messy reality of real websites. BrowserAct's key innovation is separating browser identity from session workspace, allowing agents to manage multiple tasks under one account or isolate different accounts. The tool's stealth-extract command successfully rendered JavaScript-heavy pages into clean Markdown, making it useful for AI agents that need structured page content.

In my last browser automation article, I wrote about a simple idea: your browser already has a remote control.

Chrome exposes the Chrome DevTools Protocol. Tools like raw CDP, Playwright MCP, and agent-browser can use it to open tabs, inspect pages, fill forms, click buttons, take screenshots, and read content. That article was mostly about the remote-control layer.

But the more I use browser automation with AI agents, the more obvious a second problem becomes: remote control is not enough.

Real websites are messy. They have login state, delayed JavaScript, multiple tabs, anti-bot checks, account switching, CAPTCHAs, QR-code logins, SSO prompts, and pages that ask you to prove you are really you. An agent does not just need a way to click. It needs a browser environment it can reason about, isolate, reuse, pause, hand over to a human, and clean up safely.

That is why I wanted to try BrowserAct (https://www.browseract.com/). BrowserAct is a browser automation CLI built for AI agents. It is not just another wrapper around "open page, click button." Its more interesting idea is the operational model around agent browsing.

I installed it, read the docs, tested the CLI, and used it against a few real public pages. Here is what I found.

The clearest idea in BrowserAct is this: the browser is the identity, and the session is the task workspace.

That sounds simple, but it fixes a real problem. When an AI agent controls a browser, there are two things we often mix together: the identity the agent acts as, and the task it is working on. Those should not always be the same object.

One browser identity might need multiple parallel sessions. For example, an operations agent could check messages, review orders, and export reports under the same logged-in account without each task fighting over one tab.

Different accounts, however, should not be squeezed into the same browser identity. They may need independent cookies, profiles, fingerprints, and network settings.

That is the model BrowserAct is trying to make explicit.
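
To make the identity/session split concrete, here is a minimal Python sketch of the idea. It is only an illustration of the concept, not BrowserAct's actual data model: the class names `BrowserIdentity` and `Session`, their fields, and the example accounts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A task workspace that lives inside one browser identity (hypothetical model)."""
    name: str
    task: str


@dataclass
class BrowserIdentity:
    """One account's browser state: cookies, profile, fingerprint, network settings."""
    name: str
    account: str
    sessions: dict[str, Session] = field(default_factory=dict)

    def open_session(self, name: str, task: str) -> Session:
        # Each task gets its own named workspace under the same identity.
        if name in self.sessions:
            raise ValueError(f"session {name!r} already exists in {self.name!r}")
        session = Session(name=name, task=task)
        self.sessions[name] = session
        return session

    def close_session(self, name: str) -> None:
        # Closing a task workspace leaves the identity (login state) untouched.
        del self.sessions[name]


# One identity, several parallel task workspaces.
ops = BrowserIdentity(name="ops-browser", account="ops@example.com")
ops.open_session("messages", "check messages")
ops.open_session("orders", "review orders")
ops.open_session("reports", "export reports")

# A different account gets its own identity instead of sharing state.
support = BrowserIdentity(name="support-browser", account="support@example.com")
support.open_session("inbox", "triage tickets")
```

The point of the sketch is the ownership direction: sessions belong to an identity, and closing a session never touches another identity's state.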
The documented install command is:

`uv tool install browser-act-cli --python 3.12`

On my Linux devbox, this installed cleanly.

`browser-act --version`

Output:

`browser-act 0.1.30`

The BrowserAct skill says the agent should not jump straight into random commands. It should first load the runtime guide:

`browser-act get-skills core --skill-version 2.0.2`

I like this design. For a human, `--help` is often enough. For an AI agent, it is not. The agent needs the operating rules: available browsers, active sessions, current environment state, session ownership rules, safety gates, and the correct open-state-interact-verify-close loop. The bootstrap command gave me exactly that. That is the kind of context an agent needs before touching a browser.

BrowserAct has a command called `stealth-extract`:

`browser-act stealth-extract https://example.com`

Think of it as an advanced WebFetch. You pass a URL and it returns clean page content, usually Markdown, without you manually creating a browser session.

I first tested it on BrowserAct's own website:

`browser-act stealth-extract https://www.browseract.com/`

That returned a readable Markdown version of a JavaScript-heavy marketing page. This was useful because raw `curl` against the same site returned a large Next.js payload full of scripts, styles, hydration data, and HTML that is much less pleasant for an agent to reason over.

I also tested it against my previous dev.to article:

`browser-act stealth-extract https://dev.to/timtech4u/your-browser-has-a-remote-control-and-nobody-told-you-5e97`

That returned a clean content view with headings, code blocks, tables, and comments.

This is a good first use case for BrowserAct: you do not always need a persistent browser. Sometimes you just need rendered page content in a format an LLM can use.

Next I tried a delayed JavaScript page:

`browser-act stealth-extract https://quotes.toscrape.com/js-delayed/`

The default extraction returned the page shell but not the delayed quote list.
That was not a total surprise. The page waits 10 seconds before mounting the quote content. BrowserAct has a flag for this:

`browser-act stealth-extract https://quotes.toscrape.com/js-delayed/ --render-wait 11`

With that explicit wait, the delayed quotes appeared.

That is an important detail. `stealth-extract` is not magic. If the page mounts important content long after network idle, you may need to tell the extractor to wait.

My feedback to the BrowserAct team would be to make `--render-wait` more visible in the quick start. A single delayed-rendering example would help users understand when extraction "failed" versus when the page simply mounted late.

Here is the honest test matrix from this first pass:

| Area | Status | Notes |
|---|---|---|
| CLI install | Tested | Installed cleanly with `uv` and Python 3.12 |
| Agent bootstrap | Tested | `get-skills core` returned workflow, safety, and environment state |
| Public page extraction | Tested | Worked on BrowserAct.com and dev.to |
| Delayed JavaScript | Tested | Needed `--render-wait 11` for a 10-second delayed mount |
| Browser sessions | Not fully tested | Requires creating a BrowserAct browser profile |
| Stealth browser | Not fully tested | Requires BrowserAct API key |
| Captcha solving | Not fully tested | Requires supported challenge flow/API-enabled setup |
| Managed proxies | Not tested | Requires BrowserAct managed proxy setup |
| Remote assist | Not tested | Needs a live browser workflow with a human handoff point |

I am intentionally separating what I tested from what BrowserAct claims it can do. That matters. Browser automation tools often make broad claims, and the web is too inconsistent for lazy guarantees.

The fair statement from this first pass is: BrowserAct's CLI and extraction path worked well, and its browser/session/safety model is well designed for AI agents. The advanced anti-bot, proxy, captcha, and remote-assist claims deserve a second hands-on test with an API-key-enabled setup.
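
The delayed-rendering case from the quotes test can be wrapped in a small retry helper. The Python sketch below only uses the two command shapes shown in this article (`stealth-extract <url>` with and without `--render-wait`). Everything else is an assumption for illustration: the helper names, the `expected_text` marker check, and the idea that the CLI prints the extracted content to stdout and exits with code zero on success.

```python
import subprocess
from typing import Callable, Optional


def build_extract_command(url: str, render_wait: Optional[int] = None) -> list[str]:
    """Build the stealth-extract command line, optionally with --render-wait."""
    cmd = ["browser-act", "stealth-extract", url]
    if render_wait is not None:
        cmd += ["--render-wait", str(render_wait)]
    return cmd


def run_cli(cmd: list[str]) -> str:
    """Run the CLI and return stdout (assumes content is written to stdout)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def extract_with_fallback(
    url: str,
    expected_text: str,
    render_wait: int,
    runner: Callable[[list[str]], str] = run_cli,
) -> str:
    """Try a default extraction first; retry with an explicit wait if the
    hypothetical marker text is missing from the result."""
    content = runner(build_extract_command(url))
    if expected_text in content:
        return content
    # The page probably mounted late, so ask the extractor to wait.
    return runner(build_extract_command(url, render_wait))
```

A call such as `extract_with_fallback("https://quotes.toscrape.com/js-delayed/", "some text you expect on the page", 11)` would run the default extraction once and only pay the 11-second wait when the marker is absent. The `runner` parameter exists so the logic can be tested without the CLI installed.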
Most browser automation tutorials focus on actions. Those are necessary, but they are not the whole problem. When an AI agent uses a browser, the hard questions are usually operational. BrowserAct is interesting because it treats those as core product questions. That is the difference between a browser driver and a browser runtime.

Browser automation is powerful enough to be dangerous. An agent can read authenticated pages, click destructive buttons, submit forms, upload files, and operate inside real accounts. BrowserAct's skill defines confirmation gates around sensitive operations. This is the right instinct.

For example, importing a local Chrome profile is convenient because it can reuse login state. But it is also sensitive because it copies browser state into an automation environment. An agent should explain that before doing it.

The same goes for deleting a browser, which can destroy cookies, login state, and profile data, and for proxy changes, which alter the network identity a website sees.

The best browser agents will not be the ones that click fastest. They will be the ones that know when to stop and ask.

In my previous article, I compared raw CDP, Playwright MCP, and agent-browser. BrowserAct fits beside them, but at a slightly different level.

Raw CDP is the low-level protocol. It gives you maximum control, but you build the workflow and safety model yourself.

Playwright MCP gives agents structured browser automation with strong testing roots and isolated contexts.

agent-browser gives you a fast CLI for direct browser control, including CDP-connected workflows.

BrowserAct is trying to package the higher-level operational layer. That makes it feel less like "another click tool" and more like infrastructure for agent browsing.

Imagine an AI operations agent that checks a dashboard every morning. That workflow needs more than Playwright-style actions.
It needs persistent identity, session isolation, clean extraction, safe interaction rules, and a human fallback path. That is exactly the kind of space BrowserAct is designed for.

I have three suggestions for the BrowserAct team.

First, clarify the "No registration needed" messaging. The public site says no registration is needed. The docs also explain that some managed features require an API key, including stealth browsers, managed proxies, and captcha solving. That distinction should be explicit:

Local Chrome automation works without registration.
Managed BrowserAct features require a BrowserAct account/API key.

Second, add a five-minute local-only quick start. Something like:

`uv tool install browser-act-cli --python 3.12`
`browser-act get-skills core --skill-version 2.0.2`
`browser-act stealth-extract https://www.browseract.com/`
`browser-act browser list`

Then a second path for full browser sessions:

`browser-act browser create --name browseract-eval --type chrome --desc "Public-page evaluation"`
`browser-act --session first-run browser open browseract-eval https://example.com`
`browser-act --session first-run state`
`browser-act session close first-run`

Third, document delayed rendering more prominently:

`browser-act stealth-extract https://example.com/slow-page --render-wait 10`

That flag matters for pages that mount content after network idle.

Browser automation for AI agents is moving past the question "Can it click a button?" The harder question is: can an agent operate safely and reliably inside the real web?

BrowserAct is interesting because it is designed around that second question. It gives AI agents a browser model with identity, sessions, clean extraction, safety gates, and a path for human handoff when automation hits real-world friction.

I would not frame it as "guaranteed CAPTCHA bypass" or "automation that never gets blocked." That kind of claim is too broad for the web. The stronger and more credible framing is: BrowserAct is a browser runtime for AI agents that need to work beyond clean demo pages.
That is a useful direction, and it is where agent browser tooling needs to go.

Find me at timtech4u.dev (https://timtech4u.dev) or @timtech4u (https://x.com/timtech4u).