# Browser Automation for AI Agents: What Actually Works

> Source: <https://dev.to/dylanworrall/browser-automation-for-ai-agents-what-actually-works-100m>
> Published: 2026-06-18 23:16:36+00:00

*Originally published at dylanworrall.com.*

Most agent demos that involve a browser are shot in one take for a reason. The moment you try to make browser automation *reliable* — running unattended, across sites you don't control, hundreds of times — it stops being a demo and starts being an engineering problem. I've spent a lot of time on that problem building the browser layer inside [Froots](https://froots.ai), and a handful of patterns made the difference between "works in the video" and "works at 3am while I'm asleep."

## Structured commands over `eval`

It's tempting to give the agent one giant escape hatch: run arbitrary JavaScript in the page and parse whatever comes back. It works right up until it doesn't, and when it fails it fails opaquely.

A small vocabulary of structured commands beats one omnipotent one:

```
navigate <url>
click <selector>
fill <selector> <value>
type <selector> <value>      # contenteditable-safe; composers ignore plain fill
text <selector>              # read innerText back
wait_selector <selector>     # poll until it exists
```

The point isn't that `eval` is useless; it's the fallback, not the default. Structured verbs give you predictable error messages ("selector not found" beats a stack trace from inside a minified bundle), and they make the agent's intent legible.
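To make the "predictable error messages" point concrete, here's a minimal sketch of a dispatcher for that vocabulary. It's an illustration, not how Froots does it: `run_command`, `Result`, `SelectorNotFound`, and the idea of a `driver` object with one method per verb are all names I'm assuming for the example.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    ok: bool
    value: Optional[str] = None
    error: Optional[str] = None


class SelectorNotFound(Exception):
    """Hypothetical: raised by the driver when a selector matches nothing."""


# Verb -> number of arguments it expects (mirrors the vocabulary above).
ARITY = {"navigate": 1, "click": 1, "fill": 2, "type": 2, "text": 1, "wait_selector": 1}


def run_command(driver, line: str) -> Result:
    """Parse one command line and dispatch it to the same-named driver method."""
    try:
        parts = shlex.split(line)  # selectors containing spaces must be quoted
    except ValueError as exc:
        return Result(False, error=f"bad quoting: {exc}")
    if not parts:
        return Result(False, error="empty command")
    verb, args = parts[0], parts[1:]
    if verb not in ARITY:
        return Result(False, error=f"unknown verb: {verb}")
    if len(args) != ARITY[verb]:
        return Result(False, error=f"{verb} expects {ARITY[verb]} argument(s), got {len(args)}")
    try:
        value = getattr(driver, verb)(*args)
    except SelectorNotFound:
        # One short, stable message instead of a stack trace from the page.
        return Result(False, error=f"selector not found: {args[0]}")
    return Result(True, value=value)
```

Every failure comes back as a `Result` with a one-line `error`, so the agent can branch on it instead of parsing exceptions. There is deliberately no `eval` verb in `ARITY`; if you add one, it stays an explicit opt-in.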

## The `sleep` instinct: wait on conditions

The single biggest source of flakiness is `sleep(2000)`. Too short and you act before the element exists; too long and every run wastes seconds. Replace time with conditions: poll until the element exists, until the spinner is gone, or until navigation lands. An agent that waits on the *thing it actually needs* is both faster and dramatically more reliable than one that guesses at timing.
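A condition-based wait can be sketched in a few lines. This is a generic polling helper under my own naming (`wait_until`, `WaitTimeout`); the `condition` callable stands in for whatever check your browser layer exposes, such as "does this selector exist yet".

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Raised when the condition never became truthy within the timeout."""


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               interval: float = 0.1, what: str = "condition") -> T:
    """Poll condition() until it returns a truthy value, then return that value."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # return as soon as the thing we need is there
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"timed out after {timeout}s waiting for {what}")
        time.sleep(interval)  # short poll interval, not a guessed total delay
```

The timeout is still a number, but it's an upper bound rather than a guess: the fast path returns immediately, and the slow path fails with a message that names what it was waiting for.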

This is the lesson I learned the hard way. A command would return success and I'd assume the work was done — then find the agent had been talking to a pane that wasn't there. Every call "succeeded" by doing nothing.

The fix is a discipline: **a write should be confirmed by a read.** After you fill a field, read it back. After you click submit, wait for the URL or a success node. Silent success is not the same as success.
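As a rough sketch of that discipline, here's a fill that refuses to report success until a read agrees with it. `fill_and_verify` and `VerificationError` are names I made up for the example, and I'm assuming the driver's `text` verb returns the field's current content (for a plain `<input>` you may need a value-reading verb instead).

```python
class VerificationError(Exception):
    """The write reported success, but the read-back disagreed."""


def fill_and_verify(driver, selector: str, value: str) -> None:
    """Fill a field, then read it back and compare before calling it done."""
    driver.fill(selector, value)
    actual = driver.text(selector)  # assumed to return the field's current content
    if actual != value:
        raise VerificationError(
            f"{selector}: wrote {value!r}, read back {actual!r}"
        )
```

The same shape works for clicks: perform the action, then wait on the URL or success node, and treat a missing confirmation as a failure rather than a quiet pass.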

A lot of useful data sits behind a login. Rather than scraping a login wall, do an in-page `fetch` with `credentials: 'include'` from the right origin: you reuse the existing session instead of re-authenticating or storing credentials. Probe for a login cookie *before* you reach for authenticated data, so you can ask the human to sign in rather than silently scraping an error page.
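Here's one way the probe-then-fetch flow could look. I'm assuming a `page` object shaped like Playwright's Python sync API (`page.context.cookies(urls)` and `page.evaluate(expression, arg)`), already navigated to the target origin; the function names, the cookie name you pass in, and the 401/403 handling are my own choices for the sketch.

```python
# JavaScript run inside the page, so the browser attaches the session cookies.
FETCH_JS = """
async (url) => {
  const res = await fetch(url, { credentials: 'include' });
  return { status: res.status, body: await res.text() };
}
"""


class LoginRequired(Exception):
    """Signals that a human needs to sign in before we continue."""


def has_cookie(page, origin: str, name: str) -> bool:
    """Probe the browser context for a named cookie on the given origin."""
    return any(c["name"] == name for c in page.context.cookies([origin]))


def fetch_with_session(page, origin: str, path: str, cookie_name: str) -> str:
    """Fetch origin + path from inside the page, reusing the existing session."""
    if not has_cookie(page, origin, cookie_name):
        raise LoginRequired(f"no {cookie_name!r} cookie for {origin}; ask the user to sign in")
    result = page.evaluate(FETCH_JS, origin + path)
    if result["status"] in (401, 403):
        # The cookie exists but the session was rejected (for example, expired).
        raise LoginRequired(f"{origin} rejected the session with HTTP {result['status']}")
    return result["body"]
```

Raising `LoginRequired` gives the agent a clear branch: hand control to the human instead of parsing whatever the login page returned.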

When the DOM is hostile — shadow roots, canvas UIs, obfuscated class names — stop fighting selectors and take a screenshot. A vision model reading a picture of the page is sometimes the most robust path.
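The fallback can be as simple as "try the DOM, then try a picture." In this sketch the three callables (`read_text`, `take_screenshot`, `ask_vision`) are placeholders for whatever your stack provides; I'm not assuming any particular vision model or screenshot API.

```python
from typing import Callable, Tuple


def read_with_fallback(
    read_text: Callable[[], str],
    take_screenshot: Callable[[], bytes],
    ask_vision: Callable[[bytes, str], str],
    question: str,
) -> Tuple[str, str]:
    """Return (source, answer): try the DOM first, fall back to a screenshot."""
    try:
        text = read_text()
        if text.strip():
            return "dom", text
    except Exception:
        # Broad on purpose in this sketch: any selector failure triggers the fallback.
        pass
    image = take_screenshot()
    return "vision", ask_vision(image, question)
```

Returning the source alongside the answer lets the caller log how often the vision path is taken, which is a decent signal that a site's selectors have drifted.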

Reliable browser automation is less about clever selectors and more about **closing the loop**: act, observe, confirm, and never trust a result you didn't verify.

I write more about agent architecture — [reliable memory](https://dylanworrall.com/blog/giving-ai-agents-reliable-memory), [agents you can watch work](https://dylanworrall.com/blog/building-froots-agents-you-can-watch), and [building toward a one-person company](https://dylanworrall.com/blog/building-toward-a-zero-employee-company) — over on [my blog](https://dylanworrall.com/blog).

— Dylan Worrall, founder of [Froots](https://froots.ai)
