# We stopped writing Playwright selectors and let AI figure it out

> Source: <https://dev.to/anjo_zulaybar_0b0a0e967eb/we-stopped-writing-playwright-selectors-and-let-ai-figure-it-out-1hh8>
> Published: 2026-05-28 08:02:46+00:00

If you've maintained a Playwright or Cypress test suite for more than a few months, you know the drill. A designer renames a class, a developer restructures a form, and suddenly 30 tests are broken, not because the feature broke, but because `.submit-btn` became `[data-action="submit"]`.

You end up in a loop: fix selectors, ship, selectors break, fix selectors. The tests stop being useful because nobody trusts them.

We built Confidence Gate — an AI-powered test execution engine where you describe test steps in plain English and the system figures out the rest.

Instead of:

```typescript
await page.locator('[data-testid="email-input"]').fill('user@example.com');
await page.locator('button[type="submit"]').click();
await expect(page).toHaveURL('/dashboard');
```

You write:

```json
[
  { "action": "enter the email from the test data in the email field",
    "expected": "the email field contains the entered address" },
  { "action": "click the submit button",
    "expected": "the dashboard is displayed and the login form is gone" }
]
```

The engine translates each step into a typed intent, resolves the target element from the accessibility tree, executes it in a real Playwright browser, takes a screenshot, and verifies the outcome visually.

Each step goes through four stages:

**1. Intent generation** — The natural-language action is converted to structured JSON (`{ action: "click", target: { label: "Sign In", role: "button" }, value: null }`). This separates intent from implementation.
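Here is a minimal TypeScript sketch of a typed intent and its validation. The field names follow the example above. The allowed action list, `StepIntent`, and `parseIntent` are hypothetical and are not Confidence Gate's actual types. Validating the model's JSON before it reaches the browser layer is one way to stop a malformed intent from turning into a random click.

```typescript
// Hypothetical intent shape, modeled on the example above.
type ActionKind = "click" | "fill" | "select" | "navigate";

interface StepIntent {
  action: ActionKind;
  target: { label: string; role: string };
  value: string | null;
}

// Assumed action vocabulary; the real engine's list may differ.
const ACTIONS: readonly string[] = ["click", "fill", "select", "navigate"];

// Parse and validate untrusted JSON (e.g. model output) into a StepIntent.
function parseIntent(raw: string): StepIntent {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("intent must be a JSON object");
  }
  const obj = data as { [key: string]: unknown };

  const action = obj["action"];
  if (typeof action !== "string" || ACTIONS.indexOf(action) === -1) {
    throw new Error("unknown action: " + String(action));
  }

  const target = obj["target"];
  if (typeof target !== "object" || target === null) {
    throw new Error("target must be an object");
  }
  const t = target as { [key: string]: unknown };
  const label = t["label"];
  const role = t["role"];
  if (typeof label !== "string" || typeof role !== "string") {
    throw new Error("target needs string label and role");
  }

  const rawValue = obj["value"];
  let value: string | null;
  if (rawValue === null || rawValue === undefined) {
    value = null; // actions like "click" carry no value
  } else if (typeof rawValue === "string") {
    value = rawValue;
  } else {
    throw new Error("value must be a string or null");
  }

  return { action: action as ActionKind, target: { label, role }, value };
}
```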

**2. Element resolution** — A multi-tier resolver finds the element: accessibility tree first (fast, reliable), CSS heuristics second, AI-assisted fallback third.
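The tiering can be sketched as a function that tries cheap, deterministic matches before paying for a model call. This sketch assumes a simplified, hypothetical `Candidate` shape; in practice these fields would come from the accessibility tree and the DOM. The AI tier is an injected stand-in function, not a real model API.

```typescript
// Hypothetical, simplified candidate element. A real engine would read these
// fields from Playwright's accessibility tree and the DOM.
interface Candidate {
  selector: string;
  role: string;
  name: string; // accessible name
  text: string; // visible text
  classes: string[];
}

interface Target {
  label: string;
  role: string;
}

type Tier = "accessibility" | "css-heuristic" | "ai-fallback";

interface Resolution {
  selector: string;
  tier: Tier;
}

// Stand-in for a model call, injected so the sketch stays deterministic.
type AiResolver = (target: Target, candidates: Candidate[]) => string | null;

const norm = (s: string): string => s.trim().toLowerCase();

function resolveElement(target: Target, candidates: Candidate[], ai: AiResolver): Resolution | null {
  const wanted = norm(target.label);

  // Tier 1: role and accessible name must both match.
  for (const c of candidates) {
    if (c.role === target.role && norm(c.name) === wanted) {
      return { selector: c.selector, tier: "accessibility" };
    }
  }

  // Tier 2: looser heuristics on visible text or a class name like "sign-up-btn".
  const slug = wanted.replace(/\s+/g, "-");
  for (const c of candidates) {
    const textHit = norm(c.text) === wanted;
    const classHit = c.classes.some((k) => norm(k).indexOf(slug) !== -1);
    if (textHit || classHit) {
      return { selector: c.selector, tier: "css-heuristic" };
    }
  }

  // Tier 3: only now pay for an AI call.
  const picked = ai(target, candidates);
  return picked === null ? null : { selector: picked, tier: "ai-fallback" };
}
```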

**3. Execution + behavior detection** — Playwright executes the action. A mutation observer watches for DOM changes, and the engine also checks for URL and input-value changes, to confirm that something actually happened.
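One way to express "something actually happened" is as a diff between two page snapshots taken around the action. The `PageSnapshot` fields are hypothetical. In a browser, the mutation count could come from a `MutationObserver` callback. The URL and input value would be read directly, because a `MutationObserver` does not report navigation or changes to an input's `value` property.

```typescript
// Hypothetical before/after snapshot of page state around one action.
interface PageSnapshot {
  url: string;
  fieldValue: string | null; // value of the targeted input, if any
  mutationCount: number; // cumulative count reported by a MutationObserver
}

interface Behavior {
  urlChanged: boolean;
  valueChanged: boolean;
  domChanged: boolean;
}

function detectBehavior(before: PageSnapshot, after: PageSnapshot): Behavior {
  return {
    urlChanged: before.url !== after.url,
    valueChanged: before.fieldValue !== after.fieldValue,
    domChanged: after.mutationCount > before.mutationCount,
  };
}

// An action "did something" if any observable signal changed.
function somethingHappened(b: Behavior): boolean {
  return b.urlChanged || b.valueChanged || b.domChanged;
}
```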

**4. Verification** — A vision model looks at the post-action screenshot and checks it against the expected result. If behavior was detected but verification fails, the engine assumes it hit the wrong element, blacklists that selector, and retries.
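The verify-and-retry rule can be sketched as a loop over three hypothetical hooks. `resolve` picks a selector while skipping blacklisted ones. `act` performs the action and reports whether any behavior was detected. `verify` stands in for the vision-model check. Two details are assumptions, since the post doesn't describe these cases: the attempt limit, and failing the step immediately when no behavior is detected.

```typescript
// Hypothetical hooks around one step.
interface StepHooks {
  resolve: (blacklist: string[]) => string | null; // best selector not blacklisted
  act: (selector: string) => boolean; // true if any behavior was detected
  verify: () => boolean; // stand-in for the vision-model check
}

interface StepOutcome {
  passed: boolean;
  selector: string | null;
  attempts: number;
  blacklist: string[];
}

function runStep(hooks: StepHooks, maxAttempts: number = 3): StepOutcome {
  const blacklist: string[] = [];
  let attempts = 0;
  while (attempts < maxAttempts) {
    const selector = hooks.resolve(blacklist);
    if (selector === null) break; // no untried candidates left
    attempts++;

    const behaved = hooks.act(selector);
    if (!behaved) break; // assumption: no behavior at all fails the step
    if (hooks.verify()) {
      return { passed: true, selector, attempts, blacklist };
    }
    // Behavior without the expected result: assume the wrong element was hit.
    blacklist.push(selector);
  }
  return { passed: false, selector: null, attempts, blacklist };
}
```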

When a selector stops working between deploys, the repair loop kicks in. It re-queries the accessibility tree, scores candidate elements against the original target description, and picks the best match. The new selector is cached so the next run is fast.
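Here is a minimal sketch of the repair step. It assumes each candidate is scored by word overlap (Jaccard similarity) between the original label and the element's accessible name, plus a bonus for a matching role. The weights, the minimum score, and the in-memory `selectorCache` are all hypothetical; the post doesn't say how its scoring works or where the cache lives.

```typescript
// Hypothetical candidate and target shapes for this sketch.
interface Candidate {
  selector: string;
  role: string;
  name: string; // accessible name
}

interface Target {
  label: string;
  role: string;
}

// Lowercased, de-duplicated word tokens.
function uniqueTokens(s: string): string[] {
  const out: string[] = [];
  for (const t of s.toLowerCase().split(/[^a-z0-9]+/)) {
    if (t.length > 0 && out.indexOf(t) === -1) out.push(t);
  }
  return out;
}

// Jaccard similarity of two token sets, in [0, 1].
function jaccard(a: string, b: string): number {
  const ta = uniqueTokens(a);
  const tb = uniqueTokens(b);
  if (ta.length === 0 || tb.length === 0) return 0;
  let shared = 0;
  for (const t of ta) if (tb.indexOf(t) !== -1) shared++;
  return shared / (ta.length + tb.length - shared);
}

// Assumed weighting: name similarity dominates, a matching role adds a bonus.
function scoreCandidate(target: Target, c: Candidate): number {
  return 0.7 * jaccard(target.label, c.name) + (c.role === target.role ? 0.3 : 0);
}

// Hypothetical in-memory cache: step id -> repaired selector.
const selectorCache: { [stepId: string]: string } = {};

function repairSelector(
  stepId: string,
  target: Target,
  candidates: Candidate[],
  minScore: number = 0.5,
): string | null {
  let best: Candidate | null = null;
  let bestScore = -1;
  for (const c of candidates) {
    const s = scoreCandidate(target, c);
    if (s > bestScore) {
      best = c;
      bestScore = s;
    }
  }
  if (best === null || bestScore < minScore) return null; // no confident match
  selectorCache[stepId] = best.selector;
  return best.selector;
}
```

Keying the cache by step lets a later run try the repaired selector first and fall back to resolution only when it fails.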

After a run, every step result feeds into a score (0–100) built from:

The score maps to a gate decision: ship, caution, or block. You can call the API from CI and fail a deployment if the score drops below your threshold.
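Mapping a score to a gate decision is a simple threshold check. The cut-offs below (85 and 60) are hypothetical placeholders, not values from Confidence Gate. `shouldFailDeployment` stands in for whatever your CI script does with the score the API returns.

```typescript
type Gate = "ship" | "caution" | "block";

// Hypothetical cut-offs; the post does not publish the real ones.
interface GateThresholds {
  ship: number;
  caution: number;
}

const DEFAULT_THRESHOLDS: GateThresholds = { ship: 85, caution: 60 };

function gateDecision(score: number, t: GateThresholds = DEFAULT_THRESHOLDS): Gate {
  // Reject NaN and anything outside the 0-100 scale.
  if (!(score >= 0 && score <= 100)) {
    throw new RangeError("score out of range: " + score);
  }
  if (score >= t.ship) return "ship";
  if (score >= t.caution) return "caution";
  return "block";
}

// CI-side rule: fail the deployment below your own minimum score.
function shouldFailDeployment(score: number, minScore: number): boolean {
  return score < minScore;
}
```

In a CI job you would fetch the score and exit non-zero whenever `shouldFailDeployment` returns `true`.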

To run it locally:

```bash
git clone https://github.com/OaktreeInnovations/confidence-gate.git
cd confidence-gate
cp .env.example .env
make up
```

Open [http://localhost:3001](http://localhost:3001) and you're running.

We're working on four things in order:

The repo is MIT licensed and open to contributions. If any of this is interesting to you — especially the browser recording or the AI execution engine — come say hi on GitHub.
