We stopped writing Playwright selectors and let AI figure it out

A developer built Confidence Gate, an AI-powered test execution engine that replaces brittle Playwright selectors with plain English test steps. The system resolves elements through the accessibility tree, executes actions in a real Playwright browser, and verifies outcomes visually using a vision model. When selectors break between deploys, the engine automatically repairs them by re-querying the accessibility tree and scoring candidate elements against the original target description.

If you've maintained a Playwright or Cypress test suite for more than a few months, you know the drill. A designer renames a class, a developer restructures a form, and suddenly 30 tests are broken, not because the feature broke, but because .submit-btn became [data-action="submit"]. You end up in a loop: fix selectors, ship, selectors break, fix selectors. The tests stop being useful because nobody trusts them.

We built Confidence Gate, an AI-powered test execution engine where you describe test steps in plain English and the system figures out the rest.

Instead of:

    await page.locator('[data-testid="email-input"]').fill('user@example.com');
    await page.locator('button[type="submit"]').click();
    await expect(page).toHaveURL('/dashboard');

You write:

    { "action": "enter the email from the test data in the email field", "expected": "the email field contains the entered address" }
    { "action": "...", "expected": "the dashboard is displayed and the login form is gone" }

The engine translates each step into a typed intent, resolves the target element from the accessibility tree, executes it in a real Playwright browser, takes a screenshot, and verifies the outcome visually.

Each step goes through four stages:

1. Intent generation: The natural language action is converted to structured JSON such as { action: "click", target: { label: "Sign In", role: "button" }, value: null }. This separates intent from implementation.
2. Element resolution: A multi-tier resolver finds the element, trying the accessibility tree first (fast, reliable), CSS heuristics second, and an AI-assisted fallback third.
3. Execution + behavior detection: Playwright executes the action. A mutation observer watches for DOM changes, URL changes, and value changes to confirm something actually happened.
4. Verification: A vision model looks at the post-action screenshot and checks it against the expected result.
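To make the intent and resolution stages more concrete, here is a minimal TypeScript sketch of a tiered resolver. It is not the engine's actual code: the `Intent` shape follows the JSON example above, but `AxNode`, `Tier`, `accessibilityTier`, and `resolveTarget` are hypothetical names, the lower tiers are stubs, and a real resolver would presumably be asynchronous and query a live page.

```typescript
// Shape taken from the intent example in the post; the set of actions is an assumption.
type Intent = {
  action: "click" | "fill" | "navigate";
  target: { label: string; role?: string };
  value: string | null;
};

// Simplified, hypothetical stand-in for one accessibility-tree node.
type AxNode = { role: string; name: string; selector: string };

// One resolution tier: returns a selector, or null to defer to the next tier.
type Tier = {
  name: string;
  find: (intent: Intent, blacklist: ReadonlySet<string>) => string | null;
};

// Tier 1: match on accessible name and, when given, role.
function accessibilityTier(nodes: AxNode[]): Tier {
  return {
    name: "accessibility",
    find: (intent, blacklist) => {
      const wanted = intent.target.label.trim().toLowerCase();
      const hit = nodes.find(
        (n) =>
          n.name.trim().toLowerCase() === wanted &&
          (intent.target.role === undefined || n.role === intent.target.role) &&
          !blacklist.has(n.selector)
      );
      return hit ? hit.selector : null;
    },
  };
}

// Walk the tiers in order and stop at the first usable selector.
function resolveTarget(
  intent: Intent,
  tiers: Tier[],
  blacklist: ReadonlySet<string> = new Set<string>()
): { selector: string; tier: string } | null {
  for (const tier of tiers) {
    const selector = tier.find(intent, blacklist);
    if (selector !== null && !blacklist.has(selector)) {
      return { selector, tier: tier.name };
    }
  }
  return null;
}

// Example wiring with made-up nodes and stubbed lower tiers.
const intent: Intent = {
  action: "click",
  target: { label: "Sign In", role: "button" },
  value: null,
};
const nodes: AxNode[] = [
  { role: "link", name: "Sign In", selector: "a#signin-help" },
  { role: "button", name: "Sign In", selector: "button#signin" },
];
const tiers: Tier[] = [
  accessibilityTier(nodes),
  { name: "css-heuristic", find: () => null }, // stub: finds nothing
  { name: "ai-fallback", find: () => "form button:last-of-type" }, // stub: fixed answer
];

const first = resolveTarget(intent, tiers);
// Simulate a failed verification: blacklist the first selector and retry.
const retry = resolveTarget(intent, tiers, new Set<string>([first!.selector]));
```

In this sketch the first call resolves through the accessibility tier, and the retry falls through to the stubbed fallback because the earlier selector is blacklisted. Keeping each tier behind the same small interface is one way to let cheap lookups run before expensive ones.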
If behavior was detected but verification fails, the engine assumes it hit the wrong element, blacklists that selector, and retries.

When a selector stops working between deploys, the repair loop kicks in. It re-queries the accessibility tree, scores candidate elements against the original target description, and picks the best match. The new selector is cached so the next run is fast.

After a run, every step result feeds into a composite score from 0 to 100. The score maps to a gate decision: ship, caution, or block. You can call the API from CI and fail a deployment if the score drops below your threshold.

To try it:

    git clone https://github.com/OaktreeInnovations/confidence-gate.git
    cd confidence-gate
    cp .env.example .env
    make up

Open http://localhost:3001 and you're running.

We have four things on the roadmap and are working through them in order. The repo is MIT licensed and open to contributions. If any of this is interesting to you, especially the browser recording or the AI execution engine, come say hi on GitHub.
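As a footnote to the repair loop described above, here is a small TypeScript sketch of scoring candidates against the original target description and caching the winner. The post does not say how the engine scores candidates, so everything here is an assumption for illustration: the token-overlap measure, the 0.25 role bonus, the 0.5 minimum score, and the names `scoreCandidate` and `repairSelector` are all hypothetical.

```typescript
type Target = { label: string; role?: string };
// Simplified, hypothetical view of a candidate element from the accessibility tree.
type Candidate = { role: string; name: string; selector: string };

// Lowercase, split on non-alphanumerics, and de-duplicate.
function tokens(text: string): string[] {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Assumed scoring: Jaccard overlap of label tokens plus a fixed bonus for a matching role.
function scoreCandidate(target: Target, candidate: Candidate): number {
  const a = tokens(target.label);
  const b = tokens(candidate.name);
  const shared = a.filter((t) => b.indexOf(t) !== -1).length;
  const union = a.length + b.length - shared;
  const overlap = union === 0 ? 0 : shared / union;
  const roleBonus = target.role !== undefined && target.role === candidate.role ? 0.25 : 0;
  return overlap + roleBonus;
}

function cacheKey(target: Target): string {
  return `${target.role ?? "any"}:${target.label.toLowerCase()}`;
}

// Pick the best-scoring candidate; cache it only if it clears the minimum score.
function repairSelector(
  target: Target,
  candidates: Candidate[],
  cache: Map<string, string>,
  minScore = 0.5
): string | null {
  let best: Candidate | null = null;
  let bestScore = -1;
  for (const candidate of candidates) {
    const score = scoreCandidate(target, candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  if (best === null || bestScore < minScore) {
    return null; // nothing convincing enough; leave the cache untouched
  }
  cache.set(cacheKey(target), best.selector);
  return best.selector;
}

// Made-up candidates, as if re-queried after a redesign.
const candidates: Candidate[] = [
  { role: "button", name: "Sign in to your account", selector: '[data-action="submit"]' },
  { role: "link", name: "Sign up", selector: "a.signup" },
  { role: "button", name: "Cancel", selector: "button.cancel" },
];
const selectorCache = new Map<string, string>();
const signIn: Target = { label: "Sign In", role: "button" };
const repaired = repairSelector(signIn, candidates, selectorCache);
const unrepaired = repairSelector({ label: "Delete", role: "button" }, candidates, selectorCache);
```

With these sample candidates, the relabeled sign-in button wins because it shares both label tokens and the role, while the "Delete" target finds no candidate above the minimum score and returns null instead of guessing.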