The VLA Testing Pipeline in Mano-AFK: When AI Agents QA Their Own Work

This article argues that while AI coding tools excel at generating code, they fail to address the roughly 70% of software development work that involves testing, deployment, and bug fixing. It highlights the "E2E testing gap," where traditional DOM-based testing tools like Selenium are fragile and cannot perform true visual validation. It then introduces Vision-Language-Action (VLA) models as a more robust alternative that operates on pixels rather than selectors, enabling tests to visually interact with applications like a human user would.

AI coding tools have gotten remarkably good at generating code. You describe what you want, and within minutes you have functions, components, even entire applications scaffolded out. But there's a question that rarely gets asked in the excitement: who tests it?

Writing code accounts for maybe 30% of shipping software. The remaining 70% (defining requirements, deploying, testing, finding bugs, fixing them, and verifying the fixes) is where most projects quietly stall. Every AI coding assistant today stops at some variation of "here's the code, good luck." The developer is still left to deploy it, test it manually, discover the bugs, explain the bugs back to the AI, wait for fixes, and re-test. That workflow isn't autonomous development. It's autocomplete with extra steps.

Most engineering teams rely on a layered testing strategy: linting catches syntax errors, unit tests verify individual functions, and API tests confirm that endpoints return the right data. These layers are well-understood, well-automated, and widely adopted. But here's the uncomfortable reality: all three can pass while the application is completely broken for end users. A button's onClick handler might correctly call an API endpoint that returns valid JSON, and the unit test, API test, and linter will all report green.
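
As a rough illustration of how the lower layers can stay green, here is a minimal Python sketch. Every name in it (`fetch_orders`, `on_click`, `RenderedButton`, `is_visible`, `run_checks`) is hypothetical and not part of Mano-AFK or any real framework, and the bounding-box test is only a simplified stand-in for what a human glance at the screen would catch.

```python
from dataclasses import dataclass


def fetch_orders() -> dict:
    """Hypothetical API layer: always returns well-formed data."""
    return {"status": "ok", "orders": [{"id": 1}]}


def on_click(api=fetch_orders) -> int:
    """Hypothetical click handler: the logic a unit test would exercise."""
    return len(api()["orders"])


@dataclass
class RenderedButton:
    """Hypothetical snapshot of where the button ended up on screen (pixels)."""
    x: int
    y: int
    width: int
    height: int


def is_visible(btn: RenderedButton, viewport_w: int, viewport_h: int) -> bool:
    """True only if the button lies fully inside the viewport."""
    return (
        btn.x >= 0
        and btn.y >= 0
        and btn.x + btn.width <= viewport_w
        and btn.y + btn.height <= viewport_h
    )


def run_checks(btn: RenderedButton, viewport_w: int, viewport_h: int) -> dict:
    """Run the API-level, unit-level, and (assumed) visual checks together."""
    return {
        "api_test": fetch_orders()["status"] == "ok",
        "unit_test": on_click() == 1,
        "visual_check": is_visible(btn, viewport_w, viewport_h),
    }


# Assumed scenario: the button is laid out at x=500 on a 375px-wide viewport.
report = run_checks(RenderedButton(x=500, y=40, width=120, height=44), 375, 667)
print(report)
```

In this toy setup the API and unit checks pass while the visibility check fails, because only the last one considers where the button actually landed on screen.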
Meanwhile, the button itself is clipped by a CSS overflow rule, or renders off-screen on mobile, or navigates to a blank page because the frontend routing is misconfigured. The backend works. The tests pass. The user sees nothing.

This is the E2E testing gap. It's the difference between "the code compiles" and "the software ships." And it's the hardest layer to automate, because it requires something most test frameworks don't have: the ability to actually look at the application and interact with it the way a human would.

Tools like Selenium and Playwright have been the go-to for browser-based E2E testing for years. They work by programmatically controlling a browser through DOM selectors: clicking elements by their CSS class, filling inputs by their HTML id, asserting text content by XPath.

The problem is fragility. DOM-based selectors break whenever the UI changes. A designer renames a class, a framework update restructures the component tree, a developer switches from a