Show HN: TakoQA – A harness to get a swarm of agents to break your application

TakoQA, a new open-source testing tool, uses a swarm of LLM-driven browser agents to autonomously break web applications by executing plain-language missions. The tool observes pages visually, decides actions via an LLM, and reports bugs with screenshots, video, and replay data, enabling functional, exploratory, UX, and regression testing without product-specific knowledge.

A swarm of browser agents that breaks your web app before your users do. Plain-language missions in, real bugs out.

takoqa drives a real Chromium browser against your running app, perceives each page the way a person does, and decides its next action with an LLM: clicking, typing, uploading, and exploring toward a goal you describe in plain language. Along the way it watches for broken behavior and reports what it finds, with screenshots, a video, and a step-by-step replay.

The engine knows nothing about any specific product. Everything app-specific lives in a single profile file, so pointing takoqa at a new app is just writing a new profiles/ .yaml .

Each step runs a four-beat loop:

1. Observe: tag every visible interactive element with a ref number, plus a screenshot and the page text.
2. Decide: the LLM is given that list and the screenshot and picks one human action, addressing elements by ref, never by CSS selector.
3. Act: Playwright performs the action; the target is highlighted on-page first so the recording shows exactly what was clicked.
4. Check: captured console errors, uncaught exceptions, and HTTP responses run through the oracles. A finding is raised when something looks broken.

At the end of each mission an LLM judge decides whether the user's goal was actually met and flags UX/quality issues even when the flow technically worked.

- Functional bugs: JS exceptions, 5xx responses, console errors, crash text.
- Exploratory/edge cases: give it a goal and no script; it wanders.
- UX/quality: the judge flags confusing or degraded flows.
- Regressions: every run is saved (JSON, screenshots, video, trace) for run-to-run comparison.

takoqa gets smarter the more it runs, without anyone editing the profile:

- Known-bugs baseline: --baseline classifies each finding as new / known / muted, so a repeat run reports only what changed.
- Learned store: during --loop the harness distills durable app facts from what it saw (routes that turned out to be gated, controls that never did anything, what each page actually offers, missions already tried) into a per-profile JSON sidecar. The next run merges the confident subset into the app map it hands the acting agent, so it stops re-discovering the same things. Facts need ≥2 sightings to count and decay if not re-seen, so a one-off flake never ossifies. Learnings inform the agent only, never the judge.
- Muting marks a finding as a known non-bug. It is dropped from the report and the CI gate, and the reason is fed to the LLM judge as a "do not flag" exclusion next run, so a triaged non-bug stops coming back. The reason is the --mute "
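
To make the new / known / muted split concrete, here is a minimal sketch of how a baseline comparison could work. It is not takoqa's actual code: the `Finding` type, the `fingerprint` field, and the `gate_fails` rule (fail only on new findings) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical stable identifier, e.g. a hash of kind + URL + message.
    fingerprint: str
    message: str


def classify(findings, known, muted):
    """Label each finding 'new', 'known', or 'muted' against a baseline.

    `known` and `muted` are sets of fingerprints from earlier runs.
    Muted takes precedence, since a muted finding is a triaged non-bug.
    """
    labelled = []
    for finding in findings:
        if finding.fingerprint in muted:
            status = "muted"
        elif finding.fingerprint in known:
            status = "known"
        else:
            status = "new"
        labelled.append((status, finding))
    return labelled


def gate_fails(labelled):
    """Assumed CI rule: fail only when something not seen before shows up."""
    return any(status == "new" for status, _ in labelled)
```

With this shape, a repeat run that produces only known or muted findings passes the gate, and a run that surfaces a fingerprint absent from both sets fails it.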
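
The "≥2 sightings, decay if not re-seen" rule for the learned store can be sketched as follows. Only the threshold of two sightings comes from the description above; the decay window (`MAX_RUNS_UNSEEN`), the `Fact` and `LearnedStore` names, and the run-counter scheme are assumptions, since the real decay rule is not specified here.

```python
from dataclasses import dataclass

MIN_SIGHTINGS = 2     # from the description: facts need at least 2 sightings
MAX_RUNS_UNSEEN = 3   # assumption: how many runs a fact survives without being re-seen


@dataclass
class Fact:
    text: str
    sightings: int = 0
    last_seen_run: int = 0


class LearnedStore:
    """Hypothetical in-memory stand-in for the per-profile JSON sidecar."""

    def __init__(self):
        self.facts = {}

    def observe(self, text, run):
        # Record one sighting of a fact during the given run number.
        fact = self.facts.setdefault(text, Fact(text))
        fact.sightings += 1
        fact.last_seen_run = run

    def confident(self, current_run):
        # Only facts seen often enough and recently enough reach the agent.
        return sorted(
            fact.text
            for fact in self.facts.values()
            if fact.sightings >= MIN_SIGHTINGS
            and current_run - fact.last_seen_run <= MAX_RUNS_UNSEEN
        )
```

A fact seen once never reaches the agent, and a fact seen twice drops out again once it has gone unseen for longer than the window, which is the behavior that keeps a one-off flake from sticking.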