Sweeping i18n leaks with four parallel AI agents — from 300 candidates down to 60 real bugs

A developer used four parallel AI investigation agents and AST-based false-positive filtering to detect hardcoded Japanese text in a bilingual codebase, reducing 300 candidates to 60 real bugs. The cleanup uncovered that English-paying users had been receiving Japanese emails from the Stripe webhook for months. The approach combined parallel AI agents for different code areas with a Python script using the AST module to filter out false positives from language branches.

For any app past a certain size that's gone bilingual, the question "how much hardcoded Japanese is still hiding in our repo?" never quite goes away. A naive grep for `[ぁ-んァ-ヶ一-龯]` returns thousands of hits, and the vast majority are inside translation tables, already-branched code, or comments. The real leaks are buried.

For one cleanup pass we attacked this with four parallel AI investigation agents plus AST-based false-positive filtering. The result: ~300 candidates detected → ~60 real leaks → cleaned up across five rounds. This post walks through the flow and the most interesting bug it uncovered: paying English users had been getting Japanese email from the Stripe webhook for months.

A repository-wide grep returns thousands of hits, but the contents fall into four bins:

- translation tables
- code already branched by `lang == 'en'`
- comments and docstrings
- real leaks

The first three are harmless. Only the last shows Japanese to English users. The trouble is that grep can't separate them, and the volume is too high for a human to triage one by one.

The approach: launch AI investigation agents in parallel, with each one assigned a different surface area.

- Agent 1: `templates/*.html` + `lang/*.json` (`data-i18n` attribute gaps)
- Agent 2: `server/wpmm-license/*.php` (license API)
- Agent 3: `server/wpmm-web/*.php` (landing-page API)
- Agent 4: `core/*.py` + `tools/*.py` (desktop app code)

Each agent gets the same prompt template ("enumerate user-facing JP hardcodes, decide as best you can whether each is already branched") and runs independently. Parallelism keeps wall-clock time below a single-agent run, and having four perspectives on the same kind of problem improves coverage.

The merged report came in around 300 candidates. Still noisy.
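The fan-out over the four surface areas can be sketched in a few lines of Python. This is a minimal illustration, not the setup used in the post: `investigate` is a hypothetical stand-in that does a plain regex scan where a real run would call an AI agent, the glob patterns assume the directory layout listed above, and `SAMPLE_FILES` is an invented in-memory repo.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from fnmatch import fnmatch

# Same character ranges as the naive grep.
JP_CHARS = re.compile(r"[ぁ-んァ-ヶ一-龯]")

# One surface area per agent (assumed glob patterns).
SURFACES = {
    "agent1": ["templates/*.html", "lang/*.json"],
    "agent2": ["server/wpmm-license/*.php"],
    "agent3": ["server/wpmm-web/*.php"],
    "agent4": ["core/*.py", "tools/*.py"],
}


def investigate(patterns, files):
    """Stand-in for one agent: list lines with Japanese text in its area."""
    hits = []
    for path, text in files.items():
        if not any(fnmatch(path, pattern) for pattern in patterns):
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if JP_CHARS.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


def run_all(files):
    """Run every agent concurrently and merge the candidate lists."""
    with ThreadPoolExecutor(max_workers=len(SURFACES)) as pool:
        reports = pool.map(lambda pats: investigate(pats, files),
                           SURFACES.values())
        return sorted(hit for report in reports for hit in report)


# Hypothetical in-memory repo, for illustration only.
SAMPLE_FILES = {
    "core/license.py": "msg = 'ライセンスが無効です'\nprint(msg)\n",
    "server/wpmm-web/checkout.php": "<?php echo '購入'; // 決済\n",
    "README.md": "日本語の説明\n",
}

candidates = run_all(SAMPLE_FILES)
```

Files outside every surface area (here `README.md`) are never scanned, which is also the main risk of partitioning by path: anything that falls between the assigned areas goes unreported.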
Hidden in those 300 were heavy false-positive clusters:

| Location | Count | Why it's a false positive |
|---|---|---|
| `templates/tos.html` | 63 | `tosJa` / `tosEn` blocks both exist; `switchLang` toggles them |
| `core/report_generator.py` | 141 | All inside `if lang == 'en'` branches or JA / EN variant maps |

Going through roughly 200 items by hand wasn't realistic. Instead, we wrote a Python script using the `ast` module to mechanically decide "does this function have a `lang` branch around the JP literal?" A sketch:

```python
import ast

def has_lang_branch(func_node):
    """Does this function use `lang` in a conditional?"""
    for node in ast.walk(func_node):
        if isinstance(node, ast.If):
            for sub in ast.walk(node.test):
                if isinstance(sub, ast.Name) and sub.id == 'lang':
                    return True
    return False

def has_jp_literal(func_node):
    """Any Constant string node containing Japanese characters?"""
    for node in ast.walk(func_node):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            # U+3040..U+9FFF covers hiragana, katakana, and CJK ideographs.
            if any('\u3040' <= c <= '\u9fff' for c in node.value):
                return True
    return False

# A real leak = has JP literal AND no lang branch.
# `functions` is the list of ast.FunctionDef nodes from the parsed file.
real_leaks = [f for f in functions
              if has_jp_literal(f) and not has_lang_branch(f)]
```

Running this against the 141 in `report_generator.py` gave essentially zero real leaks (the one residual hit was a docstring false positive, since a docstring is a string constant too). The 63 in `tos.html` were also fully cleared by checking DOM structure plus the presence of `switchLang`. Net: about 60 real leaks, finally a tractable pile.

Inside those 60 was the largest single impact: all four Stripe-webhook emails (purchase complete, renewal, payment failed, plan change) were hardcoded to Japanese. English-paying users had been getting purchase confirmations, failure notices, everything in Japanese. The kind of bug that quietly persists forever unless you go looking for it.

The fix was a one-function language inference from the Stripe event:

```php
/**
 * Infer display language from Stripe event currency.
 */
function lang_from_currency(string $currency): string {
    $en_currencies = ['usd'];
    return in_array(strtolower($currency), $en_currencies, true) ? 'en' : 'ja';
}
```

This `$lang` then gets passed into `send_license_email()` / `send_payment_failed_email()` / `send_plan_changed_email()` / `send_renewal_email()`, branching the subject and body, and switching `mb_language('uni'|'Japanese')` so English subjects are UTF-8 Base64 encoded instead of ISO-2022-JP. Subject encoding is small but real: `mb_language('Japanese')` was MIME-encoding English subjects in ISO-2022-JP, which can raise spam scores on Gmail and Outlook.

On the license API side, we consolidated all language detection into one helper:

```php
// server/wpmm-license/lib/i18n_helpers.php
function resolve_request_lang(?array $body = null): string {
    if (isset($body['language']) && in_array($body['language'], ['ja', 'en'], true)) {
        return $body['language'];
    }
    // Accept-Language fallback
    if (preg_match('/^en\b/i', $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '')) {
        return 'en';
    }
    return 'ja';
}
```

`validate.php` / `release_machine.php` / `webhook.php` / `verify_email.php` now all `require_once` this file and call `resolve_request_lang()` instead of rolling their own. An English plan-name table `PLAN_NAMES_EN` lives in the same file, so `plan_name($code, $lang)` becomes the single point of truth.

The remaining real leaks were similar in shape: `core/license.py`, `core/key_perms.py`, the desktop launchers (`launcher.sh` / `.ps1`), and the landing-page APIs (`checkout.php` / `chat.php` / `rate.php`). All got the same treatment: extract a small helper, branch on language, route everything through one entry point.

Among the principles worth keeping from this round: centralizing on a few helpers (`lang_from_currency()`, `resolve_request_lang()`, `plan_name()`) makes adding a new API naturally route through the same path. The "uh, I forgot to branch" failure mode becomes structurally harder.

The fear "how much Japanese is still hardcoded in our repo?" doesn't fully go away, but with a parallel-agents + AST pipeline in your toolkit, you can at least quantify it on demand instead of carrying it as a vague anxiety.
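To make "quantify it on demand" concrete, here is a minimal end-to-end sketch of the AST filter run over a source string. It follows the same rule as the script described earlier (Japanese literal present, no `lang` conditional), with one addition that is an assumption on our part rather than something the original script did: docstrings are skipped, since they produced the one residual false positive. The names `find_leaks`, `docstring_ids`, and `SAMPLE` are hypothetical.

```python
import ast

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def is_japanese(text):
    """True if any character falls in U+3040..U+9FFF."""
    return any("\u3040" <= c <= "\u9fff" for c in text)


def docstring_ids(func):
    """ids of the Constant node holding the function's docstring, if any."""
    first = func.body[0] if func.body else None
    if (isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)):
        return {id(first.value)}
    return set()


def has_lang_branch(func):
    """Does any `if` test in this function mention the name `lang`?"""
    return any(
        isinstance(sub, ast.Name) and sub.id == "lang"
        for node in ast.walk(func) if isinstance(node, ast.If)
        for sub in ast.walk(node.test)
    )


def has_jp_literal(func):
    """Any non-docstring string constant containing Japanese characters?"""
    skip = docstring_ids(func)
    return any(
        isinstance(n, ast.Constant) and isinstance(n.value, str)
        and id(n) not in skip and is_japanese(n.value)
        for n in ast.walk(func)
    )


def find_leaks(source):
    """Names of functions with a Japanese literal and no `lang` branch."""
    tree = ast.parse(source)
    return [
        f.name for f in ast.walk(tree)
        if isinstance(f, FUNC_TYPES)
        and has_jp_literal(f) and not has_lang_branch(f)
    ]


# Hypothetical input: one branched function, one docstring-only, one leak.
SAMPLE = '''
def branched(lang):
    if lang == 'en':
        return "Invalid license"
    return "ライセンスが無効です"

def documented():
    """英語ユーザー向けの説明"""
    return "ok"

def leaky():
    return "購入ありがとうございます"
'''

print(find_leaks(SAMPLE))  # prints ['leaky']
```

The check is deliberately coarse: a function counts as branched if `lang` appears in any `if` test, so a function that branches one message but hardcodes another would still be cleared. Treat the output as a triage list rather than a verdict.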