Source: dev.to

Sweeping i18n leaks with four parallel AI agents — from 300 candidates down to 60 real bugs

A developer used four parallel AI investigation agents and AST-based false-positive filtering to detect hardcoded Japanese text in a bilingual codebase, reducing 300 candidates to 60 real bugs. The cleanup uncovered that English-paying users had been receiving Japanese emails from the Stripe webhook for months. The approach combined parallel AI agents for different code areas with a Python script using the AST module to filter out false positives from language branches.

4 min read · Published Jun 29, 2026

For any app past a certain size that's gone bilingual, the question "how much hardcoded Japanese is still hiding in our repo?" never quite goes away. A naive grep for [ぁ-んァ-ヶ一-龯] returns thousands of hits, and the vast majority are inside translation tables, already-branched code, or comments. The real leaks are buried.
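To see why the raw hit count is misleading, the naive scan can be reproduced in a few lines of Python. This is a minimal sketch that reuses the character class from the grep; the helper name `jp_lines` and the sample text are illustrative and not part of the original tooling.

```python
import re

# Same character class as the grep: hiragana, katakana, common kanji.
JP_PATTERN = re.compile(r"[ぁ-んァ-ヶ一-龯]")

def jp_lines(text):
    """Return (line_number, line) pairs for lines containing Japanese characters."""
    return [(i, line) for i, line in enumerate(text.splitlines(), start=1)
            if JP_PATTERN.search(line)]

# Made-up sample: a comment, an already-branched string, and one real leak.
sample = '''\
# 設定を読み込む
msg = "Saved" if lang == 'en' else "保存しました"
title = "ライセンスが無効です"
print(title)
'''

hits = jp_lines(sample)
for number, line in hits:
    print(number, line)
```

Three of the four sample lines match, yet only line 3 would ever show Japanese to an English user. The pattern alone cannot tell the cases apart.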

For one cleanup pass we attacked this with four parallel AI investigation agents plus AST-based false-positive filtering. The result: ~300 candidates detected → ~60 real leaks → cleaned up across five rounds. This post walks through the flow and the most interesting bug it uncovered — paying English users had been getting Japanese email from the Stripe webhook for months.

A repository-wide grep returns thousands of hits, but the contents fall into four bins: translation tables / already branched by lang == 'en' / comments and docstrings / real leaks. The first three are harmless. Only the last shows Japanese to English users. The trouble is that grep can't separate them, and the volume is too high for a human to triage one by one.

The approach: launch AI investigation agents in parallel with each one assigned a different surface area.

[Agent 1] templates/*.html + lang/*.json    — data-i18n attribute gaps
[Agent 2] server/wpmm-license/*.php          — license API
[Agent 3] server/wpmm-web/*.php               — landing-page API
[Agent 4] core/*.py + tools/*.py             — desktop app code

Each agent gets the same prompt template — "enumerate user-facing JP hardcodes, decide as best you can whether each is already branched" — and runs independently. Parallelism keeps wall-clock time below a single-agent run, and having four perspectives on the same kind of problem improves coverage.
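The fan-out itself is ordinary concurrent dispatch. Below is a rough sketch using `concurrent.futures`, under the assumption that each agent can be invoked as a blocking call; `run_agent` is a hypothetical stub standing in for whatever agent API is actually used, and the finding format is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

PROMPT = ("Enumerate user-facing JP hardcodes; decide as best you can "
          "whether each is already branched.")

# Surface areas taken from the assignment list above.
SURFACES = {
    "agent-1": ["templates/*.html", "lang/*.json"],
    "agent-2": ["server/wpmm-license/*.php"],
    "agent-3": ["server/wpmm-web/*.php"],
    "agent-4": ["core/*.py", "tools/*.py"],
}

def run_agent(name, globs, prompt):
    """Hypothetical stand-in for a real agent call; returns placeholder findings."""
    # A real implementation would send `prompt` plus the matching files to an agent.
    return [{"agent": name, "glob": g, "finding": None} for g in globs]

def sweep(surfaces, prompt, runner=run_agent):
    """Run one agent per surface area concurrently and merge the reports."""
    with ThreadPoolExecutor(max_workers=len(surfaces)) as pool:
        futures = [pool.submit(runner, name, globs, prompt)
                   for name, globs in surfaces.items()]
        merged = []
        for fut in futures:  # submission order keeps the merged report stable
            merged.extend(fut.result())
    return merged

report = sweep(SURFACES, PROMPT)
```

Threads are a reasonable fit here because each agent call would spend its time waiting on I/O rather than computing.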

The merged report came in around 300 candidates. Still noisy.

Hidden in those 300 were heavy false-positive clusters:

Location                  Count  Why it's a false positive
templates/tos.html        63     tosJa / tosEn blocks both exist; switchLang toggles them
core/report_generator.py  141    All inside if lang == 'en' branches or _JA / _EN variant maps

Going through 200 items by hand wasn't realistic. Instead, we wrote a Python script using the ast module to mechanically decide "does this function have a lang branch around the JP literal?" A sketch:

import ast

def has_lang_branch(func_node):
    """Does this function use `lang` in a conditional?"""
    for node in ast.walk(func_node):
        if isinstance(node, ast.If):
            for sub in ast.walk(node.test):
                if isinstance(sub, ast.Name) and sub.id == 'lang':
                    return True
    return False

def has_jp_literal(func_node):
    """Any Constant string node containing Japanese characters?"""
    for node in ast.walk(func_node):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if any('぀' <= c <= '鿿' for c in node.value):
                return True
    return False

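# Collect every function definition in the module under test.
with open('core/report_generator.py', encoding='utf-8') as fh:
    tree = ast.parse(fh.read())
functions = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]

real_leaks = [f for f in functions
              if has_jp_literal(f) and not has_lang_branch(f)]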
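The sketch above treats docstrings as literals and only recognizes `if` statements. One possible refinement (not the script used in this cleanup) is to skip docstrings, since they are never shown to users, and to count conditional expressions such as `a if lang == 'en' else b` as branches too. The names `find_leaky_functions` and `_docstring_nodes` are illustrative.

```python
import ast

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)

def _is_jp(text):
    # Same range as the sketch above: U+3040 through U+9FFF.
    return any('\u3040' <= c <= '\u9fff' for c in text)

def _docstring_nodes(tree):
    """Collect the Constant nodes that serve as module/class/function docstrings."""
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, FUNC_TYPES + (ast.ClassDef, ast.Module)) and node.body:
            first = node.body[0]
            if (isinstance(first, ast.Expr)
                    and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                found.add(first.value)
    return found

def find_leaky_functions(source):
    """Names of functions with a non-docstring JP literal and no `lang` conditional."""
    tree = ast.parse(source)
    docstrings = _docstring_nodes(tree)
    leaks = []
    for func in ast.walk(tree):
        if not isinstance(func, FUNC_TYPES):
            continue
        nodes = list(ast.walk(func))
        has_jp = any(isinstance(n, ast.Constant) and isinstance(n.value, str)
                     and n not in docstrings and _is_jp(n.value)
                     for n in nodes)
        has_branch = any(isinstance(n, (ast.If, ast.IfExp))
                         and any(isinstance(s, ast.Name) and s.id == 'lang'
                                 for s in ast.walk(n.test))
                         for n in nodes)
        if has_jp and not has_branch:
            leaks.append(func.name)
    return leaks

sample = '''
def greet(lang):
    return "Hello" if lang == 'en' else "こんにちは"

def documented():
    """日本語の説明"""
    return "ok"

def leaky():
    return "ライセンスが無効です"
'''

print(find_leaky_functions(sample))  # ['leaky']
```

The check is still per function: a function that branches one string on `lang` but hardcodes another would pass, so anything it clears is best treated as "probably fine" rather than proven.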

Running this against the 141 in report_generator.py gave essentially zero real leaks (the one residual hit was a docstring false positive). The 63 in tos.html were also fully cleared by checking DOM structure + the presence of switchLang.
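The tos.html check can be mechanized in a similar spirit. The sketch below uses the standard-library `html.parser` and assumes that `tosJa` and `tosEn` are element ids and that `switchLang` appears either in an attribute or in an inline script; the post does not spell out the actual markup, so both are assumptions, as are the class and function names.

```python
from html.parser import HTMLParser

class LangBlockScanner(HTMLParser):
    """Record element ids and whether `switchLang` is referenced anywhere."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.mentions_switch = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)
            elif value and "switchLang" in value:  # e.g. onclick="switchLang('en')"
                self.mentions_switch = True

    def handle_data(self, data):
        if "switchLang" in data:  # e.g. the body of an inline <script>
            self.mentions_switch = True

def is_toggled_bilingual(html, ja_id="tosJa", en_id="tosEn"):
    """True when both language blocks exist and the toggle is referenced."""
    scanner = LangBlockScanner()
    scanner.feed(html)
    scanner.close()
    return {ja_id, en_id} <= scanner.ids and scanner.mentions_switch

page = """
<button onclick="switchLang('en')">English</button>
<div id="tosJa">利用規約</div>
<div id="tosEn" hidden>Terms of Service</div>
"""
```

A page that passes this check still contains Japanese text, but every block has an English counterpart behind the toggle, which is why those hits count as false positives.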

Net: about 60 real leaks, finally a tractable pile.

Inside those 60 was the largest single impact: all four Stripe-webhook emails (purchase complete, renewal, payment failed, plan change) were hardcoded to Japanese. English-paying users had been getting purchase confirmations, failure notices, everything in Japanese. The kind of bug that quietly persists forever unless you go looking for it.

The fix was a one-function language inference from the Stripe event:

/** Infer display language from Stripe event currency. */
function lang_from_currency(string $currency): string {
    $en_currencies = ['usd'];
    return in_array(strtolower($currency), $en_currencies, true) ? 'en' : 'ja';
}

This $lang then gets passed into send_license_email / send_payment_failed_email / send_plan_changed_email / send_renewal_email, branching the subject and body, and switching mb_language('uni'|'Japanese') so English subjects are UTF-8 Base64 encoded instead of ISO-2022-JP. Subject encoding is small but real: mb_language('Japanese') was MIME-encoding English subjects in ISO-2022-JP, which raises spam scores on Gmail and Outlook.

On the license API side, we consolidated all language detection into one helper:

// server/wpmm-license/lib/i18n_helpers.php
function resolve_request_lang(?array $body = null): string {
    if (isset($body['language']) && in_array($body['language'], ['ja','en'], true)) {
        return $body['language'];
    }
    // Accept-Language fallback
    if (preg_match('/^en\b/i', $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '')) {
        return 'en';
    }
    return 'ja';
}
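The precedence in this helper is easy to pin down with a few test cases. What follows is a Python transliteration for illustration only (the production helper is the PHP above); the header is passed as an argument called `accept_language` instead of being read from `$_SERVER`, which is a change made here to keep the sketch testable.

```python
import re

SUPPORTED = ("ja", "en")

def resolve_request_lang(body=None, accept_language=""):
    """Explicit body language wins, then an English Accept-Language header, else 'ja'."""
    if isinstance(body, dict) and body.get("language") in SUPPORTED:
        return body["language"]
    # Mirrors the anchored PHP regex: only the first listed language is considered.
    if re.match(r"en\b", accept_language or "", re.IGNORECASE):
        return "en"
    return "ja"

cases = [
    ({"language": "en"}, "ja"),               # body overrides the header
    ({"language": "fr"}, "en-US,en;q=0.9"),   # unsupported body value falls through
    (None, "EN-gb"),                          # header match is case-insensitive
    (None, "ja,en;q=0.8"),                    # first entry is not English
    (None, ""),                               # nothing to go on
]
results = [resolve_request_lang(body, header) for body, header in cases]
```

Because the pattern is anchored at the start of the header, a client sending `fr,en;q=0.8` resolves to `ja`. That matches the PHP as written and may be worth revisiting if more locales are added.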

validate.php / release_machine.php / webhook.php / verify_email.php now all require_once this and call resolve_request_lang() instead of rolling their own. An English plan-name table (PLAN_NAMES_EN) lives in the same file, so plan_name($code, $lang) becomes the single point of truth.

The remaining real leaks were similar in shape: core/license.py, core/key_perms.py, the desktop launchers (_launcher.sh / .ps1), and the landing-page APIs (checkout.php / chat.php / rate.php). All got the same treatment: extract a small helper, branch on language, route everything through one entry point.

Three principles worth keeping from this round:

Consolidating language decisions into a few helpers (lang_from_currency, resolve_request_lang, plan_name) makes adding a new API naturally route through the same path. The "uh, I forgot to branch" failure mode becomes structurally harder.

The fear "how much Japanese is still hardcoded in our repo?" doesn't fully go away, but with a parallel-agents + AST pipeline in your toolkit, you can at least quantify it on demand instead of carrying it as a vague anxiety.
