{"slug": "i-trained-a-neural-network-to-break-my-own-encrypted-search-it-learned-nothing", "title": "I trained a neural network to break my own encrypted search. It learned nothing.", "summary": "A developer built ZATRON, a system that transforms document embeddings into modular barcodes to enable encrypted semantic search without revealing the original vectors. When the developer trained a neural network to recover similarity from the barcodes using 80,000 labeled pairs, the attack achieved exactly chance-level performance (AUC 0.505), while the same network nearly perfectly recovered similarity from unprotected signals (AUC 0.999). ZATRON sacrifices some retrieval recall (81% vs 100% for the classic ASPE scheme) but eliminates leakage that ASPE exposes, where an observer can directly read similarities with ρ = +0.87.", "body_md": "A few months ago I built a way to search documents by meaning while keeping the embeddings hidden — even from the server doing the search. I called it ZATRON.\n\nThe obvious question everyone (including me) kept asking was: *does it actually hide anything, or does it just look scrambled?*\n\nScrambled-looking isn't the same as secure. So instead of trusting a correlation number, I did the thing that actually scares me: I trained a neural network to break it.\n\nThis post is the honest write-up — including the part where I tried hard to make the attack win.\n\nStandard semantic search stores embeddings as plain vectors. Anyone with database access can cluster them by topic and infer content without reading a word. ZATRON transforms each embedding into a **modular barcode**: project onto PCA channels, quantize, add a per-document keyed mask, and keep only residues modulo a set of primes. You compare barcodes in modular space; the original embedding is never reconstructed.\n\nRetrieval still works — 98% of cosine quality on 626K MSMARCO passages. 
The question is whether the barcodes leak.\n\nMy first security check was a Spearman correlation between barcode distance and true similarity. It came out near zero (ρ ≈ 0.05). Good, but Spearman only measures *monotonic* association, so a low value only rules out a *simple* attacker. A neural network doesn't need monotonicity. It can learn whatever structure is there.\n\nSo the real test: give a neural network every advantage and see if it can recover similarity from the barcodes.\n\nI used a **known-plaintext** attacker, the strongest realistic setting:\n\nAnd the part that makes the result trustworthy: I ran the **identical attack on the unprotected quantized signals** as a control. If the attack can't break those, the attack is too weak and the test means nothing.\n\nOn 50,000 MSMARCO passages, 100,000 labeled pairs:\n\n| Input the attacker sees | Linear probe | MLP (3-layer) |\n|---|---|---|\n| Unprotected signals (control) | ρ = 0.79, AUC = 0.985 | ρ = 0.90, AUC = 0.999 |\n| ZATRON barcodes | ρ = 0.00, AUC = 0.498 | ρ = 0.00, AUC = 0.505 |\n\nThe same network that recovers similarity from unprotected signals *almost perfectly* (AUC 0.999) gets **exactly chance level** on the barcodes, even with 80,000 labeled pairs to learn from. AUC 0.50 is a coin flip.\n\nIt learned nothing.\n\n\"8x faster than FHE\" is a weak flex: everyone knows FHE is slow. The fairer comparison is **ASPE** (Wong et al., SIGMOD 2009), the classic encrypted-kNN scheme. ASPE preserves scalar products exactly, so retrieval is perfect, but that same property means any observer can read similarities straight off the ciphertexts.\n\n| | ASPE (SIGMOD '09) | ZATRON |\n|---|---|---|\n| Retrieval recall@10 (strict) | 100% | 81% |\n| Observer reads similarity directly | ρ = +0.87 | ρ = −0.06 |\n| Learned attack (MLP) | ρ = +0.91, AUC = 0.99 | ρ = +0.01, AUC = 0.52 |\n\nASPE buys perfect recall with total leakage. 
ZATRON gives up a margin on the strictest retrieval metric and leaks nothing, to a direct observer *or* a trained network.\n\nHonesty is the whole point, so the limits:\n\nEverything is reproducible:\n\n```\npip install zatron\n```\n\nThe attack and the ASPE comparison are in the repo as runnable scripts (`benchmarks/`). If you can make the neural attack win (train it longer, give it more pairs, better features), I genuinely want to see it. Finding the weakness is the point.\n\nI'd rather have someone break this now than after I've claimed too much.", "url": "https://wpnews.pro/news/i-trained-a-neural-network-to-break-my-own-encrypted-search-it-learned-nothing", "canonical_source": "https://dev.to/zahraarmantech/i-trained-a-neural-network-to-break-my-own-encrypted-search-it-learned-nothing-55f3", "published_at": "2026-06-11 23:27:29+00:00", "updated_at": "2026-06-11 23:42:30.926725+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "ai-research", "ai-safety", "ai-tools"], "entities": ["ZATRON", "MSMARCO"], "alternates": {"html": "https://wpnews.pro/news/i-trained-a-neural-network-to-break-my-own-encrypted-search-it-learned-nothing", "markdown": "https://wpnews.pro/news/i-trained-a-neural-network-to-break-my-own-encrypted-search-it-learned-nothing.md", "text": "https://wpnews.pro/news/i-trained-a-neural-network-to-break-my-own-encrypted-search-it-learned-nothing.txt", "jsonld": "https://wpnews.pro/news/i-trained-a-neural-network-to-break-my-own-encrypted-search-it-learned-nothing.jsonld"}}