I trained a neural network to break my own encrypted search. It learned nothing.

Source: dev.to · Published 2026-06-11T23:27Z

A developer built ZATRON, a system that transforms document embeddings into modular barcodes to enable encrypted semantic search without revealing the original vectors. When the developer trained a neural network to recover similarity from the barcodes using 80,000 labeled pairs, the attack performed essentially at chance level (AUC 0.505), while the same network recovered similarity from unprotected signals almost perfectly (AUC 0.999). ZATRON sacrifices some retrieval recall (81% vs 100% for the classic ASPE scheme) but avoids the leakage that ASPE exposes, where an observer can read similarities directly with ρ = +0.87.

A few months ago I built a way to search documents by meaning while keeping the embeddings hidden — even from the server doing the search. I called it ZATRON.

The obvious question everyone (including me) kept asking was: does it actually hide anything, or does it just look scrambled?

Scrambled-looking isn't the same as secure. So instead of trusting a correlation number, I did the thing that actually scares me: I trained a neural network to break it.

This post is the honest write-up — including the part where I tried hard to make the attack win.

Standard semantic search stores embeddings as plain vectors. Anyone with database access can cluster them by topic and infer content without reading a word. ZATRON transforms each embedding into a modular barcode: project onto PCA channels, quantize, add a per-document keyed mask, and keep only residues modulo a set of primes. You compare barcodes in modular space; the original embedding is never reconstructed.
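The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ZATRON's actual code: the primes, the number of quantization levels, the HMAC-based mask, and every function name here are hypothetical, and the PCA directions are assumed to be precomputed elsewhere.

```python
import hashlib
import hmac

PRIMES = (7, 11, 13)   # hypothetical moduli; the post does not list the ones ZATRON uses
LEVELS = 256           # hypothetical number of quantization levels


def project(embedding, components):
    """Project an embedding onto precomputed PCA directions (one row per channel)."""
    return [sum(e * c for e, c in zip(embedding, row)) for row in components]


def quantize(value, lo=-1.0, hi=1.0, levels=LEVELS):
    """Map a real value in [lo, hi] to an integer bucket in [0, levels - 1]."""
    clipped = min(max(value, lo), hi)
    return min(int((clipped - lo) / (hi - lo) * levels), levels - 1)


def keyed_mask(key, doc_id, channel):
    """Derive a per-document, per-channel integer mask from a secret key (HMAC-SHA256)."""
    msg = f"{doc_id}:{channel}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def barcode(embedding, components, key, doc_id):
    """Return, per channel, the masked quantized value reduced modulo each prime."""
    code = []
    for channel, value in enumerate(project(embedding, components)):
        masked = quantize(value) + keyed_mask(key, doc_id, channel)
        code.append(tuple(masked % p for p in PRIMES))
    return code
```

The sketch stops at producing the barcode. How two barcodes are compared in modular space is not described in this post, so that step is left out rather than guessed.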

Retrieval still works — 98% of cosine quality on 626K MSMARCO passages. The question is whether the barcodes leak.

My first security check was a Spearman correlation between barcode distance and true similarity. It came out near zero (ρ ≈ 0.05). Good, but Spearman's ρ is a rank correlation, so a low value only rules out a monotonic relationship, the kind a simple attacker would exploit. A neural network doesn't need monotonicity. It can learn whatever structure is there.
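To see why a near-zero ρ is not proof of safety on its own, here is a small self-contained sketch of Spearman's ρ on toy data. The data and function names are illustrative and have nothing to do with the actual barcodes. It shows a case where one variable fully determines the other and ρ still comes out as zero.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


xs = list(range(-5, 6))
monotone = [x ** 3 for x in xs]   # ranks agree exactly, so rho is 1
u_shaped = [x ** 2 for x in xs]   # y is fully determined by x, yet rho is 0
print(spearman(xs, monotone), spearman(xs, u_shaped))
```

A learned model has no trouble with the U-shaped case, which is the gap a trained attacker is meant to close.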

So the real test: give a neural network every advantage and see if it can recover similarity from the barcodes.

I used a known-plaintext attacker — the strongest realistic setting:

And the part that makes the result trustworthy: I ran the identical attack on the unprotected quantized signals as a control. If the attack can't break those, the attack is too weak and the test means nothing.

On 50,000 MSMARCO passages, 100,000 labeled pairs:

Input the attacker sees	Linear probe	MLP (3-layer)
Unprotected signals (control)	ρ = 0.79, AUC = 0.985	ρ = 0.90, AUC = 0.999
ZATRON barcodes	ρ = 0.00, AUC = 0.498	ρ = 0.00, AUC = 0.505

The same network that recovers similarity from unprotected signals almost perfectly (AUC 0.999) scores essentially at chance on the barcodes (AUC 0.505), even with 80,000 labeled pairs to learn from. AUC 0.50 is a coin flip.

It learned nothing.
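For readers less familiar with AUC: it is the probability that the attacker scores a randomly chosen positive pair above a randomly chosen negative one. The sketch below computes it directly from that definition on made-up scores. The labels (1 for a similar pair, 0 for a dissimilar one) are an assumption for illustration, since the post does not say how pairs were labeled.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
informative = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # every positive outranks every negative
uninformative = [0.5] * 6                      # attacker gives the same score to everything
print(auc(informative, labels), auc(uninformative, labels))  # 1.0 0.5
```

An attacker whose output carries no information about the label lands at 0.5, which is where the barcode attack ended up.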

"8x faster than FHE" is a weak flex — everyone knows FHE is slow. The fairer comparison is ASPE (Wong et al., SIGMOD 2009), the classic encrypted-kNN scheme. ASPE preserves scalar products exactly, so retrieval is perfect — but that same property means any observer can read similarities straight off the ciphertexts.

Metric	ASPE (SIGMOD '09)	ZATRON
Retrieval recall@10 (strict)	100%	81%
Observer reads similarity directly	ρ = +0.87	ρ = −0.06
Learned attack (MLP)	ρ = +0.91, AUC = 0.99	ρ = +0.01, AUC = 0.52

ASPE buys perfect recall with total leakage. ZATRON gives up a margin on the strictest retrieval metric and, in these tests, leaks nothing measurable to either a direct observer or a trained network.
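The scalar-product property is easy to see in a toy version of the core ASPE idea: data vectors are multiplied by the transpose of a secret invertible matrix and queries by its inverse, so the matrix cancels in the product. The sketch below is a simplified illustration only. It leaves out the extra dimensions and randomization of the full scheme, and the 2x2 key and vectors are made up.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Plain scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy 2x2 secret key with determinant 1, so its inverse is also integer-valued.
M = [[2, 1], [5, 3]]
M_T = [[2, 5], [1, 3]]        # transpose of M
M_INV = [[3, -1], [-5, 2]]    # inverse of M

doc = [3, 4]      # plaintext document vector (illustrative)
query = [1, 2]    # plaintext query vector (illustrative)

enc_doc = mat_vec(M_T, doc)        # what the server stores
enc_query = mat_vec(M_INV, query)  # what the server receives at search time

# (M^T p) . (M^-1 q) = p^T M M^-1 q = p . q
print(dot(doc, query), dot(enc_doc, enc_query))  # 11 11
```

Whoever holds both ciphertexts gets the plaintext scalar product without ever touching the key, which is the leakage the comparison measures.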

Honesty is the whole point, so the limits:

Everything is reproducible:

pip install zatron

The attack and the ASPE comparison are in the repo as runnable scripts (benchmarks/). If you can make the neural attack win (train it longer, give it more pairs, better features), I genuinely want to see it. Finding the weakness is the point.

I'd rather have someone break this now than after I've claimed too much.

Read original on dev.to → dev.to/zahraarmantech/i-trained-a-neural-network…
