Source: discuss.huggingface.co · topic: machine-learning

Native binary embeddings experiment: curious about your thoughts

A developer compared native binary embedding training with post-hoc binarization on a small BERT-mini model and found that training natively with a binary loss gave better retrieval on SciFact (Recall@10). The experiment ran CPU-only on a Mac Mini M4 Pro, training on 550k NLI pairs for 3 epochs, and three key techniques were needed to make the binary model converge. The results suggest a sweet spot at 2048 dimensions, and the developer is seeking community feedback on larger-scale experiments.

1 min read · published Jun 23, 2026

I spent a few days testing a simple hypothesis: does training a binary embedding model natively (with a binary loss) produce better retrieval than just binarizing a float model post-hoc?
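Post-hoc binarization usually means taking a trained float embedding, thresholding each dimension (commonly at zero), and packing the resulting bits. A minimal sketch, assuming a zero threshold; `binarize` and `to_bytes` are hypothetical helper names, not taken from the author's repository:

```python
from typing import List


def binarize(vec: List[float]) -> int:
    """Post-hoc binarization: one bit per float dimension (1 if the value is > 0).

    Bit i of the returned integer holds the sign of vec[i], so a 2048-dim
    float vector becomes a 2048-bit code. Zero threshold is an assumption.
    """
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def to_bytes(code: int, dims: int) -> bytes:
    """Pack a dims-bit code into dims/8 bytes (little-endian bit order)."""
    return code.to_bytes(dims // 8, "little")


float_embedding = [0.3, -1.2, 0.0, 2.5, -0.1, 0.7, -0.4, 1.1]
code = binarize(float_embedding)
print(format(code, "08b"))  # 10101001 (printed most-significant bit first)
print(to_bytes(code, 8))    # b'\xa9'
```

Any fixed bit order works for Hamming-distance search, as long as queries and the database are packed the same way. The point of the experiment is that this step is applied after training, so the float model never "sees" the quantization.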

The setup is deliberately small: bert-mini (~11M params), CPU-only training on a Mac Mini M4 Pro, 550k NLI pairs, 3 epochs.
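The post does not say which binary loss was used. As a purely hypothetical illustration of what "native" binary training can mean on (anchor, positive) NLI pairs, the sketch below relaxes each vector with `tanh` (a smooth stand-in for `sign`) and applies an in-batch contrastive (InfoNCE) loss on a normalized dot product, which for exact ±1 codes equals `1 - 2 * hamming / d`. The function names, the temperature of 0.05, and the `beta` sharpness parameter are all assumptions; a real training loop would run this inside an autodiff framework with the encoder producing the input vectors.

```python
import math
from typing import List

Vector = List[float]


def relax(v: Vector, beta: float = 1.0) -> Vector:
    # tanh approximates sign(); a larger beta pushes outputs closer to +/-1.
    return [math.tanh(beta * x) for x in v]


def sim(a: Vector, b: Vector) -> float:
    # Normalized dot product; for exact +/-1 codes this is 1 - 2 * hamming / d.
    return sum(x * y for x, y in zip(a, b)) / len(a)


def in_batch_contrastive_loss(anchors: List[Vector], positives: List[Vector],
                              temperature: float = 0.05, beta: float = 1.0) -> float:
    """InfoNCE over a batch: positives[i] is the target for anchors[i], and
    every other positives[j] in the batch acts as a negative."""
    a_codes = [relax(a, beta) for a in anchors]
    p_codes = [relax(p, beta) for p in positives]
    total = 0.0
    for i, a in enumerate(a_codes):
        logits = [sim(a, p) / temperature for p in p_codes]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the correct pair
    return total / len(a_codes)


anchors = [[2.0, 2.0, -2.0, -2.0], [-2.0, -2.0, 2.0, 2.0]]
print(in_batch_contrastive_loss(anchors, anchors))        # near zero: pairs match
print(in_batch_contrastive_loss(anchors, anchors[::-1]))  # large: pairs swapped
```

Because the loss is computed on (nearly) binary codes, the model is optimized for the same similarity the binary index will use at search time, which is the intuition behind training natively rather than binarizing afterwards.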

Key results on SciFact Recall@10:
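Recall@10 is the fraction of a query's relevant documents that appear among its top 10 retrieved results, averaged over queries. The post does not describe its evaluation harness, so this is a minimal sketch of the usual definition (as in BEIR-style evaluation, where SciFact is one of the datasets); the function names and toy query/document IDs are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of a query's relevant documents found in its top-k results."""
    if not relevant:
        raise ValueError("query has no relevant documents")
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]],
                     k: int = 10) -> float:
    """Average Recall@k over every query that has relevance judgments."""
    scores = [recall_at_k(runs.get(q, []), rel, k) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


runs = {"q1": ["d3", "d1", "d9"], "q2": ["d4", "d5"]}
qrels = {"q1": {"d1", "d2"}, "q2": {"d5"}}
print(mean_recall_at_k(runs, qrels, k=10))  # 0.75 (q1: 1/2, q2: 1/1)
print(mean_recall_at_k(runs, qrels, k=1))   # 0.0
```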

And at 1M vectors with FAISS (AVX2+POPCNT on x86):

The three things that made the binary model actually converge:

Models and code on GitHub / HuggingFace (korben99/bne-binary-2048). Happy to hear if you’ve seen similar or contradictory results, especially at larger scales or with bigger backbones. Also curious whether the 2048-dim sweet spot holds with e.g. MiniLM.
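The FAISS numbers rest on Hamming distance between packed codes: XOR two codes and count the set bits, which POPCNT does in a single instruction on x86. FAISS's binary indexes (for example `IndexBinaryFlat`, whose dimension is given in bits and whose vectors are uint8 arrays of d/8 bytes) perform this comparison; the post does not say which FAISS index type was used. A pure-Python sketch of exhaustive Hamming search over 2048-bit codes, plus the raw storage arithmetic, with hypothetical helper names and random codes standing in for real embeddings:

```python
import heapq
import random
from typing import List, Tuple

DIMS = 2048  # bits per code, matching the 2048-dim model in the post


def hamming(a: int, b: int) -> int:
    # XOR marks every disagreeing bit; counting them is the POPCNT step.
    # (int.bit_count() does the same on Python 3.10+.)
    return bin(a ^ b).count("1")


def search(db: List[int], query: int, k: int = 10) -> List[Tuple[int, int]]:
    """Exhaustive k-NN by Hamming distance; returns (distance, index) pairs."""
    return heapq.nsmallest(k, ((hamming(query, c), i) for i, c in enumerate(db)))


def index_bytes(n_vectors: int, dims: int = DIMS) -> int:
    """Raw storage for n packed binary codes: dims/8 bytes each."""
    return n_vectors * dims // 8


rng = random.Random(0)
db = [rng.getrandbits(DIMS) for _ in range(1000)]
query = db[42] ^ 0b1011  # flip 3 bits of entry 42
print(search(db, query, k=3)[0])    # (3, 42)
print(index_bytes(1_000_000))       # 256000000 bytes, about 256 MB
print(index_bytes(1_000_000) * 32)  # float32 at 2048 dims: 32x larger
```

At 2048 bits a code is 256 bytes, so 1M vectors take about 256 MB before any index overhead, versus about 8.2 GB for float32 vectors of the same dimension. Random 2048-bit codes sit roughly 1024 bits apart, which is why the lightly perturbed entry is recovered unambiguously.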
