Native binary embeddings experiment: curious about your thoughts


Source: discuss.huggingface.co · Published: 2026-06-23T15:34Z · Topic: machine-learning


A developer tested native binary embedding training against post-hoc binarization using a small BERT-mini model and found that native training with a binary loss produced better retrieval results on SciFact Recall@10. The experiment used CPU-only training on a Mac Mini M4 Pro with NLI 550k pairs over 3 epochs, and the binary model converged due to three key techniques. The results suggest a 2048-dimension sweet spot, and the developer is seeking community feedback on larger-scale experiments.


I spent a few days testing a simple hypothesis: does training a binary embedding model natively (with a binary loss) produce better retrieval than just binarizing a float model post-hoc?
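To make the post-hoc baseline concrete, here is a minimal sketch, assuming binarization means thresholding each float dimension at zero and ranking by Hamming distance. The post does not spell out its exact binarization rule, so the function names and the toy vectors below are illustrative assumptions, not the author's code.

```python
def binarize(vec):
    """Pack the sign bits of a float vector into one Python int (1 bit per dimension)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed codes (XOR, then popcount)."""
    return bin(a ^ b).count("1")


def search(query_code, corpus_codes, k=10):
    """Return indices of the k corpus codes closest to the query in Hamming distance."""
    ranked = sorted(
        range(len(corpus_codes)),
        key=lambda i: hamming(query_code, corpus_codes[i]),
    )
    return ranked[:k]


# Toy 4-dimensional float embeddings (made-up values, not from the experiment).
corpus = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.6, -0.4, -0.3, -0.9],
]
query = [0.7, -0.1, 0.2, -0.5]

corpus_codes = [binarize(v) for v in corpus]
top = search(binarize(query), corpus_codes, k=2)
print(top)  # [0, 2]
```

A natively trained binary model would produce the codes directly, so only `binarize` changes; the Hamming search stays the same.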

The setup is deliberately small: bert-mini (~11M params), CPU-only training on a Mac Mini M4 Pro, NLI 550k pairs, 3 epochs.

Key results on SciFact Recall@10:

And at 1M vectors with FAISS (AVX2+POPCNT on x86):

The three things that made the binary model actually converge:

Models and code on GitHub / HuggingFace (korben99/bne-binary-2048). Happy to hear if you’ve seen similar or contradictory results, especially at larger scales or with bigger backbones. Also curious whether the 2048-dim sweet spot holds with e.g. MiniLM.
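For anyone reproducing the SciFact comparison, here is a small sketch of Recall@k under one common definition: the fraction of a query's relevant documents that appear in its top-k results, averaged over queries. The evaluation harness used for the numbers in this post is not shown, so the function names and toy data are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of a query's relevant documents found in its top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(runs, qrels, k=10):
    """Average recall_at_k over every query that has at least one relevant document."""
    scores = [
        recall_at_k(runs.get(qid, []), relevant, k)
        for qid, relevant in qrels.items()
        if relevant
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: query id -> ranked doc ids, and query id -> relevant doc ids.
runs = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9", "d4"]}
qrels = {"q1": {"d1", "d5"}, "q2": {"d2"}}

score = mean_recall_at_k(runs, qrels, k=3)
print(score)  # 0.75
```

Running the same function over the float, post-hoc binarized, and natively binary rankings keeps the comparison on identical footing.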


Read original on discuss.huggingface.co → discuss.huggingface.co/t/native-binary-embedding…
