{"slug": "native-binary-embeddings-experiment-curious-about-your-thoughts", "title": "Native binary embeddings experiment: curious about your thoughts", "summary": "A developer tested native binary embedding training against post-hoc binarization using a small BERT-mini model and found that native training with a binary loss produced better retrieval results on SciFact Recall@10. The experiment used CPU-only training on a Mac Mini M4 Pro with NLI 550k pairs over 3 epochs, and the binary model converged due to three key techniques. The results suggest a 2048-dimension sweet spot, and the developer is seeking community feedback on larger-scale experiments.", "body_md": "I spent a few days testing a simple hypothesis: does training a binary embedding model natively (with a binary loss) produce better retrieval than just binarizing a float model post-hoc?\n\nThe setup is deliberately small: bert-mini (~11M params), CPU-only training on a Mac Mini M4 Pro, NLI 550k pairs, 3 epochs.\n\nKey results on SciFact Recall@10:\n\nAnd at 1M vectors with FAISS (AVX2+POPCNT on x86):\n\nThe three things that made the binary model actually converge:\n\nModels and code on GitHub / HuggingFace (korben99/bne-binary-2048).\n\nHappy to hear if you’ve seen similar or contradictory results, especially at larger scales or with bigger backbones. Also curious whether the 2048-dim sweet spot holds with e.g. 
MiniLM.", "url": "https://wpnews.pro/news/native-binary-embeddings-experiment-curious-about-your-thoughts", "canonical_source": "https://discuss.huggingface.co/t/native-binary-embeddings-experiment-curious-about-your-thoughts/177107#post_1", "published_at": "2026-06-23 15:34:41+00:00", "updated_at": "2026-06-24 00:41:52.899894+00:00", "lang": "en", "topics": ["machine-learning", "natural-language-processing", "ai-research", "ai-tools"], "entities": ["FAISS", "SciFact", "BERT-mini", "Mac Mini M4 Pro", "NLI", "HuggingFace", "GitHub", "korben99"], "alternates": {"html": "https://wpnews.pro/news/native-binary-embeddings-experiment-curious-about-your-thoughts", "markdown": "https://wpnews.pro/news/native-binary-embeddings-experiment-curious-about-your-thoughts.md", "text": "https://wpnews.pro/news/native-binary-embeddings-experiment-curious-about-your-thoughts.txt", "jsonld": "https://wpnews.pro/news/native-binary-embeddings-experiment-curious-about-your-thoughts.jsonld"}}