Native binary embeddings experiment: curious about your thoughts

A developer tested native binary embedding training against post-hoc binarization using a small BERT-mini model and found that native training with a binary loss produced better retrieval results on SciFact Recall@10. The experiment used CPU-only training on a Mac Mini M4 Pro with NLI 550k pairs over 3 epochs, and the binary model converged due to three key techniques. The results suggest a 2048-dimension sweet spot, and the developer is seeking community feedback on larger-scale experiments.

I spent a few days testing a simple hypothesis: does training a binary embedding model natively with a binary loss produce better retrieval than just binarizing a float model post-hoc?

The setup is deliberately small: bert-mini (~11M params), CPU-only training on a Mac Mini M4 Pro, NLI 550k pairs, 3 epochs.

Key results on SciFact Recall@10:

And at 1M vectors with FAISS (AVX2 + POPCNT) on x86:

The three things that made the binary model actually converge:

Models and code are on GitHub / HuggingFace: korben99/bne-binary-2048.

Happy to hear if you’ve seen similar or contradictory results, especially at larger scales or with bigger backbones. Also curious whether the 2048-dim sweet spot holds with e.g. MiniLM.
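For anyone who wants to try the post-hoc baseline themselves, here is a minimal sketch of what "binarize a float model, then retrieve by Hamming distance" can look like. It is an illustration only, not the code from the repository: it assumes sign thresholding at zero, uses random vectors in place of real model embeddings, and the helper names (`binarize`, `hamming_search`, `recall_at_k`) are hypothetical. Recall@k is computed here as the mean fraction of each query's relevant documents found in the top k, which is an assumption about the metric definition.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Post-hoc binarization: bit is 1 where the value is > 0, packed 8 dims per byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

# Lookup table holding the number of set bits for every possible byte value.
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_search(query_codes: np.ndarray, doc_codes: np.ndarray, k: int = 10) -> np.ndarray:
    """Brute-force search: indices of the k documents with the smallest Hamming distance."""
    # Broadcast XOR: (Q, 1, B) ^ (1, N, B) -> (Q, N, B), then count differing bits.
    xor = query_codes[:, None, :] ^ doc_codes[None, :, :]
    dist = _POPCOUNT[xor].sum(axis=2, dtype=np.int32)
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

def recall_at_k(retrieved: np.ndarray, relevant: list) -> float:
    """Mean fraction of each query's relevant doc ids that appear in its top-k row."""
    scores = []
    for row, rel in zip(retrieved, relevant):
        if rel:  # skip queries without any relevance judgments
            scores.append(len(set(row.tolist()) & rel) / len(rel))
    return float(np.mean(scores)) if scores else 0.0

# Toy data standing in for real embeddings: each query is a noisy copy of one document.
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 2048)).astype(np.float32)
queries = docs[:5] + 0.1 * rng.standard_normal((5, 2048)).astype(np.float32)

doc_codes = binarize(docs)        # 2048 bits -> 256 bytes per vector
query_codes = binarize(queries)
top10 = hamming_search(query_codes, doc_codes, k=10)
recall = recall_at_k(top10, [{i} for i in range(5)])
print(doc_codes.shape, top10.shape, recall)
```

The broadcasted XOR is only practical for small collections. At the 1M-vector scale mentioned above you would hand the same packed `uint8` codes to a dedicated binary index (for example FAISS's `IndexBinaryFlat`) rather than materializing the full distance matrix in NumPy.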