{"slug": "huggingface-text-embeddings-inference-cpu-bug", "title": "Huggingface/text-embeddings-inference, cpu bug", "summary": "A developer reported a CPU bug in Hugging Face's text-embeddings-inference tool that causes accuracy issues during concurrent embedding tasks. The bug, related to attention mask handling for equal-length batches, was reported together with a pull request containing a fix.", "body_md": "I would like to draw your attention to an issue I recently posted on GitHub: [Qwen3/Gemma3 candle skip attention masks for equal-length batches · Issue #882 · huggingface/text-embeddings-inference · GitHub](https://github.com/huggingface/text-embeddings-inference/issues/882)\n\nI also included a PR that fixes the issue, and I have thoroughly tested it on my machines.\n\nWhen I use CPU mode for embeddings with concurrent requests (yes, it is slow), this bug causes large accuracy issues.", "url": "https://wpnews.pro/news/huggingface-text-embeddings-inference-cpu-bug", "canonical_source": "https://discuss.huggingface.co/t/huggingface-text-embeddings-inference-cpu-bug/177129#post_1", "published_at": "2026-06-24 18:53:11+00:00", "updated_at": "2026-06-24 19:17:36.875811+00:00", "lang": "en", "topics": ["ai-tools", "developer-tools", "machine-learning"], "entities": ["Hugging Face", "text-embeddings-inference", "Qwen3", "Gemma3", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/huggingface-text-embeddings-inference-cpu-bug", "markdown": "https://wpnews.pro/news/huggingface-text-embeddings-inference-cpu-bug.md", "text": "https://wpnews.pro/news/huggingface-text-embeddings-inference-cpu-bug.txt", "jsonld": "https://wpnews.pro/news/huggingface-text-embeddings-inference-cpu-bug.jsonld"}}