Source: discuss.huggingface.co

huggingface/text-embeddings-inference: CPU bug

A developer reported a CPU bug in Hugging Face's text-embeddings-inference tool that causes accuracy issues when embedding requests run concurrently. The bug relates to attention mask handling for equal-length batches, and the developer submitted a pull request with a fix alongside the report.

Published Jun 24, 2026 · 1 min read

I would like to draw your attention to an issue I recently posted on GitHub: "Qwen3/Gemma3 candle skip attention masks for equal-length batches" (Issue #882, huggingface/text-embeddings-inference).

I also included a PR to fix the issue and thoroughly tested it on my machines.

When I use CPU mode for embeddings with concurrency (yes, it is slow), this causes large accuracy issues.
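To make the general hazard concrete, here is a minimal toy sketch in plain Python. It is not the text-embeddings-inference code, and the post does not describe the exact mechanism. The sketch assumes the attention mask carries causal structure as well as padding information, so skipping it whenever all sequences in a batch have the same length changes the result. The names `attend` and `forward`, and the flag `skip_mask_if_equal_length`, are hypothetical and exist only for this illustration. Padding itself is not modeled.

```python
import math


def softmax(scores):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(values, causal):
    """Toy single-head attention over scalar values with uniform raw scores.

    With causal=True, position i may only attend to positions 0..i.
    """
    n = len(values)
    out = []
    for i in range(n):
        row = [0.0] * n  # uniform scores keep the arithmetic easy to follow
        if causal:
            for j in range(i + 1, n):
                row[j] = float("-inf")  # hide future positions
        weights = softmax(row)
        out.append(sum(w * v for w, v in zip(weights, values)))
    return out


def forward(batch, skip_mask_if_equal_length):
    """Hypothetical batch forward pass.

    Assumption for illustration: the mask is dropped when every sequence
    in the batch has the same length (no padding needed).
    """
    equal = len({len(seq) for seq in batch}) == 1
    use_mask = not (skip_mask_if_equal_length and equal)
    return [attend(seq, causal=use_mask) for seq in batch]


batch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # two equal-length sequences
with_skip = forward(batch, skip_mask_if_equal_length=True)
without_skip = forward(batch, skip_mask_if_equal_length=False)
print(with_skip[0])     # mask skipped: every position averages the whole sequence
print(without_skip[0])  # mask applied: each position only sees its prefix
```

Under these assumptions, the same input yields different outputs depending on what else happens to share its batch, which would be consistent with the problem appearing under concurrency. The linked issue and PR remain the authoritative description of the actual cause.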
