huggingface/text-embeddings-inference, CPU bug

A developer reported a CPU bug in Hugging Face's text-embeddings-inference tool that causes accuracy issues during concurrent embedding tasks. The bug, related to attention mask handling for equal-length batches, was reported together with a pull request that fixes it.

I would like to draw your attention to an issue I recently posted on GitHub: "Qwen3/Gemma3 candle skip attention masks for equal-length batches" (Issue 882, huggingface/text-embeddings-inference) https://github.com/huggingface/text-embeddings-inference/issues/882 I also included a PR that fixes the issue and tested it thoroughly on my machines. When I run embeddings in CPU mode with concurrency (yes, it is slow), the bug causes large accuracy issues.
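For anyone who wants to check whether their own setup is affected, here is a minimal sketch of one way to compare embeddings requested one at a time against the same inputs requested concurrently. It is not taken from the issue or the PR. It assumes a text-embeddings-inference server is already running on CPU and reachable at `http://localhost:8080` (the host and port are assumptions), and it uses the server's `/embed` route. The helper names `embed`, `cosine`, and `compare` are hypothetical.

```python
import json
import math
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a local TEI server; adjust host and port to your setup.
TEI_URL = "http://localhost:8080/embed"


def embed(text, url=TEI_URL):
    """POST one input to the /embed route and return its embedding vector."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # The server answers with a list of vectors, one per input.
        return json.loads(response.read())[0]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare(texts, embed_fn=embed, workers=8):
    """Embed each text sequentially, then concurrently; return per-text similarity."""
    sequential = [embed_fn(t) for t in texts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `sequential`.
        concurrent = list(pool.map(embed_fn, texts))
    return [cosine(s, c) for s, c in zip(sequential, concurrent)]
```

Calling `compare(texts)` with a list of inputs that are likely to tokenize to the same length (the case named in the issue title) returns one similarity score per input. Scores noticeably below 1.0 would suggest that the concurrent results differ from the sequential ones. Whether concurrent requests actually end up in the same batch depends on the server's batching, so a run that shows no difference does not rule the bug out.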