I would like to draw your attention to an issue I recently posted on GitHub: "Qwen3/Gemma3 candle skip attention masks for equal-length batches" (Issue #882, huggingface/text-embeddings-inference).
I also included a PR to fix the issue and thoroughly tested it on my machines.
When I run embeddings in CPU mode with concurrency (yes, it is slow), this causes large accuracy issues.