{"slug": "introducing-the-ettin-reranker-family", "title": "Introducing the Ettin Reranker Family", "summary": "The article announces the release of six new Sentence Transformers CrossEncoder reranker models, ranging from 17 million to 1 billion parameters, which are built on Ettin ModernBERT encoders and achieve state-of-the-art performance at their respective sizes. These rerankers, designed for the retrieve-then-rerank pattern in information retrieval, were trained using a distillation recipe and are available for use with just three lines of code via the Sentence Transformers library. The release also includes the full training data and recipe, along with a new agent skill for fine-tuning such models.", "body_md": "# Introducing the Ettin Reranker Family\n\nTL;DR\n\nToday I'm releasing six new [Sentence Transformers](https://sbert.net/) CrossEncoder rerankers, state-of-the-art at their respective sizes, built on top of the [Ettin](https://huggingface.co/collections/jhu-clsp/encoders-vs-decoders-the-ettin-suite) ModernBERT encoders, together with the data and full training recipe that produced them:\n\n- `cross-encoder/ettin-reranker-17m-v1`\n- `cross-encoder/ettin-reranker-32m-v1`\n- `cross-encoder/ettin-reranker-68m-v1`\n- `cross-encoder/ettin-reranker-150m-v1`\n- `cross-encoder/ettin-reranker-400m-v1`\n- `cross-encoder/ettin-reranker-1b-v1`\n\nThe models were trained with a **distillation recipe**: pointwise MSE on [`mixedbread-ai/mxbai-rerank-large-v2`](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) scores over [`cross-encoder/ettin-reranker-v1-data`](https://huggingface.co/datasets/cross-encoder/ettin-reranker-v1-data), which is a subset of [`lightonai/embeddings-pre-training`](https://huggingface.co/datasets/lightonai/embeddings-pre-training) mixed with a reranked subset of [`lightonai/embeddings-fine-tuning`](https://huggingface.co/datasets/lightonai/embeddings-fine-tuning).\n\n*Figure: Our six rerankers paired with `google/embeddinggemma-300m` on MTEB(eng, v2) Retrieval. See [Results](#results) for five more embedder pairings.*\n\nIf you're new to rerankers and want the \"why\" first, jump to [What is a reranker, and why pair one with an embedder?](#what-is-a-reranker-and-why-pair-one-with-an-embedder). If you just want to plug a model in, jump to [Usage](#usage). If you want to train your own, jump to [Training](#training).\n\nI bootstrapped the training recipe below with the new `train-sentence-transformers` Agent Skill, shipped in [Sentence Transformers v5.5.0](https://github.com/huggingface/sentence-transformers/releases/tag/v5.5.0). Install it with `hf skills add train-sentence-transformers [--global] [--claude]` and ask your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, ...) to fine-tune a `SentenceTransformer`, `CrossEncoder`, or `SparseEncoder` model on your data.\n\n## Table of contents\n\n- [What is a reranker, and why pair one with an embedder?](#what-is-a-reranker-and-why-pair-one-with-an-embedder)\n- [Usage](#usage)\n- [Architecture Details](#architecture-details)\n- [Results](#results)\n- [Training](#training)\n- [Conclusion](#conclusion)\n- [Acknowledgements](#acknowledgements)\n\n## What is a reranker, and why pair one with an embedder?\n\nA reranker (a.k.a. pointwise cross-encoder) is a neural model that takes a `(query, document)` pair and outputs a single relevance score. Unlike an embedding model, which encodes the query and document separately and computes their similarity from the two embedding vectors, a reranker lets the two texts attend to each other through every transformer layer.
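As a toy illustration of that difference, the sketch below contrasts the two scoring interfaces and counts how many model passes each one needs for two queries against three documents. It is deliberately not a neural model: hypothetical bag-of-words vectors stand in for the embedder, and a hypothetical term-overlap function stands in for the cross-encoder.

```python
# Toy contrast between bi-encoder and cross-encoder scoring interfaces.
# Hypothetical stand-ins only: bag-of-words vectors replace neural embeddings and a
# term-overlap function replaces the transformer; neither is how the Ettin models work.
import math
from collections import Counter

calls = Counter()  # counts model passes of each kind


def embed(text):
    # Bi-encoder style: one pass per text, and the vector depends on that text alone.
    calls['embed'] += 1
    return Counter(text.lower().split())


def similarity(u, v):
    # Cosine similarity, computed afterwards from the two independent vectors.
    dot = sum(u[t] * v[t] for t in u)
    norms = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norms if norms else 0.0


def score_pair(query, document):
    # Cross-encoder style: one pass per (query, document) pair, so the scoring
    # function sees both texts at once.
    calls['pair'] += 1
    q, d = set(query.lower().split()), set(document.lower().split())
    return len(q & d) / len(q)


queries = ['where was apple founded', 'who founded apple']
corpus = ['apple was founded in cupertino', 'the fuji apple is a cultivar', 'jobs introduced the iphone']

doc_vecs = [embed(d) for d in corpus]     # document side can be computed once and cached
query_vecs = [embed(q) for q in queries]
bi_scores = [[similarity(qv, dv) for dv in doc_vecs] for qv in query_vecs]
cross_scores = [[score_pair(q, d) for d in corpus] for q in queries]

print(calls['embed'], calls['pair'])  # 5 6
```

With Q queries and N documents, the bi-encoder side grows as N + Q (and the N document passes can be cached offline), while the pair-scoring side grows as Q x N.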
That joint encoding is more accurate but also more expensive: the model has to be run once per `(query, document)` pair rather than once per text, so document representations cannot be precomputed and cached.\n\nBecause cross-encoders are too expensive to run over a full corpus, the common production pattern is **retrieve-then-rerank**: a fast embedding model cheaply retrieves the top-K candidates, then a cross-encoder re-orders just those K with high accuracy. The total cost stays bounded, while the final ranking is much closer to what an exhaustive cross-encoder pass would produce.\n\nThroughout this blog post I'll use \"reranker\" and \"cross-encoder\" interchangeably.\n\n## Usage\n\nThe released models are normal Sentence Transformers `CrossEncoder` models, so you can use them with just 3 lines of code:\n\n``` python\nfrom sentence_transformers import CrossEncoder\n\nmodel = CrossEncoder(\"cross-encoder/ettin-reranker-32m-v1\")\nscores = model.predict([\n    (\"Where was Apple founded?\", \"Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.\"),\n    (\"Where was Apple founded?\", \"The Fuji apple is an apple cultivar developed in the late 1930s and brought to market in 1962.\"),\n])\nprint(scores)\n# [11.393298  2.968891]   <- larger means more relevant\n```\n\nFor a query and a list of candidates, you can also use `rank` to get back the candidates' indices and scores, sorted from most to least relevant:\n\n``` python\nranked = model.rank(\n    query=\"Which planet is known as the Red Planet?\",\n    documents=[\n        \"Venus is often called Earth's twin because of its similar size and proximity.\",\n        \"Mars, known for its reddish appearance, is often referred to as the Red Planet.\",\n        \"Jupiter, the largest planet in our solar system, has a prominent red spot.\",\n        \"Saturn, famous for its rings, is sometimes mistaken for the Red Planet.\",\n    ],\n    top_k=4,\n    return_documents=True,\n)\nfor r in ranked:\n    print(f\"({r['score']:.2f}): {r['text']}\")\n# (10.82): Mars, known for 
its reddish appearance, is often referred to as the Red Planet.\n# (9.86): Saturn, famous for its rings, is sometimes mistaken for the Red Planet.\n# (8.55): Jupiter, the largest planet in our solar system, has a prominent red spot.\n# (6.21): Venus is often called Earth's twin because of its similar size and proximity.\n```\n\nYou can swap [`cross-encoder/ettin-reranker-32m-v1`](https://huggingface.co/cross-encoder/ettin-reranker-32m-v1) for any of the other sizes to trade quality against speed. All six accept up to 8192 tokens of context (useful for long-document reranking), thanks to ModernBERT's long-context pre-training.\n\nFor the highest throughput, I recommend installing [`kernels`](https://github.com/huggingface/kernels) and setting `model_kwargs={\"dtype\": \"bfloat16\", \"attn_implementation\": \"flash_attention_2\"}`. See the [Speed](#speed) section below for details; in general, you can expect a 1.7x-8.3x speedup over default loading, depending on model size and sequence length.\n\n``` python\nfrom sentence_transformers import CrossEncoder\n\nmodel = CrossEncoder(\n    \"cross-encoder/ettin-reranker-32m-v1\",\n    model_kwargs={\"dtype\": \"bfloat16\", \"attn_implementation\": \"flash_attention_2\"},\n)\n```\n\n### End-to-end retrieve-then-rerank pipeline\n\nA complete example with a fast embedder for retrieval and the reranker for the final ordering:\n\n``` python\nfrom sentence_transformers import SentenceTransformer, CrossEncoder\n\n# Fast retrieval with a static embedder (sub-millisecond on CPU per query)\nembedder = SentenceTransformer(\"sentence-transformers/static-retrieval-mrl-en-v1\")\nreranker = CrossEncoder(\"cross-encoder/ettin-reranker-68m-v1\")\n\ncorpus = [\n    \"Apple Inc. 
was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.\",\n    \"The Fuji apple is an apple cultivar developed in the late 1930s.\",\n    \"Steve Jobs introduced the iPhone in 2007 at Macworld.\",\n    \"Macintosh computers were sold by Apple from 1984 onward.\",\n    # ... thousands or millions more in production\n]\nquery = \"Where was Apple founded?\"\n\n# Step 1: encode + retrieve top-100\nquery_emb = embedder.encode_query(query, convert_to_tensor=True)\ncorpus_emb = embedder.encode_document(corpus, convert_to_tensor=True)  # in production, encode the corpus once and cache it\nscores = embedder.similarity(query_emb, corpus_emb)[0]\ntop_k_idx = scores.topk(min(100, len(corpus))).indices.tolist()\n\n# Step 2: rerank\ntop_k_docs = [corpus[i] for i in top_k_idx]\nranked = reranker.rank(query, top_k_docs, top_k=5, return_documents=True)\nfor r in ranked:\n    print(f\"({r['score']:.2f}): {r['text']}\")\n# (11.63): Apple Inc. was founded in Cupertino, California in 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.\n# (4.71): Steve Jobs introduced the iPhone in 2007 at Macworld.\n# (1.96): The Fuji apple is an apple cultivar developed in the late 1930s.\n# (1.49): Macintosh computers were sold by Apple from 1984 onward.\n```\n\nThis is the same shape used by most modern search systems: the retriever decides what enters the funnel, and the reranker decides what wins.\n\n## Architecture Details\n\nAll six rerankers share the same architecture and differ only in backbone size. The backbone is one of the six [Ettin encoders](https://huggingface.co/blog/ettin) from Johns Hopkins University's Ettin suite. These are ModernBERT-style models with unpadded attention, RoPE positional encodings, GeGLU activations, and 2T tokens of open-license pre-training, supporting up to 8192 tokens of context.\n\nOn top of each encoder, the reranker uses a 4-module classification head that mirrors `ModernBertForSequenceClassification` but is built from Sentence Transformers' modular components.
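For intuition about what the head computes, here is a minimal NumPy sketch of the forward pass from token embeddings to one logit per pair: CLS pooling, a bias-free dense layer with GELU, LayerNorm, and a final projection to a single output. It is not the Sentence Transformers implementation; the weights are random, the hidden size is tiny, and the exact GELU variant and the bias on the last layer are assumptions.

```python
# NumPy sketch of the scoring head applied to encoder outputs.
# Assumptions: random weights, a tiny hidden size, erf-based GELU, and a bias on the
# final Dense layer; the released models learn these weights and use the backbone's H.
import math

import numpy as np

rng = np.random.default_rng(0)
H = 8  # hypothetical hidden size

W1 = rng.normal(scale=H ** -0.5, size=(H, H))  # Dense(H, H, bias=False)
gamma, beta = np.ones(H), np.zeros(H)          # LayerNorm(H) scale and shift
W2 = rng.normal(scale=H ** -0.5, size=(H, 1))  # Dense(H, 1) weight
b2 = np.zeros(1)                               # Dense(H, 1) bias (assumed)

erf = np.vectorize(math.erf)


def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))


def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def score(token_embeddings):
    # token_embeddings: (batch, seq_len, H) from the encoder, CLS token at position 0.
    cls = token_embeddings[:, 0]   # Pooling(cls)
    h = gelu(cls @ W1)             # Dense(H, H, bias=False) + GELU
    h = layer_norm(h)              # LayerNorm(H)
    return (h @ W2 + b2)[:, 0]     # Dense(H, 1): one relevance logit per pair


token_embeddings = rng.normal(size=(2, 5, H))  # 2 hypothetical (query, document) pairs
print(score(token_embeddings).shape)  # (2,)
```

Because only position 0 is read, perturbing any non-CLS token embedding leaves the scores unchanged; in the real model, it is the encoder's attention layers that move query and document information into the CLS position.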
The underlying `Transformer` is a plain `AutoModel` rather than `AutoModelForSequenceClassification`, which lets us feed unpadded, variable-length sequences to Flash Attention 2. At medium-document sequence lengths this gives a 1.7x-8.3x speedup over fp32+SDPA, depending on model size (see [Speed](#speed) for the full benchmark):\n\n```\n1. Transformer(FA2)\n2. Pooling(cls)\n3. Dense(H, H, bias=False, GELU)\n4. LayerNorm(H)\n5. Dense(H, 1, scores)\n```\n\nIn my ablations, CLS pooling outperformed mean pooling. That was a little surprising: ModernBERT uses global attention only in every third layer, and the other two-thirds use local-window attention, so distant tokens can only reach the CLS token through those global layers. Empirically, the few global layers carry enough signal to make CLS the better pooling choice.\n\n| Model | Backbone | Params (head incl.) |\n|---|---|---|\n| `cross-encoder/ettin-reranker-17m-v1` | `jhu-clsp/ettin-encoder-17m` | 17.6M |\n| `cross-encoder/ettin-reranker-32m-v1` | `jhu-clsp/ettin-encoder-32m` | 32.8M |\n| `cross-encoder/ettin-reranker-68m-v1` | `jhu-clsp/ettin-encoder-68m` | 68.6M |\n| `cross-encoder/ettin-reranker-150m-v1` | `jhu-clsp/ettin-encoder-150m` | 151M |\n| `cross-encoder/ettin-reranker-400m-v1` | `jhu-clsp/ettin-encoder-400m` | 401M |\n| `cross-encoder/ettin-reranker-1b-v1` | `jhu-clsp/ettin-encoder-1b` | 1.00B |\n\nAll six models are released under the **Apache 2.0** license, matching the Ettin encoders.\n\n## Results\n\n### MTEB(eng, v2) Retrieval\n\nI ran each released model through the full [MTEB(eng, v2) Retrieval benchmark](https://github.com/embeddings-benchmark/mteb) (10 tasks, top-100 reranked) using MTEB's [two-stage reranking flow](https://embeddings-benchmark.github.io/mteb/get_started/advanced_usage/two_stage_reranking/), pairing each reranker with six embedding models that span the speed/quality spectrum:\n\n- `sentence-transformers/static-retrieval-mrl-en-v1`\n- `sentence-transformers/all-MiniLM-L6-v2`\n- `BAAI/bge-small-en-v1.5`\n- `nomic-ai/nomic-embed-text-v1.5`\n- `google/embeddinggemma-300m`\n- `jinaai/jina-embeddings-v5-text-small-retrieval`\n\nThe **dashed retriever-only line** in each results chart is the headline number to beat: a reranker that lands below it actively hurts the pipeline on average.\n\n#### Full table of MTEB results\n\nMean NDCG@10 over the 6 embedder pairings, sorted descending. Our six models are in **bold**, and the teacher [`mixedbread-ai/mxbai-rerank-large-v2`](https://huggingface.co/mixedbread-ai/mxbai-rerank-large-v2) is underlined.\n\n† Capped to `max_seq_length=8192` (the 4B Qwen3-based rerankers don't fit on a single H100 80GB at native context). Scores at native context length would likely be higher.\n\n#### Full table of NanoBEIR results\n\n[NanoBEIR](https://huggingface.co/collections/sentence-transformers/nanobeir-with-bm25-rankings) is a fast 13-dataset subset of [BEIR](https://github.com/beir-cellar/beir) that uses 50 queries per dataset against up to 5000 documents each.
NanoBEIR is what `metric_for_best_model` was set to during training (see [Evaluation](#evaluation)), and what I used to guide experimentation.\n\n| Reranker | Params | NanoBEIR mean NDCG@10 |\n|---|---|---|\n| `mixedbread-ai/mxbai-rerank-large-v2` | 1.54B | |\n| **`cross-encoder/ettin-reranker-1b-v1`** | **1.00B** | **0.7237** |\n| `jinaai/jina-reranker-m0` | | |\n| **`cross-encoder/ettin-reranker-400m-v1`** | **401M** | **0.7193** |\n| `mixedbread-ai/mxbai-rerank-base-v2` | | |\n| **`cross-encoder/ettin-reranker-150m-v1`** | **151M** | **0.7086** |\n| `Alibaba-NLP/gte-reranker-modernbert-base` | 150M | |\n| `BAAI/bge-reranker-v2-m3` | 568M | |\n| **`cross-encoder/ettin-reranker-68m-v1`** | **68.6M** | **0.6915** |\n| `ibm-granite/granite-embedding-reranker-english-r2` | 150M | |\n| **`cross-encoder/ettin-reranker-32m-v1`** | **32.8M** | **0.6825** |\n| **`cross-encoder/ettin-reranker-17m-v1`** | **17.6M** | **0.6746** |\n| `mixedbread-ai/mxbai-rerank-large-v1` | | |\n| `BAAI/bge-reranker-large` | | |\n| `cross-encoder/ms-marco-MiniLM-L12-v2` | 33M | 0.6369 |\n| `cross-encoder/ms-marco-MiniLM-L6-v2` | | |\n| `cross-encoder/ms-marco-MiniLM-L4-v2` | | |\n| `mixedbread-ai/mxbai-rerank-base-v1` | | |\n| `mixedbread-ai/mxbai-rerank-xsmall-v1` | 70M | |\n| `BAAI/bge-reranker-base` | | |\n\nThe smallest model I'm releasing, our 17M, beats the 33M `ms-marco-MiniLM-L12-v2` by +0.051 NDCG@10 (0.5576 vs 0.5066) on MTEB and by +0.038 (0.6746 vs 0.6369) on NanoBEIR, with roughly half the parameters. The 32M beats the 568M `BAAI/bge-reranker-v2-m3` by +0.025 (0.5779 vs 0.5526) on MTEB, despite a 17x parameter gap. If you've been using one of the legacy MiniLM rerankers as the default in your retrieve-then-rerank stack, swapping in our 17M (or 32M) is a low-risk drop-in replacement with a noticeable quality bump on both benchmarks.\n\nMoving up the table, our 150M is the strongest reranker I tested under 600M parameters on MTEB, edging out the recent `Qwen/Qwen3-Reranker-0.6B` (596M) by +0.005 (0.5994 vs 0.5940) and beating every BAAI bge-reranker variant by 0.03 to 0.05.
The 68M is also worth a mention: at 0.5915 it lands almost exactly on `Qwen3-Reranker-0.6B` (0.5940) while using roughly a ninth of the parameters.\n\nAt the top of the released range, our 1B model closely tracks its teacher. It comes within 0.0001 of the 1.54B `mxbai-rerank-large-v2` on MTEB (0.6114 vs 0.6115) and within 0.008 on NanoBEIR, even though the teacher is 54% larger. The distillation effectively closes the gap to the teacher, which is what I was hoping to see going into this release.\n\nThe strongest reranker overall in the comparison is `Qwen/Qwen3-Reranker-4B` at 0.6367 on MTEB, +0.025 above our 1B model. Closing that gap with the current recipe would likely require distilling from a stronger teacher (our teacher itself scores below `Qwen3-Reranker-4B`). For most retrieve-then-rerank workloads, our 1B, with a quarter of the parameters (see [Speed](#speed)), is a much more practical pick.\n\n### Speed\n\nQuality numbers are only half of what matters for a reranker. The other half is whether its latency fits inside the budget you have between retrieval and showing results to the user. Let me walk through what I measured.\n\nI benchmarked all six released models against thirteen public rerankers (strong baselines of up to 1.5B parameters, including the teacher) on a single NVIDIA H100 80GB. The queries and documents come from [`sentence-transformers/natural-questions`](https://huggingface.co/datasets/sentence-transformers/natural-questions) at its natural document-length distribution: most NQ answers are short, some are long. Documents are truncated at `max_length=512` to avoid giving the older models an unfair advantage, since many of them cannot read past 512 tokens and would otherwise process less text than the long-context models.
Each model uses its best supported attention implementation: Flash Attention 2 wherever the architecture supports it (BERT, XLM-RoBERTa, ModernBERT, Qwen2), SDPA where it doesn't, and eager for DeBERTa-v2 (which currently has neither FA2 nor SDPA support in `transformers`). For every model, an auto-batch search starts at batch size 8 and doubles until the GPU runs out of memory. At each batch size I run three timed passes and keep the median throughput, so a single unlucky run doesn't drag the number around. The reported throughput is at whichever batch size won.\n\n**Table 1.** Throughput in pairs per second, all in `bfloat16`. Our six rerankers are in **bold**.\n\n| Model | Params | Attn | pairs / second |\n|---|---|---|---|\n| **`cross-encoder/ettin-reranker-17m-v1`** | **17M** | **FA2** | **7517** |\n| **`cross-encoder/ettin-reranker-32m-v1`** | **32M** | **FA2** | **6602** |\n| **`cross-encoder/ettin-reranker-68m-v1`** | **68M** | **FA2** | **4913** |\n| `cross-encoder/ms-marco-MiniLM-L4-v2` | | FA2 | 4029 |\n| `cross-encoder/ms-marco-MiniLM-L6-v2` | | FA2 | 3817 |\n| `cross-encoder/ms-marco-MiniLM-L12-v2` | 33M | FA2 | |\n| **`cross-encoder/ettin-reranker-150m-v1`** | **150M** | **FA2** | **3237** |\n| `BAAI/bge-reranker-base` | | FA2 | |\n| `mixedbread-ai/mxbai-rerank-xsmall-v1` | 70M | eager | 2636 |\n| `mixedbread-ai/mxbai-rerank-base-v1` | | eager | |\n| **`cross-encoder/ettin-reranker-400m-v1`** | **400M** | **FA2** | **1738** |\n| `BAAI/bge-reranker-large` | | FA2 | |\n| `BAAI/bge-reranker-v2-m3` | 568M | FA2 | |\n| `Alibaba-NLP/gte-reranker-modernbert-base` | 150M | FA2 | 1418 |\n| `ibm-granite/granite-embedding-reranker-english-r2` | 150M | FA2 | 1404 |\n| **`cross-encoder/ettin-reranker-1b-v1`** | **1B** | **FA2** | **928** |\n| `mixedbread-ai/mxbai-rerank-large-v1` | | eager | |\n| `mixedbread-ai/mxbai-rerank-base-v2` | | FA2 | |\n| <u>`mixedbread-ai/mxbai-rerank-large-v2`</u> | <u>1.5B</u> | FA2 | <u>387</u> |\n\nOur 17M is the fastest reranker in the whole comparison, at 7517 pairs per second. That's almost twice the throughput of `ms-marco-MiniLM-L6-v2` (3817), and it's even faster than the smaller `ms-marco-MiniLM-L4-v2` (4029). And as you saw in the MTEB table earlier, our 17M is also more accurate than every MiniLM variant.
If you're currently running a MiniLM cross-encoder, swapping to our 17M is a one-line change that improves both your latency and your search quality.\n\nOur 150M makes an even more interesting comparison, because there are two direct architectural peers at exactly 150M parameters: [`Alibaba-NLP/gte-reranker-modernbert-base`](https://huggingface.co/Alibaba-NLP/gte-reranker-modernbert-base) and [`ibm-granite/granite-embedding-reranker-english-r2`](https://huggingface.co/ibm-granite/granite-embedding-reranker-english-r2). Both are built on the same ModernBERT-base architecture. Our 150M runs at 3237 pairs per second, while the two peers come in at 1418 and 1404 respectively, a 2.3x speed gap.\n\nAll three 150M models use Flash Attention 2, but the two peers load through `AutoModelForSequenceClassification`, which keeps the inputs padded. So attention itself runs the FA2 kernel, but the rest of the model still does dense compute on padding tokens that don't contribute anything. Our modular `Transformer` module (see [Architecture Details](#architecture-details) above) propagates unpadded inputs all the way through the model, so every layer only spends compute on real tokens. That's the difference between getting some of FA2's benefit and getting all of it.\n\nAt the bottom of the table, our 1B model hits 928 pairs per second, 2.4x faster than the 1.54B teacher `mxbai-rerank-large-v2` (387 pairs per second), while matching its MTEB score within 0.0001. The teacher is Qwen2-based with a per-pair prompt-template overhead, so the distilled student inherits the teacher's calibration and judgement but skips all the runtime baggage. This is honestly the most satisfying single number in the whole release for me.\n\nOne unfortunate note: the DeBERTa-v2-based `mxbai-rerank-{xsmall,base,large}-v1` series ends up much slower than similarly sized models because DeBERTa-v2 currently supports neither Flash Attention 2 nor SDPA in `transformers`.
The 70M `mxbai-rerank-xsmall-v1` runs at 2636 pairs per second, about half the throughput of our 68M at almost the same parameter count. The models themselves are perfectly fine; they just don't get to use modern attention kernels.\n\n#### Same benchmark on a consumer GPU (RTX 3090, 24 GB)\n\nIf you're self-hosting on a consumer card rather than a datacenter GPU, here's the same throughput sweep on an RTX 3090. Same benchmark setup as Table 1: `bfloat16`, best-supported attention per model, and three-trial median throughput from the same auto-batch search.\n\n| Model | Params | Best attn | pairs / second |\n|---|---|---|---|\n| **`cross-encoder/ettin-reranker-17m-v1`** | **17M** | **FA2** | **9008** |\n| `cross-encoder/ms-marco-MiniLM-L4-v2` | | FA2 | |\n| **`cross-encoder/ettin-reranker-32m-v1`** | **32M** | **FA2** | **4497** |\n| `cross-encoder/ms-marco-MiniLM-L6-v2` | | FA2 | |\n| `cross-encoder/ms-marco-MiniLM-L12-v2` | 33M | FA2 | |\n| **`cross-encoder/ettin-reranker-68m-v1`** | **68M** | **FA2** | **1916** |\n| `mixedbread-ai/mxbai-rerank-xsmall-v1` | 70M | eager | |\n| `BAAI/bge-reranker-base` | | FA2 | |\n| **`cross-encoder/ettin-reranker-150m-v1`** | **150M** | **FA2** | **982** |\n| `mixedbread-ai/mxbai-rerank-base-v1` | | eager | |\n| `ibm-granite/granite-embedding-reranker-english-r2` | 150M | FA2 | |\n| `Alibaba-NLP/gte-reranker-modernbert-base` | 150M | FA2 | |\n| `BAAI/bge-reranker-large` | | FA2 | |\n| `BAAI/bge-reranker-v2-m3` | 568M | FA2 | |\n| **`cross-encoder/ettin-reranker-400m-v1`** | **400M** | **FA2** | **429** |\n| `mixedbread-ai/mxbai-rerank-large-v1` | | eager | |\n| `mixedbread-ai/mxbai-rerank-base-v2` | | FA2 | 221 |\n| **`cross-encoder/ettin-reranker-1b-v1`** | **1B** | **FA2** | **189** |\n| <u>`mixedbread-ai/mxbai-rerank-large-v2`</u> | <u>1.5B</u> | FA2 | <u>69</u> |\n\nOur 17M is still the fastest model in the table at 9008 pairs per second, actually higher than its H100 number, which suggests that at tiny sizes raw compute isn't the bottleneck and the H100's extra muscle doesn't translate into throughput. The middle of the table reshuffles a bit: the MiniLM rerankers overtake our 32M and 68M, and the 1B slips behind `mxbai-rerank-base-v2` (189 vs 221 pairs per second).
Our 150M model still holds a solid lead over the two 150M ModernBERT-based peers, and the teacher-replacement story still holds, with our 1B at 2.7x the throughput of the 1.5B `mxbai-rerank-large-v2` (189 vs 69 pairs per second).\n\n#### Same benchmark on CPU (Intel Core i7-13700K)\n\n| Model | Params | Best attn | pairs / second |\n|---|---|---|---|\n| **`cross-encoder/ettin-reranker-17m-v1`** | **17M** | **SDPA** | **267.4** |\n| `cross-encoder/ms-marco-MiniLM-L4-v2` | | | 206.2 |\n| `cross-encoder/ms-marco-MiniLM-L6-v2` | | | 143.9 |\n| **`cross-encoder/ettin-reranker-32m-v1`** | **32M** | **SDPA** | **92.5** |\n| `cross-encoder/ms-marco-MiniLM-L12-v2` | 33M | | |\n| `mixedbread-ai/mxbai-rerank-xsmall-v1` | 70M | eager | |\n| **`cross-encoder/ettin-reranker-68m-v1`** | **68M** | **SDPA** | **31.2** |\n| `BAAI/bge-reranker-base` | | | |\n| `Alibaba-NLP/gte-reranker-modernbert-base` | 150M | | |\n| `ibm-granite/granite-embedding-reranker-english-r2` | 150M | | |\n| **`cross-encoder/ettin-reranker-150m-v1`** | **150M** | **SDPA** | **14.0** |\n| `mixedbread-ai/mxbai-rerank-base-v1` | | eager | |\n| `BAAI/bge-reranker-large` | | | |\n| `BAAI/bge-reranker-v2-m3` | 568M | | |\n| **`cross-encoder/ettin-reranker-400m-v1`** | **400M** | **SDPA** | **5.2** |\n| `mixedbread-ai/mxbai-rerank-large-v1` | | eager | |\n| `mixedbread-ai/mxbai-rerank-base-v2` | | | |\n| **`cross-encoder/ettin-reranker-1b-v1`** | **1B** | **SDPA** | **2.1** |\n\nOn CPU, we can't take advantage of bf16, Flash Attention 2, or unpadding, so the latency story is simpler: the higher the parameter count, the slower the model. Our 17M is considerably faster than `ms-marco-MiniLM-L6-v2` (267.4 vs 143.9 pairs per second) and even faster than the smaller `ms-marco-MiniLM-L4-v2` (206.2). As expected, our 150M lands alongside the two 150M peers (14.0 vs 14.5 and 14.7 pairs per second) now that unpadding no longer applies. If you're CPU-bound, our 17M and 32M are the practical picks.\n\nTo explain where the speed comes from, the next table sweeps `fp32+SDPA`, `bf16+SDPA`, and `bf16+FA2` for our six models using the same benchmark configuration.
The FA2 column is split in two: one with the inputs still padded (what a wrapped model would see) and one with unpadded inputs (what our modular `Transformer` actually does).\n\n**Table 2.** Precision and attention ablation for the six released sizes at `max_length=512` on natural NQ documents. Each cell shows pairs / second with the multiplier relative to `fp32+SDPA` in parentheses, and peak GPU memory on the second line. The rightmost column (in **bold**) is the configuration our models use by default when FA2 is enabled.\n\n| Model | Params | fp32+SDPA | bf16+SDPA | bf16+FA2 w. padding | bf16+FA2 w.o. padding |\n|---|---|---|---|---|---|\n| `cross-encoder/ettin-reranker-17m-v1` | 17M | 0.8 GB | 2.2 GB | 1.9 GB | **7517 (1.71x)**<br>**1.4 GB** |\n| `cross-encoder/ettin-reranker-32m-v1` | 32M | 1.2 GB | 1.6 GB | 2.9 GB | **6602 (2.00x)**<br>**1.1 GB** |\n| `cross-encoder/ettin-reranker-68m-v1` | 68M | 1.0 GB | 2.2 GB | 2.0 GB | **4913 (3.60x)**<br>**1.5 GB** |\n| `cross-encoder/ettin-reranker-150m-v1` | 150M | 1.6 GB | 1.8 GB | 3.1 GB | **3237 (4.83x)**<br>**1.4 GB** |\n| `cross-encoder/ettin-reranker-400m-v1` | 400M | 2.5 GB | 1.8 GB | 2.7 GB | **1738 (6.53x)**<br>**2.2 GB** |\n| `cross-encoder/ettin-reranker-1b-v1` | 1B | 4.6 GB | 2.8 GB | 3.6 GB | **928 (8.26x)**<br>**4.5 GB** |\n\nThe total speedup of `bf16+FA2 w.o. padding` over the `fp32+SDPA` baseline grows sharply with model size, from 1.71x on the 17M to 8.26x on the 1B. Most of that growth comes from `bf16` alone: the step from `fp32+SDPA` to `bf16+SDPA` gives the 17M only a 1.03x speedup but gives the 1B a full 5.60x speedup, partly because the lower memory cost allows bigger batch sizes. In short, `bfloat16` is the biggest single contributor to the overall speedup.\n\nUnexpectedly, turning on FA2 while the inputs are still padded is actually slower than `bf16+SDPA` at every size in the release.
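To make the padded versus unpadded distinction concrete, here is a minimal pure-Python sketch of the packing step: the real tokens from every sequence are concatenated into one flat sequence, and cumulative offsets (called `cu_seqlens` in FlashAttention's variable-length API) record where each sequence starts and ends. The token ids and pad id are made up, and the real conversion operates on GPU tensors inside the model rather than on Python lists.

```python
# Sketch of unpadding: pack a padded batch into one flat token sequence plus
# cumulative sequence offsets, the layout used by variable-length attention kernels.
# The token ids and PAD id are hypothetical.
PAD = 0


def unpad(input_ids, attention_mask):
    flat_ids, cu_seqlens = [], [0]
    for ids, mask in zip(input_ids, attention_mask):
        real = [tok for tok, m in zip(ids, mask) if m == 1]  # drop padding positions
        flat_ids.extend(real)
        cu_seqlens.append(cu_seqlens[-1] + len(real))        # offset where this sequence ends
    max_seqlen = max(end - start for start, end in zip(cu_seqlens, cu_seqlens[1:]))
    return flat_ids, cu_seqlens, max_seqlen


input_ids = [
    [101, 7, 8, 9, 102, PAD, PAD, PAD],
    [101, 5, 102, PAD, PAD, PAD, PAD, PAD],
    [101, 1, 2, 3, 4, 5, 6, 102],
]
attention_mask = [[int(tok != PAD) for tok in row] for row in input_ids]

flat_ids, cu_seqlens, max_seqlen = unpad(input_ids, attention_mask)
padded_positions = sum(len(row) for row in input_ids)
print(padded_positions, len(flat_ids), cu_seqlens, max_seqlen)  # 24 16 [0, 5, 8, 16] 8
```

A padded model spends every layer's compute on all 24 positions in this batch, while the packed layout only contains the 16 real tokens; the offsets tell the attention kernel where each sequence begins and ends, so tokens never attend across sequence boundaries.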
The FA2 kernel works best on an unpadded format: when you feed it padded inputs, you pay the bookkeeping overhead of converting between formats while still spending compute on the padding tokens themselves. So the `bf16+FA2 w. padding` column is roughly what you'd measure if you swapped `sdpa` for `flash_attention_2` in `model_kwargs` without changing anything else about the model loader. This is the situation that `gte-reranker-modernbert-base` and `granite-embedding-reranker-english-r2` from Table 1 are in.\n\nLastly, going from `bf16+FA2 w. padding` to `bf16+FA2 w.o. padding` is worth between 1.78x (1B) and 2.45x (68M) of additional throughput, and it also cuts peak memory considerably, allowing for higher batch sizes.\n\nSo my recommendation is simple: enable `bf16` and FA2 together. The six Ettin rerankers use unpadded inputs by default, since that's what the modular `Transformer` module from the [Architecture Details](#architecture-details) section is set up for. The snippet is the same as in the [Usage](#usage) section above, here with the 150M model:\n\n``` python\nfrom sentence_transformers import CrossEncoder\n\nmodel = CrossEncoder(\n    \"cross-encoder/ettin-reranker-150m-v1\",\n    model_kwargs={\n        \"dtype\": \"bfloat16\",\n        \"attn_implementation\": \"flash_attention_2\",  # See tip below\n    },\n)\n```\n\n**Tip:** use `pip install kernels` to get FA2. It ships pre-built kernels for a wide range of GPU architectures, CUDA versions, and operating systems.\n\nOne caveat for other CrossEncoders: the full speedup is only available for models built with a modular `Transformer`, like the Ettin rerankers. Applying the same two flags to a CrossEncoder that loads through `AutoModelForSequenceClassification` lands you in the slower `bf16+FA2 w.
padding` column of Table 2 instead.\n\n## Training\n\nThe training script below started as the output of the new [`train-sentence-transformers` Agent Skill](https://github.com/huggingface/sentence-transformers/tree/main/skills), shipped in [Sentence Transformers v5.5.0](https://github.com/huggingface/sentence-transformers/releases/tag/v5.5.0). If you use an AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, ...), you can install the skill and ask it to fine-tune a `SentenceTransformer`, `CrossEncoder`, or `SparseEncoder` model on your data. The skill carries version-aware guidance for base model selection, loss and evaluator choice, hard-negative mining, distillation, LoRA, Matryoshka, multilingual training, and static embeddings, plus template scripts for each model type.\n\n``` bash\nhf skills add train-sentence-transformers --claude   # symlinks into .claude/skills/\nhf skills add train-sentence-transformers --global   # under ~/.agents/skills/\n```\n\nA prompt like *\"Fine-tune a cross-encoder reranker on (query, document) pairs from my dataset, mine hard negatives, and push to my Hub repo\"* will produce a runnable script you can then iterate on. That's how I started working on the recipe below.\n\nAll six rerankers were trained with the same single-stage recipe. Only the learning rate and the global batch size vary per model size. The full training script is ~150 lines and uses one published dataset.\n\nThe recipe converged after a single sweep across model sizes. Each size's learning rate was tuned by a small grid search on a ~15% subset of the final training data, and the resulting LRs transferred cleanly to the full-data runs without re-tuning.
No per-size tuning beyond LR was needed.\n\n### Distillation recipe\n\nMost published reranker recipes train on human-labeled relevance triples (a query, one positive document, and optionally hard negatives) with a contrastive, pointwise, pairwise, or listwise loss like [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/cross_encoder/losses.html#multiplenegativesrankingloss), [`BinaryCrossEntropyLoss`](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss), [`RankNetLoss`](https://sbert.net/docs/package_reference/cross_encoder/losses.html#ranknetloss), or [`LambdaLoss`](https://sbert.net/docs/package_reference/cross_encoder/losses.html#lambdaloss), respectively. See my earlier [Training and Finetuning Reranker Models with Sentence Transformers](https://huggingface.co/blog/train-reranker) blog post, for example.\n\nBut this approach has a few practical and theoretical drawbacks. First, positives need to be human-labeled, which is expensive and slow to scale across many domains. Second, the model only ever sees a label for the small subset of `(query, document)` pairs that someone actually annotated. Especially after hard negative mining, you end up with a lot of false negatives, as shown e.g. in [Hard Negatives, Hard Lessons](https://arxiv.org/abs/2505.16967). Third, binary labels don't match reality, where some documents are simply more relevant than others.\n\nI took a different route here: pointwise MSE distillation from an existing strong teacher reranker. The setup is simple enough to describe in three lines:\n\n- **Teacher**: `mixedbread-ai/mxbai-rerank-large-v2` (1.54B parameters).\n- **Loss**: `MSELoss` on the raw teacher logits (range ~[−12, 22]), i.e. without rescaling.\n- **Training data**: ~143M `(query, document, teacher_score)` triples.\n\n### Dataset\n\nI've released the training data as a single Hugging Face dataset, [`cross-encoder/ettin-reranker-v1-data`](https://huggingface.co/datasets/cross-encoder/ettin-reranker-v1-data), assembled from two sources. Each source is kept as its own split so the provenance is transparent:\n\n- LightOn pre-training data (`lightonai/embeddings-pre-training`, non-curated): 32 splits covering broad-domain text similarity signal (MTP, FW-EDU, Reddit, PAQ, S2ORC, Amazon, Wikipedia, MS MARCO, etc.). I limit the number of samples for some of the splits, resulting in ~110M `(query, document, similarity)` triples in total.\n- Rescored retrieval data from `lightonai/embeddings-fine-tuning`: 7 splits (`msmarco`, `hotpotqa`, `trivia`, `nq`, `squadv2`, `fiqa`, `fever`). The source dataset has up to 2048 candidate documents per query (initially scored with `Alibaba-NLP/gte-modernbert-base`), which I rescored with `mixedbread-ai/mxbai-rerank-large-v2` and uploaded as `cross-encoder/lightonai-embeddings-fine-tuning-reranked-v1`. That dataset subsamples each query's 2048 candidates down to 256 using the [Jang et al.](https://arxiv.org/abs/2604.04734) quantile-anchor recipe (all positives + top-16 hard + ~239 quantile-anchor stratified). For training, I pick 64 of those 256 per query: 32 from the score-sorted head (the positive plus the hardest negatives) and 32 medium-difficulty negatives sampled from a band further down the teacher's ranking.
See the [dataset card](https://huggingface.co/datasets/cross-encoder/ettin-reranker-v1-data) for the exact rank positions.\n\nTotal: ~143M `(query, document, score)` triples, plus a held-out 5K-row eval split (the tail of `quora`) that drives the in-training eval loss.\n\n### Training Arguments\n\nMost hyperparameters are constant across model sizes:\n\n```\nCrossEncoderTrainingArguments(\n    num_train_epochs=1,                    # I chose more data over more epochs\n    per_device_train_batch_size=...,       # global_batch_size // world_size (see table below)\n    gradient_accumulation_steps=1,\n    learning_rate=...,                     # per-size, see table\n    warmup_ratio=0.03,                     # ~3% linear warmup, then linear decay (default)\n    bf16=True,                             # FA2 + bf16 throughout\n    eval_strategy=\"steps\",\n    eval_steps=0.05,                       # NanoBEIR every 5% of training\n    save_strategy=\"steps\",\n    save_steps=0.05,\n    save_total_limit=5,\n    load_best_model_at_end=True,\n    metric_for_best_model=\"eval_NanoBEIR_R100_mean_ndcg@10\",\n    seed=12,\n)\n```\n\nOnly the learning rate and global batch size vary per model size.\n\n| Size | Learning rate | Global batch size |\n|---|---|---|\n| 17m | 2.4e-4 | 1024 |\n| 32m | 1.2e-4 | 512 |\n| 68m | 3e-5 | 256 |\n| 150m | 1.5e-5 | 192 |\n| 400m | 7e-6 | 256 |\n| 1b | 3e-6 | 512 |\n\n`global_batch_size` is `per_device_batch_size x world_size x gradient_accumulation_steps`. On a single 8-GPU node, the 1024 global batch for 17m means `per_device=128`. On 8 such nodes (64 GPUs), it means `per_device=16`. The training script computes `per_device_batch_size` as `global_batch_size // world_size`, so the same script works at any node count. 
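\n\nHere is a minimal sketch that makes the arithmetic concrete. The helper `per_device_batch_size` is hypothetical and not part of the released script, which simply uses `global_batch_size // world_size` with `gradient_accumulation_steps=1`. The sketch also adds an accumulation factor and a divisibility check, both illustrative assumptions:\n\n```python\ndef per_device_batch_size(global_batch_size: int, world_size: int, grad_accum_steps: int = 1) -> int:\n    # Hypothetical helper: invert global = per_device * world_size * grad_accum_steps.\n    devices_times_steps = world_size * grad_accum_steps\n    # Unlike a bare floor division, refuse sizes that do not split evenly,\n    # since // would silently shrink the effective (global) batch.\n    if global_batch_size % devices_times_steps != 0:\n        raise ValueError(\n            f'global batch {global_batch_size} is not divisible by '\n            f'world_size * grad_accum_steps = {devices_times_steps}'\n        )\n    return global_batch_size // devices_times_steps\n\n\n# 17m on a single 8-GPU node: 1024 // 8\nprint(per_device_batch_size(1024, world_size=8))  # 128\n# 150m on 4 nodes x 8 GPUs = 32 processes: 192 // 32\nprint(per_device_batch_size(192, world_size=32))  # 6\n# Same 17m global batch on one GPU with 8 accumulation steps\nprint(per_device_batch_size(1024, world_size=1, grad_accum_steps=8))  # 128\n```\n\nFixing the global batch and deriving the per-device size is what lets one script run unchanged at any node count. The check makes an uneven split fail loudly instead of silently training with a smaller effective batch than intended.\n\n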
The global batch sizes could be made more consistent across model sizes, but the values in the table worked well and I didn't want to retune them just for the sake of consistency.\n\n### Evaluation\n\nI monitored NanoBEIR mean NDCG@10 during training (eval every 5% of steps) and used it as the `metric_for_best_model` for `load_best_model_at_end`. NanoBEIR is fast, so I could afford to run it 20 times per training run. After training, I evaluated both the best checkpoint (according to NanoBEIR) and the last checkpoint on the full MTEB(eng, v2) Retrieval benchmark, and released whichever did best on MTEB. The NanoBEIR-preferred checkpoint won for all sizes except 68m, where the last checkpoint was slightly stronger.\n\n### Overall Training Script\n\nThe complete script (what every released model was trained with) is a single file. Only `ENCODER_SIZE` changes per run; everything else is derived automatically:\n\n``` python\nfrom __future__ import annotations\n\nimport logging\nimport os\nfrom pathlib import Path\n\nimport torch\nimport torch.nn as nn\nfrom datasets import concatenate_datasets, get_dataset_config_names, load_dataset\n\nfrom sentence_transformers import CrossEncoder\nfrom sentence_transformers.base.modules import Dense\nfrom sentence_transformers.cross_encoder import (\n    CrossEncoderModelCardData,\n    CrossEncoderTrainer,\n    CrossEncoderTrainingArguments,\n)\nfrom sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator\nfrom sentence_transformers.cross_encoder.losses import MSELoss\nfrom sentence_transformers.sentence_transformer.modules import LayerNorm, Pooling, Transformer\n\nlogging.basicConfig(level=logging.INFO, format=\"%(asctime)s %(message)s\", datefmt=\"%H:%M:%S\")\nlogging.getLogger(\"httpx\").setLevel(logging.WARNING)\n\n# Per-size config. 
I swept the learning rates with these global (effective) batch sizes,\n# also by incorporating accum_steps\nCONFIGS: dict[str, dict] = {\n    \"17m\":  {\"base_model_name\": \"jhu-clsp/ettin-encoder-17m\",  \"learning_rate\": 2.4e-4, \"global_batch_size\": 1024},\n    \"32m\":  {\"base_model_name\": \"jhu-clsp/ettin-encoder-32m\",  \"learning_rate\": 1.2e-4, \"global_batch_size\": 512},\n    \"68m\":  {\"base_model_name\": \"jhu-clsp/ettin-encoder-68m\",  \"learning_rate\": 3e-5,   \"global_batch_size\": 256},\n    \"150m\": {\"base_model_name\": \"jhu-clsp/ettin-encoder-150m\", \"learning_rate\": 1.5e-5, \"global_batch_size\": 192},\n    \"400m\": {\"base_model_name\": \"jhu-clsp/ettin-encoder-400m\", \"learning_rate\": 7e-6,   \"global_batch_size\": 256},\n    \"1b\":   {\"base_model_name\": \"jhu-clsp/ettin-encoder-1b\",   \"learning_rate\": 3e-6,   \"global_batch_size\": 512},\n}\nENCODER_SIZE = \"17m\"\n\ndef main() -> None:\n    config = CONFIGS[ENCODER_SIZE]\n    encoder_id = config[\"base_model_name\"]\n    learning_rate = config[\"learning_rate\"]\n    global_batch_size = config[\"global_batch_size\"]\n\n    world_size = int(os.environ.get(\"WORLD_SIZE\", 1))\n    per_device_batch_size = global_batch_size // world_size\n    dataloader_workers = 0 if world_size > 8 else 4\n    run_name = f\"ettin-reranker-{ENCODER_SIZE}-lr{learning_rate:.0e}\"\n\n    # 1. Load a model to finetune with model card data\n    # The model mirrors ModernBertForSequenceClassification, but with a 'headless' Transformer that just loads\n    # AutoModel. 
This allows for unpadding with FA2, which isn't possible with AutoModelForSequenceClassification.\n    # This speeds up training considerably, while heavily reducing memory usage.\n    torch.manual_seed(12)\n    transformer = Transformer(encoder_id, model_kwargs={\"attn_implementation\": \"flash_attention_2\"})\n    transformer.model.config.num_labels = 1\n    embedding_dimension = transformer.get_embedding_dimension()\n    pooling = Pooling(embedding_dimension=embedding_dimension, pooling_mode=\"cls\")\n    dense_inner = Dense(\n        in_features=embedding_dimension, out_features=embedding_dimension, bias=False,\n        activation_function=nn.GELU(),\n        module_input_name=\"sentence_embedding\", module_output_name=\"sentence_embedding\",\n    )\n    norm = LayerNorm(dimension=embedding_dimension)\n    dense_score = Dense(\n        in_features=embedding_dimension, out_features=1, bias=True,\n        activation_function=nn.Identity(),\n        module_input_name=\"sentence_embedding\", module_output_name=\"scores\",\n    )\n    model = CrossEncoder(\n        modules=[transformer, pooling, dense_inner, norm, dense_score],\n        num_labels=1,\n        activation_fn=nn.Identity(),\n        model_card_data=CrossEncoderModelCardData(\n            model_name=f\"Ettin Reranker {ENCODER_SIZE} distilled from mxbai-rerank-large-v2\",\n            language=\"en\",\n            license=\"apache-2.0\",\n        ),\n    )\n    actual_attn = getattr(model[0].model.config, \"_attn_implementation\", None)\n    if not (actual_attn and \"flash\" in actual_attn.lower()):\n        logging.warning(f\"FA2 may not be active (attn_impl={actual_attn!r}); training will be slower.\")\n\n    # 2. Load the dataset. Each config is one source subset (32 lighton + 7 rerank retrieval\n    # domains). 
The held-out eval rows live as the 'validation' split of the 'quora' config.\n    dataset_repo = \"cross-encoder/ettin-reranker-v1-data\"\n    train_pieces = []\n    eval_dataset = None\n    for config_name in get_dataset_config_names(dataset_repo):\n        dataset = load_dataset(dataset_repo, config_name)\n        train_pieces.append(dataset[\"train\"])\n        if \"validation\" in dataset:\n            eval_dataset = dataset[\"validation\"]\n    train_dataset = concatenate_datasets(train_pieces)\n    print(train_dataset)\n\n    # 3. Define a loss function\n    loss = MSELoss(model)\n\n    # 4. Specify training arguments\n    args = CrossEncoderTrainingArguments(\n        output_dir=f\"models/{run_name}\",\n        num_train_epochs=1,\n        per_device_train_batch_size=per_device_batch_size,\n        per_device_eval_batch_size=per_device_batch_size,\n        gradient_accumulation_steps=1,\n        learning_rate=learning_rate,\n        warmup_ratio=0.03,\n        bf16=True,\n        eval_strategy=\"steps\",\n        eval_steps=0.05,\n        save_strategy=\"steps\",\n        save_steps=0.05,\n        save_total_limit=5,\n        logging_steps=0.025,\n        logging_first_step=True,\n        load_best_model_at_end=True,\n        metric_for_best_model=\"eval_NanoBEIR_R100_mean_ndcg@10\",\n        dataloader_num_workers=dataloader_workers,\n        run_name=run_name,\n        seed=12,\n    )\n\n    # 5. Create an evaluator\n    evaluator = CrossEncoderNanoBEIREvaluator(\n        dataset_names=[\"msmarco\", \"nfcorpus\", \"nq\", \"fiqa2018\", \"touche2020\", \"scifact\",\n                       \"hotpotqa\", \"arguana\", \"fever\", \"dbpedia\", \"climatefever\", \"scidocs\",\n                       \"quoraretrieval\"],\n        batch_size=per_device_batch_size,\n        always_rerank_positives=False,\n        show_progress_bar=False,\n    )\n\n    # 6. 
Create a trainer\n    trainer = CrossEncoderTrainer(\n        model=model,\n        args=args,\n        train_dataset=train_dataset,\n        eval_dataset=eval_dataset,\n        loss=loss,\n        evaluator=evaluator,\n    )\n\n    # 7. Evaluate before training\n    if trainer.is_world_process_zero():\n        with torch.autocast(device_type=\"cuda\", dtype=torch.bfloat16):\n            evaluator(model)\n\n    # 8. Train\n    trainer.train()\n\n    # 9. Evaluate the final model\n    if trainer.is_world_process_zero():\n        with torch.autocast(device_type=\"cuda\", dtype=torch.bfloat16):\n            evaluator(model)\n\n    # 10. Save the final model\n    final_dir = f\"models/{run_name}/final\"\n    model.save_pretrained(final_dir)\n\nif __name__ == \"__main__\":\n    main()\n```\n\nFor multi-node training (anything past 17m/32m), launch the same script with `torchrun`:\n\n```\n# Single-node (17m, 32m): defaults work\npython train.py\n\n# Multi-node 4n setup for 150m, preserves global_batch_size=192:\ntorchrun --nproc_per_node=8 --nnodes=4 ... train.py\n```\n\n## Conclusion\n\nThe ettin-reranker-v1 family, trained with a single simple recipe, is state-of-the-art at every released size up to 1B parameters. Pointwise MSE distillation from a strong teacher on a broad-domain and retrieval-specific mix scales cleanly from 17M to 1B parameters, with only the learning rate and global batch size changing between sizes.\n\nEvery ettin-reranker-v1 model beats the `ms-marco-MiniLM-L*-v2` family by a comfortable margin on MTEB and NanoBEIR. 
`cross-encoder/ettin-reranker-150m-v1` is the strongest mid-tier reranker I tested in the under-600M range, `cross-encoder/ettin-reranker-400m-v1` lands within 0.0024 of the 1.54B teacher's MTEB score, and `cross-encoder/ettin-reranker-1b-v1` matches that teacher within 0.0001.\n\nEverything in one place:\n\n- **Models**: the six checkpoints `cross-encoder/ettin-reranker-{17m,32m,68m,150m,400m,1b}-v1`.\n- **Dataset**: [`cross-encoder/ettin-reranker-v1-data`](https://huggingface.co/datasets/cross-encoder/ettin-reranker-v1-data) with ~143M `(query, document, label)` triples, kept as 39 named splits so the provenance of every row is visible.\n- **Training script**: the ~150 lines in [Overall Training Script](#overall-training-script) above, which is the same script used for all six models.\n\nIf you build something on top of these, please let me know! I'd genuinely love to see what people do with them, and if you manage to train better rerankers using the released data, even better. The recipe is intentionally simple, partly so that there's plenty of headroom for someone else to improve it. Train a stronger teacher, and the same script can keep producing better students.\n\n## Acknowledgements\n\nI'd like to thank the Ettin team (Orion Weller, Kathryn Ricci, Marc Marone, Antoine Chaffin, Dawn Lawrie, and Benjamin Van Durme) for [building the base encoders](https://huggingface.co/blog/ettin) that these rerankers are built on, the LightOn team (Antoine Chaffin, Raphael Sourty, Paulo Moura, and Amélie Chatelain) for [their work on the training data collection](https://huggingface.co/blog/lightonai/denseon-lateon), and the Mixedbread AI team (Xianming Li, Aamir Shakir, Rui Huang, Tsz-fung Andrew Lee, Julius Lipp, Benjamin Clavié, and Jing Li) for [their work on the teacher model](https://arxiv.org/abs/2506.03487).\n\n## Citation\n\nIf you use the ettin-reranker-v1 family or any of the released artifacts, please cite this blogpost:\n\n```\n@misc{aarsen2026ettin-reranker,\n    title = \"Introducing the Ettin Reranker Family\",\n    author = \"Aarsen, Tom\",\n    year = \"2026\",\n    publisher = \"Hugging Face\",\n    url = 
\"https://huggingface.co/blog/ettin-reranker\",\n}\n```\n\n", "url": "https://wpnews.pro/news/introducing-the-ettin-reranker-family", "canonical_source": "https://huggingface.co/blog/ettin-reranker", "published_at": "2026-05-19 00:00:00+00:00", "updated_at": "2026-05-22 15:33:46.822371+00:00", "lang": "en", "topics": ["machine-learning", "artificial-intelligence", "large-language-models", "open-source", "research"], "entities": ["Ettin Reranker Family", "Sentence Transformers", "mixedbread-ai", "lightonai", "Google", "MTEB", "CrossEncoder", "ModernBERT"], "alternates": {"html": "https://wpnews.pro/news/introducing-the-ettin-reranker-family", "markdown": "https://wpnews.pro/news/introducing-the-ettin-reranker-family.md", "text": "https://wpnews.pro/news/introducing-the-ettin-reranker-family.txt", "jsonld": "https://wpnews.pro/news/introducing-the-ettin-reranker-family.jsonld"}}