Add Qwen3 Reranker model #3958
Conversation
Pull request overview
This PR adds support for three Qwen3 Reranker models (0.6B, 4B, and 8B variants) to the MTEB framework. The models are reranker models based on Qwen3 that can be used for relevance scoring tasks.
Changes:
- Added `Qwen3RerankerWrapper` class to load and run Qwen3 reranker models using causal language modeling with yes/no token probability scoring
- Added three `ModelMeta` configurations for Qwen3-Reranker-0.6B, Qwen3-Reranker-4B, and Qwen3-Reranker-8B
- Imported `ScoringFunction` from the model_meta module to support metadata configuration
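The yes/no token probability scoring mentioned above works roughly as follows: the model is prompted to answer "yes" or "no", and the relevance score is the softmax probability of the "yes" token over just those two answer-token logits. A minimal sketch in plain Python, with hypothetical logit values standing in for the causal LM's output (in the actual wrapper these come from the last-token position, indexed by the tokenizer ids for "yes" and "no"):

```python
import math

def yes_no_score(logit_yes: float, logit_no: float) -> float:
    """Relevance score: softmax probability of "yes" over the {yes, no} logits."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)

# Hypothetical logits: the model favors "yes" for a relevant document.
print(yes_no_score(3.2, -1.5))  # ≈ 0.991
```

Restricting the softmax to the two answer tokens (rather than the full vocabulary) is what makes the score a well-calibrated binary relevance probability.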
    similarity_fn_name=ScoringFunction.COSINE,
    use_instructions=True,
    training_datasets=qwen3_reranker_training_data,
    adapted_from=None,
Suggested change:

    - adapted_from=None,
    + adapted_from="Qwen/Qwen3-4B",
    torch_dtype=torch.float32,
    attn_implementation: str | None = None,
    batch_size: int = 32,
    max_length: int = 8192,
These shouldn't be passed to model initialization.

Suggested change (remove these parameters):

    - torch_dtype=torch.float32,
    - attn_implementation: str | None = None,
    - batch_size: int = 32,
    - max_length: int = 8192,
We can keep `attn_implementation` in init, right? I'll remove the rest.
    self.token_false_id = self.tokenizer.convert_tokens_to_ids("no")
    self.token_true_id = self.tokenizer.convert_tokens_to_ids("yes")

    self.prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
I don't think that instruction should be hardcoded.

This is given on their HF page.
    queries = [text for batch in inputs1 for text in batch["query"]]
    instructions = None
    if "instruction" in inputs2.dataset.features:
        instructions = [text for batch in inputs1 for text in batch["instruction"]]
Can you get the task-specific prompt? The instruction from the batch will only be present for instruction retrieval/reranking tasks.
By the way, you can get the implementation from their repo: https://github.com/QwenLM/Qwen3-Embedding/blob/main/evaluation/qwen3_reranker_model.py
It's almost the same, but it uses vLLM. Should I use it?
I don't think you need to change to vLLM. I think it's better to use transformers or sentence-transformers, but this model is not compatible with sentence-transformers.

Their script on GitHub uses vLLM, and yes, it's not compatible with Sentence Transformers, so I think we can keep the current transformers implementation.
@Samoed I was able to run this code perfectly. Just one doubt: for evaluation, they have given these in their model card:

So, how can we do evaluation on the retrieval subset only?
We have an example in the docs: https://embeddings-benchmark.github.io/mteb/usage/selecting_tasks/#filtering-benchmark-tasks
I'm getting these errors when trying to run the above code. Is it again because we allow only retrieval and not the reranking one?
Yes, that's right. I think you can try to evaluate with their script on some reranking tasks and after that check your implementation.

Their script uses retrieval results in reranking; see the "Evaluate reranking models" section in their README.md.
I think you can still run reranking tasks.

I am not able to run their code; I'm getting an error, I think because of a conflict in dependencies.
@Samoed Could you try running it if possible? I tried it again, but was not able to run it fully.
This pull request has been automatically marked as stale due to inactivity.
I ran Qwen3 on FollowIR and got:

What should we do exactly here in that case? Should we ask someone from the Qwen team to just check the implementation?
We have an issue about this: #2907
    self.prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
    self.suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
I revisited the qwen3 implementation. Can we use `apply_chat_template` instead, to avoid hardcoding this? https://github.com/QwenLM/Qwen3-Embedding/blob/44548aa5f0a0aed1c76d64e19afe47727a325b8f/evaluation/qwen3_reranker_model.py#L54-L62
There is a difference between what is on GitHub and on their HF page.

In the GitHub implementation, they use only hardcoded suffix_tokens (not prefix_tokens) together with apply_chat_template. On HF, they use both hardcoded prefix_tokens and suffix_tokens, but no apply_chat_template.
apply_chat_template would output the same:

    tokenizer.apply_chat_template(text, tokenize=False, add_generation_prompt=True, enable_thinking=False)
    # '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: instruction\n\n<Query>: query\n\n<Document>: doc<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n'

Also, they add the suffix later: https://github.com/QwenLM/Qwen3-Embedding/blob/44548aa5f0a0aed1c76d64e19afe47727a325b8f/evaluation/qwen3_reranker_model.py#L69C50-L69C63
Also, I think we can add sampling parameters: https://github.com/QwenLM/Qwen3-Embedding/blob/44548aa5f0a0aed1c76d64e19afe47727a325b8f/evaluation/qwen3_reranker_model.py#L44-L49
        - len(self.prefix_tokens)
        - len(self.suffix_tokens),
    )
    for i, ele in enumerate(inputs["input_ids"]):
        inputs["input_ids"][i] = self.prefix_tokens + ele + self.suffix_tokens
@Samoed We are using prefix_tokens here. So if we want to add tokenizer.apply_chat_template, what should we do with this? Also, the implementation on GitHub is different.
You just change the tokenizer processing:

    text = [
        {"role": "system", "content": "Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\"."},
        {"role": "user", "content": f"<Instruct>: {instruction}\n\n<Query>: {query}\n\n<Document>: {doc}"},
    ]
    inputs = tokenizer.apply_chat_template(text)
    inputs["input_ids"][i] = ele + self.suffix_tokens
Here, apply_chat_template is applied to pairs after they have passed through format_instructions. So shouldn't we have to do the same thing?
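The truncation logic in the snippet above reserves room for the fixed prefix/suffix tokens before wrapping each sequence. A minimal sketch of that budget arithmetic, with plain Python lists standing in for token-id sequences and names chosen to mirror the snippet:

```python
def wrap_with_budget(input_ids, prefix_tokens, suffix_tokens, max_length):
    """Truncate each body so prefix + body + suffix never exceeds max_length."""
    budget = max_length - len(prefix_tokens) - len(suffix_tokens)
    wrapped = []
    for ele in input_ids:
        # Truncate the tokenized query/document body to the remaining budget,
        # then wrap it with the fixed chat-template prefix and suffix ids.
        wrapped.append(prefix_tokens + ele[:budget] + suffix_tokens)
    return wrapped

# Toy ids: prefix/suffix of 2 tokens each, max_length 8 -> body budget is 4.
out = wrap_with_budget([[5, 6, 7, 8, 9, 10]], [1, 2], [3, 4], 8)
print(out)  # [[1, 2, 5, 6, 7, 8, 3, 4]]
```

Switching to apply_chat_template would only change where the prefix ids come from; the same length budget still has to be subtracted before truncating the body.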
@Samoed Can you check this one? Should we merge it?
closes #3718
Added 3 models:

`mteb.get_model(model_name, revision)` and `mteb.get_model_meta(model_name, revision)`