Add Qwen3 Reranker model #3958
Conversation
Pull request overview
This PR adds support for three Qwen3 Reranker models (0.6B, 4B, and 8B variants) to the MTEB framework. The models are reranker models based on Qwen3 that can be used for relevance scoring tasks.
Changes:
- Added `Qwen3RerankerWrapper` class to load and run Qwen3 reranker models using causal language modeling with yes/no token probability scoring
- Added three `ModelMeta` configurations for Qwen3-Reranker-0.6B, Qwen3-Reranker-4B, and Qwen3-Reranker-8B
- Imported `ScoringFunction` from the model_meta module to support metadata configuration
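The yes/no token probability scoring mentioned above works roughly as follows: the model is prompted to answer "yes" or "no", and the relevance score is the softmax probability of the "yes" token over just those two answer-token logits. A minimal sketch in plain Python, with hypothetical logit values standing in for the causal LM's output (in the actual wrapper these come from the last-token position, indexed by the tokenizer ids for "yes" and "no"):

```python
import math

def yes_no_score(logit_yes: float, logit_no: float) -> float:
    """Relevance score: softmax probability of "yes" over the {yes, no} logits."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)

# Hypothetical logits: the model favors "yes" for a relevant document.
print(yes_no_score(3.2, -1.5))  # ≈ 0.991
```

Restricting the softmax to the two answer tokens (rather than the full vocabulary) is what makes the score a well-calibrated binary relevance probability.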
    similarity_fn_name=ScoringFunction.COSINE,
    use_instructions=True,
    training_datasets=qwen3_reranker_training_data,
    adapted_from=None,
Suggested change:

    - adapted_from=None,
    + adapted_from="Qwen/Qwen3-4B",
    torch_dtype=torch.float32,
    attn_implementation: str | None = None,
    batch_size: int = 32,
    max_length: int = 8192,
These shouldn't be passed to model initialization.

Suggested change (remove these parameters):

    - torch_dtype=torch.float32,
    - attn_implementation: str | None = None,
    - batch_size: int = 32,
    - max_length: int = 8192,
We can keep `attn_implementation` in init, right? I'll remove the rest.
    self.token_false_id = self.tokenizer.convert_tokens_to_ids("no")
    self.token_true_id = self.tokenizer.convert_tokens_to_ids("yes")

    self.prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
I don't think that instruction should be hardcoded.

This is given on their HF page.
    queries = [text for batch in inputs1 for text in batch["query"]]
    instructions = None
    if "instruction" in inputs2.dataset.features:
        instructions = [text for batch in inputs1 for text in batch["instruction"]]
Can you get the task-specific prompt? The instruction from the batch will only be present for instruction retrieval/reranking tasks.
By the way, you can get the implementation from their repo: https://github.com/QwenLM/Qwen3-Embedding/blob/main/evaluation/qwen3_reranker_model.py
It's almost the same, but it uses vLLM. Should I use it?
I don't think you need to change to vLLM. I think it's better to use transformers or sentence-transformers, but this model is not compatible with sentence-transformers.

Their script on GitHub uses vLLM, and yes, it's not compatible with Sentence Transformers, so I think we can keep the current transformers implementation.
@Samoed I was able to run this code perfectly. Just one doubt: for evaluation, they have given these in their model card:

So, how can we do evaluation on the retrieval subset only?
We have an example in the docs: https://embeddings-benchmark.github.io/mteb/usage/selecting_tasks/#filtering-benchmark-tasks
I'm getting these errors when trying to run the above code. Is it again because we allow only retrieval and not the reranking one?
Yes, that's right. I think you can try to evaluate with their script on some reranking tasks and after that check your implementation.

Their script uses retrieval results in reranking; see the "Evaluate reranking models" section in their README.md.
I think you can still run reranking tasks.

I am not able to run their code; I'm getting an error, I think because of a conflict in dependencies.
@Samoed Could you try running it if possible? I tried it again, but was not able to run it fully.
This pull request has been automatically marked as stale due to inactivity.
I ran Qwen3 on FollowIR and got:

What should we do exactly here in that case? Should we ask someone from the Qwen team to just check the implementation?
We have an issue about this: #2907
    self.prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
    self.suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
I revisited the qwen3 implementation. Can we use `apply_chat_template` instead, to avoid hardcoding this? https://github.com/QwenLM/Qwen3-Embedding/blob/44548aa5f0a0aed1c76d64e19afe47727a325b8f/evaluation/qwen3_reranker_model.py#L54-L62
There is a difference between what is on GitHub and on their HF page.

In the GitHub implementation, they use only hardcoded suffix_tokens (not prefix_tokens) together with apply_chat_template. On HF, they use both hardcoded prefix_tokens and suffix_tokens, but no apply_chat_template.
apply_chat_template would output the same:

    tokenizer.apply_chat_template(text, tokenize=False, add_generation_prompt=True, enable_thinking=False)
    # '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n<Instruct>: instruction\n\n<Query>: query\n\n<Document>: doc<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n'

Also, they add the suffix later: https://github.com/QwenLM/Qwen3-Embedding/blob/44548aa5f0a0aed1c76d64e19afe47727a325b8f/evaluation/qwen3_reranker_model.py#L69C50-L69C63
Also, I think we can add sampling parameters: https://github.com/QwenLM/Qwen3-Embedding/blob/44548aa5f0a0aed1c76d64e19afe47727a325b8f/evaluation/qwen3_reranker_model.py#L44-L49
        - len(self.prefix_tokens)
        - len(self.suffix_tokens),
    )
    for i, ele in enumerate(inputs["input_ids"]):
        inputs["input_ids"][i] = self.prefix_tokens + ele + self.suffix_tokens
@Samoed We are using prefix_tokens here. So if we want to add tokenizer.apply_chat_template, what should we do with this? Also, the implementation on GitHub is different.
You just change the tokenizer processing:

    text = [
        {"role": "system", "content": "Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\"."},
        {"role": "user", "content": f"<Instruct>: {instruction}\n\n<Query>: {query}\n\n<Document>: {doc}"},
    ]
    inputs = tokenizer.apply_chat_template(text)
    inputs["input_ids"][i] = ele + self.suffix_tokens
Here, apply_chat_template is applied to pairs after they have passed through format_instructions. So shouldn't we have to do the same thing?
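The truncation logic in the snippet above reserves room for the fixed prefix/suffix tokens before wrapping each sequence. A minimal sketch of that budget arithmetic, with plain Python lists standing in for token-id sequences and names chosen to mirror the snippet:

```python
def wrap_with_budget(input_ids, prefix_tokens, suffix_tokens, max_length):
    """Truncate each body so prefix + body + suffix never exceeds max_length."""
    budget = max_length - len(prefix_tokens) - len(suffix_tokens)
    wrapped = []
    for ele in input_ids:
        # Truncate the tokenized query/document body to the remaining budget,
        # then wrap it with the fixed chat-template prefix and suffix ids.
        wrapped.append(prefix_tokens + ele[:budget] + suffix_tokens)
    return wrapped

# Toy ids: prefix/suffix of 2 tokens each, max_length 8 -> body budget is 4.
out = wrap_with_budget([[5, 6, 7, 8, 9, 10]], [1, 2], [3, 4], 8)
print(out)  # [[1, 2, 5, 6, 7, 8, 3, 4]]
```

Switching to apply_chat_template would only change where the prefix ids come from; the same length budget still has to be subtracted before truncating the body.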
@Samoed Can you check this one? Should we merge it?
closes #3718
Added 3 models:

`mteb.get_model(model_name, revision)` and `mteb.get_model_meta(model_name, revision)`