[Feat] CrossEncoder class accept prompt template parameters #3406
base: main
Conversation
…ple to support dynamic prompt templates and default configuration. Updated the CrossEncoder class to accept prompt template parameters and apply them in prediction and ranking. Added test cases to verify the prompt template functionality and the correctness of the default configuration.

cc @tomaarsen
Hello! This is really cool! I've been planning to add support for the "decoder-style" rerankers in a future version after v5.0. For context, Sentence Transformers v5.0 is scheduled for tomorrow, and in the interest of avoiding feature creep for v5.0, I'll have a look at this in more detail after tomorrow.
Thank you very much! Looking forward to it.
Is the version upgrade going smoothly? If you have time, could you review my PR? @tomaarsen
So far so good re. the release!
I'm looking into potentially reusing

It is indeed more reasonable to reuse
P.S. The latest update is that there's no convenient way to handle the truncation, nor a commonly agreed-upon truncation strategy. To be as model-agnostic as possible, we'd have to support various different options, but it gets quite messy quite quickly.
Hello @BetterAndBetterII, apologies for the delay. Locally, I've been working on this problem some more. In particular, my goal is to support not just https://huggingface.co/tomaarsen/Qwen3-Reranker-0.6B-seq-cls, but also https://huggingface.co/Qwen/Qwen3-Reranker-0.6B itself. It will require a full refactor of the CrossEncoder class to be more like the SentenceTransformer/SparseEncoder classes, i.e. with modules that are executed sequentially. This allows me to create separate modules wrapping the relevant components. That would then also include support for templating akin to what you proposed here.
a91ab2d from #3554 should introduce (chat) templating support to CrossEncoder models. Even when passing text pairs and a prompt, the code can convert this into the required "message" format (also discussed in this thread: vllm-project/vllm#30550 (comment)), allowing the chat template to be used. Here are some example models that should work when that PR is checked out:
For example:

```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("cross-encoder-testing/Qwen3-Reranker-0.6B-STv6")

# Get scores for pairs of texts
query = "Which planet is known as the Red Planet?"
documents = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]
pairs = [[query, doc] for doc in documents]
scores = model.predict(pairs)
print(scores)
# [-3.1092978   7.12039    -0.37875462  3.5416374 ]
```

This allows me to reuse the existing machinery. Apologies for the long delay; these changes are built on a big refactor in #3554, but I think it's a more convenient solution that's more futureproof. I wanted to give you a heads up on the direction that we'll head for templating in CrossEncoders, as I figured you might be curious!
Summary
Best Practice
```json
{
  ...
  "sentence_transformers": {
    "version": "xxx",
    "prompt_template": "Instruct: {instruction}\nQuery: {query}\nDocument: {document}",
    "prompt_template_kwargs": {
      "instruction": "Given a query, find the most relevant document."
    }
  },
  ...
}
```

Then call model.predict like normal.

More customized usage
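The templating itself amounts to string formatting. Here is a minimal sketch of how the template and kwargs from the configuration could be rendered for one (query, document) pair; the `render_pair` helper is hypothetical, not CrossEncoder internals.

```python
# Values taken from the config.json sketch; render_pair is a hypothetical
# helper illustrating how the template would be filled in.
template = "Instruct: {instruction}\nQuery: {query}\nDocument: {document}"
template_kwargs = {"instruction": "Given a query, find the most relevant document."}

def render_pair(query: str, document: str) -> str:
    # Static kwargs (e.g. the instruction) and the per-pair query/document
    # are filled into the same template.
    return template.format(query=query, document=document, **template_kwargs)

text = render_pair(
    "Which planet is known as the Red Planet?",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
)
print(text)
```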
or
Test Cases:
- prompt_template: passing it or not should yield different scores
- prompt_template + prompt_template_kwargs: different prompt_template_kwargs should yield different scores
- prompt_template and instruction configuration from config.json: a different prompt_template or instruction should yield different scores, and no errors are raised

Test Result:
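At the string level, the first two test cases reduce to checking that different template settings produce different model inputs. A hedged sketch of that logic follows; the `apply_template` helper and the `[SEP]` default pair encoding are assumptions for illustration, not the real CrossEncoder code.

```python
# Illustrative version of the test cases: changing the prompt_template or its
# kwargs changes the text fed to the model, so the scores can differ.
TEMPLATE = "Instruct: {instruction}\nQuery: {query}\nDocument: {document}"

def apply_template(query, document, prompt_template=None, **kwargs):
    if prompt_template is None:
        return f"{query} [SEP] {document}"  # assumed default pair encoding
    return prompt_template.format(query=query, document=document, **kwargs)

q, d = "Which planet is known as the Red Planet?", "Mars is the Red Planet."

plain = apply_template(q, d)
templated = apply_template(
    q, d, prompt_template=TEMPLATE,
    instruction="Given a query, find the most relevant document.",
)
retemplated = apply_template(
    q, d, prompt_template=TEMPLATE,
    instruction="Rank documents by topical relevance.",
)

# Case 1: with vs. without prompt_template -> different model inputs
assert plain != templated
# Case 2: same template, different prompt_template_kwargs -> different inputs
assert templated != retemplated
```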
Additional Example E2E Demo on Qwen3-Reranker-0.6B:
The instruction and the correct template are very important for Qwen3-Reranker.
Misc:
This relates to many issues requesting Qwen Reranker support when serving with vLLM.