One-stage rerank results are suspiciously poor #2933

### Describe the bug

I commented out the stage 1 code to get rerank results from the CrossEncoder directly, but the rerank results are very poor. The code looks like this:

```
from mteb import MTEB
import mteb
from sentence_transformers import CrossEncoder, SentenceTransformer

dual_encoder = SentenceTransformer("all-MiniLM-L6-v2")

cross_encoder = CrossEncoder("cross-encoder/ms-marco-TinyBERT-L-2-v2")

tasks = mteb.get_tasks(tasks=["NFCorpus"], languages=["eng"])

subset = "default" # subset name used in the NFCorpus dataset
eval_splits = ["test"]

evaluation = MTEB(tasks=tasks)
# Stage 1 (dual encoder) is commented out:
# evaluation.run(
#    dual_encoder,
#    eval_splits=eval_splits,
#    save_predictions=True,
#    output_folder="results/stage1",
# )
evaluation.run(
    cross_encoder,
    eval_splits=eval_splits,
    # top_k=500,
    save_predictions=True,
    output_folder="results/stage2",
    # previous_results=f"results/stage1/NFCorpus_{subset}_predictions.json",  # disabled so the CrossEncoder produces results directly
)
```

Here are some of the stage 1 results from before I made this change:
```
        "recall_at_1": 0.04323,
        "recall_at_3": 0.09053,
        "recall_at_5": 0.11959,
        "recall_at_10": 0.15499,
        "recall_at_20": 0.18891,
        "recall_at_100": 0.31151,
        "recall_at_1000": 0.63211,
        "precision_at_1": 0.41486,
        "precision_at_3": 0.34881,
        "precision_at_5": 0.30279,
        "precision_at_10": 0.24334,
        "precision_at_20": 0.17817,
        "precision_at_100": 0.07981,
        "precision_at_1000": 0.02096,
```
And these are the results obtained directly from the CrossEncoder:
```
        "recall_at_1": 0.00784,
        "recall_at_3": 0.01523,
        "recall_at_5": 0.01939,
        "recall_at_10": 0.02783,
        "recall_at_20": 0.03382,
        "recall_at_100": 0.05049,
        "recall_at_1000": 0.12993,
        "precision_at_1": 0.13313,
        "precision_at_3": 0.11765,
        "precision_at_5": 0.10774,
        "precision_at_10": 0.08545,
        "precision_at_20": 0.0596,
        "precision_at_100": 0.02201,
        "precision_at_1000": 0.00699,
```
I would expect a single CrossEncoder to give much better results than a single SentenceTransformer, but it does not. What is the problem?
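
For reference when comparing the two sets of numbers, here is a minimal sketch of the usual per-query definitions of precision@k and recall@k. The function name `precision_recall_at_k` and the toy document ids are made up for illustration; this is not mteb's implementation, which may differ in details.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(
    ranked_doc_ids: List[str], relevant_doc_ids: Set[str], k: int
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for a single query.

    Hypothetical helper for illustration only, not part of mteb.
    """
    top_k = ranked_doc_ids[:k]
    # Count how many of the top-k retrieved documents are relevant.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    precision = hits / k
    # Recall is undefined without relevant documents; report 0.0 in that case.
    recall = hits / len(relevant_doc_ids) if relevant_doc_ids else 0.0
    return precision, recall


# Toy data (made up): only "d1" of the top 3 is relevant, out of 3 relevant docs.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}
p_at_3, r_at_3 = precision_recall_at_k(ranked, relevant, k=3)
print(f"precision@3={p_at_3:.3f} recall@3={r_at_3:.3f}")
```

With many relevant documents per query, recall@1 stays small even when precision@1 is high, which is the pattern in both result sets above.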

Here is my environment:
sentence-transformers    5.0.0
mteb                     1.38.35


### To reproduce

To reproduce, I took the official rerank code and commented out stage 1 (as described above). Here is the official rerank code:
```
from mteb import MTEB
import mteb
from sentence_transformers import CrossEncoder, SentenceTransformer

cross_encoder = CrossEncoder("cross-encoder/ms-marco-TinyBERT-L-2-v2")
dual_encoder = SentenceTransformer("all-MiniLM-L6-v2")

tasks = mteb.get_tasks(tasks=["NFCorpus"], languages=["eng"])

subset = "default" # subset name used in the NFCorpus dataset
eval_splits = ["test"]

evaluation = MTEB(tasks=tasks)
evaluation.run(
    dual_encoder,
    eval_splits=eval_splits,
    save_predictions=True,
    output_folder="results/stage1",
)
evaluation.run(
    cross_encoder,
    eval_splits=eval_splits,
    top_k=5,
    save_predictions=True,
    output_folder="results/stage2",
    previous_results=f"results/stage1/NFCorpus_{subset}_predictions.json",
)
```
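
As I understand it, the second `evaluation.run` call above rescores only the `top_k` stage 1 candidates for each query. A minimal sketch of that idea in plain Python, assuming the predictions map each query id to a `{doc_id: score}` dict. `rerank_top_k`, `score_fn`, and the toy scores are hypothetical stand-ins for illustration, not mteb or sentence-transformers API:

```python
from typing import Callable, Dict

# Assumed shape: query id -> {doc id: score}.
Results = Dict[str, Dict[str, float]]


def rerank_top_k(
    stage1: Results, score_fn: Callable[[str, str], float], top_k: int
) -> Results:
    """Keep the top_k stage 1 candidates per query and rescore them.

    Hypothetical helper; score_fn stands in for a cross-encoder.
    """
    reranked: Results = {}
    for query_id, doc_scores in stage1.items():
        # Sort candidates by their stage 1 score, best first, and truncate.
        candidates = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_k]
        # Only the surviving candidates receive a stage 2 score.
        reranked[query_id] = {
            doc_id: score_fn(query_id, doc_id) for doc_id in candidates
        }
    return reranked


# Toy data (made up): "d3" would score highest in stage 2 but is cut by top_k=2.
stage1_results = {"q1": {"d1": 0.9, "d2": 0.5, "d3": 0.1}}
toy_scores = {("q1", "d1"): 0.2, ("q1", "d2"): 0.8, ("q1", "d3"): 0.99}
reranked = rerank_top_k(stage1_results, lambda q, d: toy_scores[(q, d)], top_k=2)
print(reranked)
```

In this sketch the reranker never sees documents outside the stage 1 candidate list, so the candidate set matters as much as the stage 2 scores.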

### Additional information

_No response_

### Are you interested to contribute a fix for this bug?

No
