Describe the bug
I am attempting to evaluate the new Qwen3-Embedding models on ruMTEB but have been unable to reproduce the published scores.
I used both the MTEB evaluation code and the evaluation code from the Qwen team.
Since the scores were committed on June 4th but the model revision tied to those scores was only released on June 6th (Hugging Face commit b22da495047858cce924d27d76261e96be6febc0), I also evaluated the previous revision (99cabfa1346cbf4ac8b0e73079bb2e286cff3a1f) for comparison.
Model addition PR: #2769
Scores addition PR: embeddings-benchmark/results#214
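As a side note, both revisions can be pinned locally with huggingface_hub for side-by-side runs; a minimal sketch (the loop and printout are purely illustrative):

```python
# Sketch: download both Hugging Face revisions as pinned local snapshots.
from huggingface_hub import snapshot_download

for rev in (
    "b22da495047858cce924d27d76261e96be6febc0",  # revision tied to the published scores
    "99cabfa1346cbf4ac8b0e73079bb2e286cff3a1f",  # previous revision
):
    path = snapshot_download("Qwen/Qwen3-Embedding-0.6B", revision=rev)
    print(rev, "->", path)
```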
The scores I obtained for Qwen3-Embedding-0.6B:
| dataset | original main score | mteb eval (b22da4) | mteb eval (99cabf) | qwen eval (b22da4) | qwen eval (99cabf) |
|---|---|---|---|---|---|
| TERRa | 0.606803 | 0.561885 | 0.558446 | 0.601651 | 0.565181 |
| AILAStatutes | 0.79018 | 0.72796 | 0.41639 | 0.79309 | 0.5805 |
| STS22 (eng) | 0.711317 | 0.708369 | 0.659296 | 0.708842 | 0.291827 |
I tried different versions of transformers (4.53.4/4.52.4/4.52.1/4.51.3), sentence-transformers (4.1.0/5.0.0), and MTEB (1.38.9/1.38.30/1.38.34); none of them changed the TERRa main_score in the MTEB evaluation.
MTEB uses the model's full context (32k) while the Qwen eval uses only an 8k context, but this accounts for only a ~0.01 difference in scores.
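To isolate the context-length effect, the limit can be capped at 8k on the sentence-transformers side; a minimal sketch (assumes the MTEB loader sits on a plain SentenceTransformer backend):

```python
# Sketch assuming a SentenceTransformer backend; caps the context at 8k
# to mirror the Qwen eval instead of MTEB's full 32k.
from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
st_model.max_seq_length = 8192
```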
I also tried FA2 with `model_kwargs={"attn_implementation": "flash_attention_2", "torch_dtype": torch.float16}` and got no difference.
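For completeness, a sketch of how those kwargs can be passed (assuming they are forwarded to the underlying SentenceTransformer; flash-attn needs to be installed):

```python
import torch
from sentence_transformers import SentenceTransformer

# FA2 + fp16 variant; produced the same scores as the default attention path.
st_model = SentenceTransformer(
    "Qwen/Qwen3-Embedding-0.6B",
    model_kwargs={
        "attn_implementation": "flash_attention_2",
        "torch_dtype": torch.float16,
    },
)
```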
To reproduce
Code for reproduction:
MTEB:
```python
import os
from functools import partial

import mteb
from mteb.models.qwen3_models import *  # provides ModelMeta, q3e_instruct_loader, multilingual_langs, training_data

# ModelMeta mirroring the registered Qwen3-Embedding-0.6B entry,
# with the revision pinned explicitly.
Qwen3_Embedding_0B6 = ModelMeta(
    loader=partial(
        q3e_instruct_loader,
        model_name_or_path=os.environ.get("Q3E_0B6_PATH", "Qwen/Qwen3-Embedding-0.6B"),
        # revision="99cabfa1346cbf4ac8b0e73079bb2e286cff3a1f",
        revision="b22da495047858cce924d27d76261e96be6febc0",
    ),
    name="Qwen/Qwen3-Embedding-0.6B",
    languages=multilingual_langs,
    open_weights=True,
    # revision="99cabfa1346cbf4ac8b0e73079bb2e286cff3a1f",
    revision="b22da495047858cce924d27d76261e96be6febc0",
    release_date="2025-06-05",
    n_parameters=595776512,
    memory_usage_mb=2272,
    embed_dim=1024,
    max_tokens=32768,  # or 8192 (8k) to match the Qwen eval
    license="apache-2.0",
    reference="https://huggingface.co/Qwen/Qwen3-Embedding-0.6B",
    similarity_fn_name="cosine",
    framework=["Sentence Transformers", "PyTorch"],
    use_instructions=True,
    public_training_code=None,
    public_training_data=None,
    training_datasets=training_data,
)

tasks = mteb.get_tasks(
    tasks=["TERRa", "AILAStatutes", "STS22"],
    # languages=["eng-Latn"],
    # exclusive_language_filter=True,
)

model = Qwen3_Embedding_0B6.load_model()
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, verbosity=2, encode_kwargs={"batch_size": 1})
```
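To compare against the published numbers, the saved result files can be scanned afterwards; a rough sketch assuming MTEB's default results layout and JSON schema:

```python
# Sketch: pull main_score values out of MTEB's result JSONs
# (assumes the default results/<model>/<revision>/<task>.json layout).
import json
import pathlib

for path in pathlib.Path("results").rglob("*.json"):
    data = json.loads(path.read_text())
    if "scores" not in data:
        continue  # skip model_meta.json and other non-result files
    for split, entries in data["scores"].items():
        for entry in entries:
            print(data.get("task_name"), split, entry.get("main_score"))
```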
Qwen:
(change the revision/tasks as needed)

```bash
python run_mteb.py --model Qwen/Qwen3-Embedding-0.6B --model_name Qwen/Qwen3-Embedding-0.6B --precision fp16 --model_kwargs "{\"max_length\": 8192, \"attn_type\": \"causal\", \"pooler_type\": \"last\", \"do_norm\": true, \"use_instruction\": true, \"instruction_template\": \"Instruct: {}\nQuery:\", \"instruction_dict_path\": \"task_prompts.json\", \"attn_implementation\":\"flash_attention_2\", \"revision\":\"99cabfa1346cbf4ac8b0e73079bb2e286cff3a1f\"}" --run_kwargs "{\"save_predictions\": \"true\"}" --batch_size 1 --tasks "STS22"
```
Additional information
No response
Are you interested to contribute a fix for this bug?
No