Different results from `retriever.encode_and_retrieve` and `retriever.retrieve`

Hi, I encountered an issue while evaluating the performance of contriever-msmarco on the Arguana dataset using the official example.

When running the following code:
```
# Option A: retrieve directly
results = retriever.retrieve(corpus, queries)
# Option B: used instead of Option A, not after it
# results = retriever.encode_and_retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values, ignore_identical_ids=False)
```
I noticed that the results differ depending on which method (`retrieve` or `encode_and_retrieve`) is used. Specifically, `encode_and_retrieve` may return document IDs that are identical to the query IDs.

I always set `ignore_identical_ids=False`. With `retrieve`, I get the expected ndcg@10 of 44. With `encode_and_retrieve`, however, ndcg@10 drops to only 33.4. After comparing the results from both methods, I found that `encode_and_retrieve` includes each query's similarity to itself (the document whose ID equals the query ID), which causes the drop.
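A minimal sketch of one possible workaround, assuming `results` has the usual BEIR shape `{query_id: {doc_id: score}}`: drop every self-match before calling `evaluate`. The helper `drop_self_matches` is hypothetical and not part of the library, and the toy data below is made up, not real ArguAna output.

```python
from typing import Dict

# Assumed BEIR-style results layout: {query_id: {doc_id: score}}
Results = Dict[str, Dict[str, float]]


def drop_self_matches(results: Results) -> Results:
    """Return a copy of `results` without entries whose doc_id equals the query_id.

    Hypothetical helper for illustration; it is not a BEIR function.
    """
    return {
        qid: {did: score for did, score in docs.items() if did != qid}
        for qid, docs in results.items()
    }


# Toy usage (made-up IDs and scores)
results = {
    "q1": {"q1": 1.0, "d7": 0.82, "d3": 0.75},
    "q2": {"d1": 0.91, "q2": 0.99},
}
cleaned = drop_self_matches(results)
print(cleaned)  # {'q1': {'d7': 0.82, 'd3': 0.75}, 'q2': {'d1': 0.91}}
```

Because this filters after retrieval, a query that matched itself ends up with one fewer candidate than `top_k`, so retrieving one extra document per query leaves some margin.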

How can I fix this problem? I intend to save the embeddings and reuse them later.
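If the embeddings are saved and searched separately, another option is to exclude the query's own ID before ranking, so every query still gets a full `top_k` list. The sketch below assumes the saved embeddings are loaded as plain lists aligned with their ID lists, and assumes dot-product scoring (swap in cosine similarity if the model was configured that way). `search_saved_embeddings` and `dot` are hypothetical helpers, not BEIR functions.

```python
import heapq
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product (assumed scoring function)."""
    return sum(x * y for x, y in zip(a, b))


def search_saved_embeddings(
    query_ids: List[str],
    query_embs: List[Sequence[float]],
    corpus_ids: List[str],
    corpus_embs: List[Sequence[float]],
    top_k: int,
) -> Dict[str, Dict[str, float]]:
    """Exact search over precomputed embeddings that skips doc_id == query_id.

    Hypothetical helper for illustration; returns {query_id: {doc_id: score}}.
    """
    results: Dict[str, Dict[str, float]] = {}
    for qid, q in zip(query_ids, query_embs):
        scored = (
            (dot(q, c), did)
            for did, c in zip(corpus_ids, corpus_embs)
            if did != qid  # exclude the query's own document before ranking
        )
        best = heapq.nlargest(top_k, scored)  # highest scores first
        results[qid] = {did: score for score, did in best}
    return results


# Toy usage: queries "a" and "b" also appear in the corpus
out = search_saved_embeddings(
    ["a", "b"],
    [[1.0, 0.0], [0.0, 1.0]],
    ["a", "b", "c"],
    [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],
    top_k=2,
)
print(out)  # {'a': {'c': 0.6, 'b': 0.0}, 'b': {'c': 0.8, 'a': 0.0}}
```

This linear scan is only meant to show where the exclusion belongs; for a corpus of real size, a batched matrix product over the saved embeddings would be the practical choice.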

Thank you in advance!

Issue #206