[feat] Add generic NanoEvaluator abstractions with NanoBEIR backward compatibility #3673
hotchpotch wants to merge 16 commits into huggingface:main
| @@ -0,0 +1,232 @@ | |||
| from __future__ import annotations | |||
These tests were added to cover the existing NanoBEIR behavior first, so that NanoBEIR could be refactored without breaking that behavior.
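As an illustration of that "pin the behavior first, then refactor" approach, here is a minimal sketch of a characterization test. It is not taken from the test file in this PR: `legacy_evaluate`, `BaseEvaluator`, and `ToyEvaluator` are hypothetical stand-ins, and the metric key is made up for the example.

```python
import unittest


def legacy_evaluate(scores):
    """Hypothetical pre-refactor behavior: mean score under a fixed key."""
    return {"toy_mean_score": sum(scores) / len(scores)}


class BaseEvaluator:
    """Hypothetical generic parent extracted during the refactor."""

    key = "mean_score"

    def __call__(self, scores):
        return {self.key: sum(scores) / len(scores)}


class ToyEvaluator(BaseEvaluator):
    """Hypothetical subclass that must keep the legacy output key."""

    key = "toy_mean_score"


class CharacterizationTest(unittest.TestCase):
    def test_refactor_preserves_output(self):
        # The refactored class must return exactly what the old code returned.
        scores = [0.2, 0.4, 0.9]
        self.assertEqual(ToyEvaluator()(scores), legacy_evaluate(scores))
```

The test is written against the old behavior before any code moves, so a failure after the refactor points at a real behavioral change instead of a test that was adapted along the way.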
| instead of ``documents``. When using ``documents``, setting this to True will result in a more useful evaluation | ||
| signal, but setting it to False will result in a more realistic evaluation. Defaults to True. | ||
| - batch_size (int): Batch size to compute sentence embeddings. Defaults to 64. | ||
| + batch_size (int): Batch size to compute sentence embeddings. Defaults to 32. |
The default batch size has long been 32 in the implementation, but the documentation still said 64, so I corrected the docstring to match the actual behavior.
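This kind of docstring drift can be caught mechanically. The sketch below is only an illustration and is not part of this PR: `encode_queries` is a hypothetical stand-in for an evaluator method, and `documented_default` is a hypothetical helper that assumes the "Defaults to N." docstring convention shown above.

```python
import inspect
import re


def encode_queries(texts, batch_size=32):
    """Hypothetical stand-in for an evaluator method.

    Args:
        texts (list[str]): Inputs to embed.
        batch_size (int): Batch size to compute sentence embeddings. Defaults to 32.
    """
    # Split the inputs into consecutive batches of at most `batch_size` items.
    return [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]


def documented_default(func, param):
    """Return the integer after 'Defaults to' in the docstring entry for `param`, or None."""
    doc = inspect.getdoc(func) or ""
    pattern = rf"{re.escape(param)} \(int\):.*?Defaults to (\d+)"
    match = re.search(pattern, doc, flags=re.S)
    return int(match.group(1)) if match else None


# Compare the default in the signature with the one stated in the docstring.
actual = inspect.signature(encode_queries).parameters["batch_size"].default
documented = documented_default(encode_queries, "batch_size")
assert actual == documented, f"docstring says {documented}, signature says {actual}"
```

A check like this could run as a unit test, so that a mismatch between the documented and the actual default fails early.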
After reviewing the changes again, I realized that although the new files are mostly copies of existing implementations, the diff shows them as pure additions. That makes it very hard to distinguish what was already in the existing implementation from what is actually new and worth focusing on in review, so the review cost feels quite high. I’m going to move this PR back to draft for now and try to restructure the changes so the diffs are easier to read and the review cost is lower.
Hello!
Summary
This PR introduces a generic `NanoEvaluator` abstraction for sampled information-retrieval evaluation, while preserving backward compatibility for existing NanoBEIR evaluators.

`NanoBEIREvaluator` is very useful for large IR benchmarks because it evaluates sampled subsets at much lower cost. In the same spirit, this PR extends the approach beyond NanoBEIR so that other datasets with the same `corpus`/`queries`/`qrels` structure can be evaluated with the same evaluator family.

For example, this enables evaluation on datasets such as:
Details
- `NanoEvaluator`: a generic parent evaluator for dense retrieval.
- `NanoBEIREvaluator`: now subclasses `NanoEvaluator` while keeping its public API and output key conventions.
- `CrossEncoderNanoEvaluator` and `SparseNanoEvaluator`: generic parents for cross-encoder and sparse settings.
- `CrossEncoderNanoBEIREvaluator` and `SparseNanoBEIREvaluator`: NanoBEIR-specific wrappers with backward-compatible behavior.

Backward Compatibility
- `NanoBEIREvaluator` usage remains valid.
- `CrossEncoderNanoBEIREvaluator` and `SparseNanoBEIREvaluator` usage remains valid.

This is an initial implementation to make the idea concrete. I would greatly appreciate any feedback!
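To make the `corpus`/`queries`/`qrels` structure and the parent/subclass split concrete, here is a minimal, dependency-free sketch. It is not the code in this PR and does not use the sentence-transformers API: `ToyNanoEvaluator`, `ToyNanoBEIREvaluator`, the word-overlap scorer, the tiny dataset, and the metric key format are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import math


class ToyNanoEvaluator:
    """Hypothetical generic evaluator over a corpus/queries/qrels triple."""

    def __init__(self, corpus, queries, qrels, name="toy", k=10):
        self.corpus = corpus    # doc_id -> document text
        self.queries = queries  # query_id -> query text
        self.qrels = qrels      # query_id -> set of relevant doc_ids
        self.name = name
        self.k = k

    def metric_key(self, metric):
        # Subclasses can override this to keep their own key conventions.
        return f"{self.name}_{metric}@{self.k}"

    def __call__(self, score_fn):
        ndcgs = []
        for qid, query in self.queries.items():
            # Rank all documents by score, highest first, and keep the top k.
            ranked = sorted(
                self.corpus,
                key=lambda doc_id: score_fn(query, self.corpus[doc_id]),
                reverse=True,
            )[: self.k]
            relevant = self.qrels.get(qid, set())
            # Binary-relevance DCG, normalized by the best achievable DCG.
            dcg = sum(1.0 / math.log2(rank + 2) for rank, d in enumerate(ranked) if d in relevant)
            ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), self.k)))
            ndcgs.append(dcg / ideal if ideal else 0.0)
        return {self.metric_key("ndcg"): sum(ndcgs) / len(ndcgs)}


class ToyNanoBEIREvaluator(ToyNanoEvaluator):
    """Hypothetical dataset-specific subclass: same logic, its own key prefix."""

    def metric_key(self, metric):
        return f"toybeir_{self.name}_{metric}@{self.k}"


def overlap_score(query, doc):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


corpus = {
    "d1": "python list sorting",
    "d2": "cooking pasta at home",
    "d3": "sorting algorithms in python",
}
queries = {"q1": "python sorting", "q2": "pasta cooking"}
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}

results = ToyNanoEvaluator(corpus, queries, qrels)(overlap_score)
legacy_results = ToyNanoBEIREvaluator(corpus, queries, qrels)(overlap_score)
print(results)
print(legacy_results)
```

The point of the sketch is the shape of the design: the ranking and metric logic lives once in the generic parent, and the dataset-specific subclass only customizes naming, so existing callers that read the old keys keep working.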