
[feat] Add generic NanoEvaluator abstractions with NanoBEIR backward compatibility#3673

Draft
hotchpotch wants to merge 16 commits into huggingface:main from hotchpotch:nano-eval

Conversation

hotchpotch (Contributor) commented on Feb 24, 2026:

Hello!

Summary

This PR introduces a generic NanoEvaluator abstraction for sampled information-retrieval evaluation, while preserving backward compatibility for existing NanoBEIR evaluators.

NanoBEIREvaluator is very useful for large IR benchmarks because it evaluates sampled subsets at much lower cost. In the same spirit, this PR extends the approach beyond NanoBEIR so that other datasets with the same corpus/queries/qrels structure can be evaluated with the same evaluator family.

For example, this enables evaluation on datasets such as:

Details

  • Add NanoEvaluator as a generic parent evaluator for dense retrieval.
  • Refactor NanoBEIREvaluator to subclass NanoEvaluator while keeping its public API and output key conventions.
  • Add CrossEncoderNanoEvaluator and SparseNanoEvaluator as generic parents for cross-encoder and sparse settings.
  • Keep CrossEncoderNanoBEIREvaluator and SparseNanoBEIREvaluator as NanoBEIR-specific wrappers with backward-compatible behavior.
  • Add/expand examples for dense, sparse, and cross-encoder usage.
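The class relationship described in the bullets above can be sketched roughly as follows. This is a toy illustration only: apart from the two evaluator class names, all attributes, methods, and key formats here are placeholders, and the real implementation loads corpus/queries/qrels data and computes retrieval metrics rather than just formatting keys.

```python
# Toy sketch of the proposed hierarchy. The loading and metric logic is
# stubbed out; only the parent/subclass split and the key-prefix idea are shown.
class NanoEvaluator:
    """Generic sampled-IR evaluator over corpus/queries/qrels triples."""

    def __init__(self, dataset_names, name_prefix=""):
        self.dataset_names = dataset_names
        self.name_prefix = name_prefix  # controls how metric keys are named

    def metric_key(self, dataset, metric):
        # Placeholder for the output key convention, e.g. "NanoMSMARCO_cosine_ndcg@10"
        return f"{self.name_prefix}{dataset}_{metric}"


class NanoBEIREvaluator(NanoEvaluator):
    """BEIR-specific subclass: keeps the existing public API and key names."""

    def __init__(self, dataset_names):
        super().__init__(dataset_names, name_prefix="Nano")


evaluator = NanoBEIREvaluator(["MSMARCO"])
print(evaluator.metric_key("MSMARCO", "cosine_ndcg@10"))
# prints: NanoMSMARCO_cosine_ndcg@10
```

The cross-encoder and sparse variants would follow the same pattern: a generic parent plus a thin NanoBEIR-specific subclass.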

Backward Compatibility

  • Existing NanoBEIREvaluator usage remains valid.
  • Existing CrossEncoderNanoBEIREvaluator and SparseNanoBEIREvaluator usage remains valid.
  • Metric naming and expected primary metrics are preserved for NanoBEIR evaluators.
  • I also ran the sample code in NanoBEIREvaluator.py and confirmed that the resulting values are unchanged from the previous behavior.
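As a toy illustration of the compatibility claim (function names, defaults, and the dict-based config are all illustrative, not the library's actual internals): the BEIR-specific wrapper can pin the historical defaults and key prefix so that existing no-argument call sites behave exactly as before, while new callers pass arbitrary dataset ids to the generic evaluator.

```python
# Illustrative only: stand-ins for the generic evaluator and its
# backward-compatible BEIR wrapper. Names and defaults are placeholders.
DEFAULT_BEIR_DATASETS = ["msmarco", "nfcorpus", "nq"]  # placeholder subset

def make_nano_evaluator(dataset_names):
    # Generic path: caller supplies any corpus/queries/qrels dataset ids.
    return {"datasets": list(dataset_names), "prefix": ""}

def make_nano_beir_evaluator(dataset_names=None):
    # Wrapper path: same defaults and "Nano" metric-key prefix as before,
    # so existing call sites see no behavioral change.
    config = make_nano_evaluator(dataset_names or DEFAULT_BEIR_DATASETS)
    config["prefix"] = "Nano"
    return config

old_style = make_nano_beir_evaluator()  # existing usage, no arguments
print(old_style["prefix"], old_style["datasets"])
```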

This is an initial implementation to make the idea concrete. I would greatly appreciate any feedback!

@@ -0,0 +1,232 @@
from __future__ import annotations
These tests were added to cover the existing NanoBEIR behavior first, so that NanoBEIR could be refactored without breaking that behavior.

  instead of ``documents``. When using ``documents``, setting this to True will result in a more useful evaluation
  signal, but setting it to False will result in a more realistic evaluation. Defaults to True.
- batch_size (int): Batch size to compute sentence embeddings. Defaults to 64.
+ batch_size (int): Batch size to compute sentence embeddings. Defaults to 32.

The default batch size has long been 32 in the implementation, but the documentation still said 64, so I corrected the docstring to match the actual behavior.

hotchpotch marked this pull request as ready for review on February 24, 2026 12:58
hotchpotch marked this pull request as draft on February 26, 2026 23:18
hotchpotch commented:
After reviewing the changes again, I realized that although the new files are mostly copies of existing implementations, the diff shows them as pure additions.

That makes it very hard to distinguish what was already in the existing implementation from what is actually new and worth focusing on in review, so the review cost feels quite high.

I’m going to move this PR back to draft for now and try to restructure the changes so the diffs are easier to read and the review cost is lower.
