Fix corpus reference orientation for chrF/chrF++/TER metrics by dyurchenko98 · Pull Request #1176 · huggingface/lighteval

dyurchenko98 · 2026-03-05T14:40:52Z

Summary

Fixes incorrect reference layout for corpus chrf, chrf++, and ter in CorpusLevelTranslationMetric.compute_corpus.

SacreBLEU expects references shaped as [ref_id][sample_id], but lighteval passed [sample_id][ref_id].
This can drop hypotheses from scoring and compare remaining hypotheses against mis-grouped (pooled) references.

Example (one reference per sample):

preds: [p1, p2]
refs passed (wrong): [[r1], [r2]]
refs expected: [[r1, r2]]

SacreBLEU interprets the wrong layout as 2 reference streams of length 1, so only p1 is scored (against both r1 and r2), while p2 is ignored.
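The effect of the two layouts can be reproduced with a small toy model in plain Python. This is only a sketch of the pairing behavior described above, not sacrebleu's code; `pair_hyps_with_refs` is a hypothetical helper introduced for illustration.

```python
def pair_hyps_with_refs(hyps, ref_streams):
    """Toy model (assumption): take the i-th entry of every reference
    stream as the references of sample i, then pair them with hypotheses."""
    per_sample_refs = [
        [r for r in group if r is not None]  # skip missing references
        for group in zip(*ref_streams)
    ]
    # zip stops at the shorter input, so surplus hypotheses are silently dropped
    return list(zip(hyps, per_sample_refs))


preds = ["p1", "p2"]
wrong = [["r1"], ["r2"]]  # [sample_id][ref_id], the layout lighteval passed
right = [["r1", "r2"]]    # [ref_id][sample_id], the layout SacreBLEU expects

print(pair_hyps_with_refs(preds, wrong))  # [('p1', ['r1', 'r2'])]
print(pair_hyps_with_refs(preds, right))  # [('p1', ['r1']), ('p2', ['r2'])]
```

With the wrong layout, p2 never appears in the output and p1 is compared against both r1 and r2, which matches the failure described in the example.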

Changes

- Transpose non-BLEU references with zip_longest(..., fillvalue=None) before calling sacrebleu.
- Keep the BLEU path unchanged.
- Add regression tests for:
  - all hypotheses being scored,
  - a variable number of references per sample.
- Update the expected values of the affected corpus metric fixtures (chrf, chrf_plus, ter) to match the correct behavior.
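The transposition step can be sketched as follows. The PR diff itself is not reproduced here, so `transpose_refs` is a hypothetical helper name. The sketch assumes only that references arrive as one list per sample and that samples with fewer references are padded with None.

```python
from itertools import zip_longest


def transpose_refs(per_sample_refs):
    """Turn [sample_id][ref_id] into [ref_id][sample_id].

    Samples with fewer references are padded with None, so every
    reference stream has one slot per sample.
    """
    return [list(stream) for stream in zip_longest(*per_sample_refs, fillvalue=None)]


# Three samples; the second one has only a single reference.
per_sample = [["r1a", "r1b"], ["r2a"], ["r3a", "r3b"]]
streams = transpose_refs(per_sample)
print(streams)  # [['r1a', 'r2a', 'r3a'], ['r1b', None, 'r3b']]
```

Using zip_longest rather than zip keeps the second reference stream even though one sample lacks a second reference. A plain zip would truncate to the smallest reference count and discard r1b and r3b.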

Fixes #1112.

Commit ab06072: fix: fix chrf/chrf++/ter metrics corpus-level computation

dyurchenko98 wants to merge 1 commit into huggingface:main from dyurchenko98:fix/chrf_fix
