Fix corpus reference orientation for chrF/chrF++/TER metrics #1176

Open
dyurchenko98 wants to merge 1 commit into huggingface:main from dyurchenko98:fix/chrf_fix

@dyurchenko98

Summary

Fixes incorrect reference layout for corpus chrf, chrf++, and ter in CorpusLevelTranslationMetric.compute_corpus.

SacreBLEU expects references shaped as [ref_id][sample_id] (one list per reference stream, each with one entry per sample), but lighteval passed [sample_id][ref_id].
As a result, hypotheses can be silently dropped from scoring, and the remaining hypotheses are compared against mis-grouped (pooled) references.

Example (single-ref per sample):

  • preds: [p1, p2]
  • wrong refs passed: [[r1], [r2]]
  • SacreBLEU interprets this as 2 reference streams of length 1, so only p1 is scored (against refs [r1, r2]), while p2 is ignored.
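
The example above can be reproduced with a small standard-library sketch. It assumes, as a simplification, that the scorer pairs the i-th hypothesis with the i-th entry of every reference stream and stops at the shortest input. The `pair_up` helper is hypothetical and only models that pairing; it is not SacreBLEU's actual code.

```python
preds = ["p1", "p2"]

# [sample_id][ref_id]: the layout lighteval passed before this fix.
wrong_refs = [["r1"], ["r2"]]
# [ref_id][sample_id]: the layout SacreBLEU expects.
right_refs = [["r1", "r2"]]


def pair_up(hyps, ref_streams):
    """Hypothetical model of stream-wise pairing.

    Each inner list of ref_streams is treated as one reference stream.
    zip() stops at the shortest input, so surplus hypotheses are dropped.
    """
    per_sample_refs = zip(*ref_streams)
    return [(hyp, list(refs)) for hyp, refs in zip(hyps, per_sample_refs)]


wrong_pairs = pair_up(preds, wrong_refs)
right_pairs = pair_up(preds, right_refs)

print(wrong_pairs)  # [('p1', ['r1', 'r2'])]
print(right_pairs)  # [('p1', ['r1']), ('p2', ['r2'])]
```

With the wrong layout, p2 never reaches the scorer and p1 is matched against both r1 and r2, which is the behaviour described above.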

Changes

  • Transpose non-BLEU references with zip_longest(..., fillvalue=None) before calling sacrebleu.
  • Keep BLEU path unchanged.
  • Add regression tests for:
    • all hypotheses being scored,
    • variable number of references per sample.
  • Update affected corpus metric fixture expected values (chrf, chrf_plus, ter) to match correct behavior.
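
A minimal sketch of the transposition step, assuming references arrive as one list per sample. The helper name `transpose_references` and the sample strings are illustrative and not taken from the lighteval code. Padding with `None` keeps every reference stream the same length as the number of samples when samples have different numbers of references.

```python
from itertools import zip_longest


def transpose_references(refs_per_sample):
    """Turn [sample_id][ref_id] into [ref_id][sample_id].

    Samples with fewer references are padded with None so that every
    reference stream has one slot per sample.
    """
    return [list(stream) for stream in zip_longest(*refs_per_sample, fillvalue=None)]


# Three samples with 2, 1 and 3 references respectively (made-up strings).
refs_per_sample = [["r1a", "r1b"], ["r2a"], ["r3a", "r3b", "r3c"]]
streams = transpose_references(refs_per_sample)

for stream in streams:
    print(stream)
# ['r1a', 'r2a', 'r3a']
# ['r1b', None, 'r3b']
# [None, None, 'r3c']
```

Using plain `zip` here would instead truncate to the sample with the fewest references, so `zip_longest` is what preserves the extra references in the variable-count case.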

Fixes #1112.

Development

Successfully merging this pull request may close these issues.

[BUG] chrF++/chrF/TER metrics receive references in wrong format, causing incorrect corpus-level scoring
