Recalculating ROUGE scores for See et al. (2017) test outputs.

ecly/see_et_al_2017_rouge

ROUGE scoring See et al. (2017)

Repository to replicate the ROUGE scores from See et al. (2017).

We find that the reported scores correspond to those produced by the Python re-implementation py-rouge, rather than those produced by pyrouge, the wrapper around the official ROUGE-1.5.5 Perl script.
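To make concrete what kind of quantity both packages report, here is a minimal pure-Python sketch of a ROUGE-N F1 score. It is not the code of py-rouge or of the Perl script: the function names `ngrams` and `rouge_n_f1` are hypothetical, and the sketch assumes lowercasing and whitespace tokenization with no stemming or sentence splitting, so it should not be expected to reproduce either package's numbers.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(hypothesis, reference, n=1):
    """Simplified ROUGE-N F1 (assumption: lowercase + whitespace tokens, no stemming)."""
    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_n_f1("the cat sat", "the cat ran", n=2))
```

Even in a sketch this small, choices such as tokenization and casing affect the result, which is why two implementations of the same metric need not agree to the second decimal.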

The evaluate.py script accepts a 'hypothesis' folder and a 'reference' folder. It computes ROUGE scores with both py-rouge and pyrouge and prints them to standard output.
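As a rough illustration of the folder-based input, the sketch below reads a hypothesis folder and a reference folder into two aligned lists of strings. It is not taken from evaluate.py: the helper name `load_pairs` is hypothetical, and pairing files by sorted filename order is an assumption about how the two folders line up.

```python
from pathlib import Path


def load_pairs(hypothesis_dir, reference_dir):
    """Read both folders and pair files by sorted filename order (an assumption)."""
    hyp_files = sorted(p for p in Path(hypothesis_dir).iterdir() if p.is_file())
    ref_files = sorted(p for p in Path(reference_dir).iterdir() if p.is_file())
    # Refuse to continue if the folders cannot be aligned one-to-one.
    if len(hyp_files) != len(ref_files):
        raise ValueError(
            f"{len(hyp_files)} hypotheses but {len(ref_files)} references"
        )
    hypotheses = [p.read_text(encoding="utf-8").strip() for p in hyp_files]
    references = [p.read_text(encoding="utf-8").strip() for p in ref_files]
    return hypotheses, references
```

The two returned lists have equal length, and entry i of one corresponds to entry i of the other, which is the shape a corpus-level scorer would typically need.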

The test_output folder contains the test outputs from See et al. (2017), which can be downloaded via the README.md of the official repository.

Setup

pip install py-rouge pyrouge

pyrouge prerequisites

Ensure the Perl XML library is installed:
On Arch Linux: sudo pacman -S perl-xml-xpath
On Ubuntu: sudo apt-get install libxml-parser-perl

ROUGE-1.5.5 install tips/debugging:
https://stackoverflow.com/questions/47045436/how-to-install-the-python-package-pyrouge-on-microsoft-windows

Evaluate

Note that pyrouge takes roughly 4x as long as py-rouge to evaluate, so some patience is required.

# Evaluate Pointer Generator
$> python evaluate.py test_output/pointer-gen test_output/reference

Python (py-rouge) scores:
         ROUGE-1 (F1): 36.43
         ROUGE-2 (F1): 15.66
         ROUGE-L (F1): 33.42

Perl (pyrouge) scores:
         ROUGE-1 (F1): 36.16
         ROUGE-2 (F1): 15.61
         ROUGE-L (F1): 33.21

# Evaluate Pointer Generator + Coverage
$> python evaluate.py test_output/pointer-gen-cov test_output/reference

Python (py-rouge) scores:
         ROUGE-1 (F1): 39.53
         ROUGE-2 (F1): 17.28
         ROUGE-L (F1): 36.38

Perl (pyrouge) scores:
         ROUGE-1 (F1): 39.24
         ROUGE-2 (F1): 17.22
         ROUGE-L (F1): 36.15
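
The gap between the two implementations can be read directly off the outputs above. The short sketch below only does that arithmetic: the numbers are copied from the two runs, and the helper name `score_gaps` is hypothetical.

```python
# F1 scores copied from the two runs above, as (py-rouge, pyrouge) pairs.
scores = {
    "pointer-gen": {
        "ROUGE-1": (36.43, 36.16),
        "ROUGE-2": (15.66, 15.61),
        "ROUGE-L": (33.42, 33.21),
    },
    "pointer-gen-cov": {
        "ROUGE-1": (39.53, 39.24),
        "ROUGE-2": (17.28, 17.22),
        "ROUGE-L": (36.38, 36.15),
    },
}


def score_gaps(scores):
    """Return py-rouge minus pyrouge for each model and metric, in F1 points."""
    return {
        model: {
            metric: round(python_f1 - perl_f1, 2)
            for metric, (python_f1, perl_f1) in metrics.items()
        }
        for model, metrics in scores.items()
    }


if __name__ == "__main__":
    for model, gaps in score_gaps(scores).items():
        print(model, gaps)
```

For both models py-rouge scores higher on every metric: by 0.27, 0.05, and 0.21 points for the pointer-generator, and by 0.29, 0.06, and 0.23 points with coverage (ROUGE-1, ROUGE-2, ROUGE-L respectively).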
