parsing evaluation metrics #2

@mheilman

We need some methods/scripts to evaluate parsing performance. We probably want to do two things: (a) replicate previous work that uses PARSEVAL, so that we can easily compare against previously reported results (see Table 3 in http://www.cc.gatech.edu/~jeisenst/papers/ji-acl-2014.pdf), and (b) implement a more appropriate metric based on precision/recall of relations between spans, not just precision/recall of (labeled or unlabeled) spans as in PARSEVAL. See the discussion from @sagae below.

  • The metrics should report unlabeled and labeled performance
  • The metrics should use the 18 coarse relations from Carlson et al.'s (2001) "Building a Discourse-tagged Corpus in the Framework of Rhetorical Structure Theory."
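
For (a), a minimal sketch of span-level scoring with a labeled/unlabeled switch might look like the following. The function name `span_scores`, the `(start, end, label)` tuple layout, and the toy spans are all assumptions made for illustration; this is not the scoring script used in the cited paper.

```python
def span_scores(gold, pred, labeled=True):
    """Return (precision, recall, F1) over sets of spans.

    gold, pred: iterables of (start, end, label) tuples (hypothetical layout).
    With labeled=False the label is ignored and only (start, end) must match.
    """
    def key(span):
        return span if labeled else span[:2]

    gold_set = {key(s) for s in gold}
    pred_set = {key(s) for s in pred}
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, not taken from any corpus.
gold = [(1, 2, "Elaboration"), (3, 4, "Attribution"), (1, 4, "Contrast")]
pred = [(1, 2, "Elaboration"), (3, 4, "Elaboration"), (2, 4, "Contrast")]

labeled_scores = span_scores(gold, pred, labeled=True)
unlabeled_scores = span_scores(gold, pred, labeled=False)
```

In this toy case one of three predicted spans matches with its label and two of three match without it, so the unlabeled scores are higher than the labeled ones.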

Discussion from @sagae

Looking at Fig 1 in http://www.isi.edu/~marcu/papers/sigdialbook2002.pdf, there are nine rhetorical relations, represented by the labeled directed arcs (same-unit is just a side effect of the annotation, and not a discourse relation). We really should be looking at precision and recall of the relations represented in these labeled arcs. So we would be looking for:

16 <- 17-26 : example
17-21 <- 22-26 : elaboration-additional
17-18 <- 19-21 : explanation-argumentative
22-25 <- 26 : consequence-s
17 <- 18 : attribution
19-20 <- 21 : attribution
19 <- 20 : elaboration-object-attribute-embedded
22 <- 23 : attribution-embedded
24 <- 25 : purpose

Precision and recall would be computed in the usual way, where successful identification of a relation requires the correct spans, the correct direction of the arrow, and the correct label. The list doesn't include 22-23 <- 24-25 : same-unit, but the parser does need to get this right to form the 22-25 span, so it's taken into account implicitly, which I think is the right way.
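
As a rough sketch of this relation-based metric, each arc can be stored as a (target span, source span, label) triple and scored by set intersection. The names `Relation`, `target`, `source`, and `relation_scores` are hypothetical, as is the predicted parse; the gold list is the nine relations above, with `A <- B` read as an arrow from `B` to `A`.

```python
from typing import NamedTuple, Tuple


class Relation(NamedTuple):
    target: Tuple[int, int]  # span the arrow points to (left of "<-")
    source: Tuple[int, int]  # span the arrow starts from (right of "<-")
    label: str


def relation_scores(gold, pred, labeled=True):
    """Return (precision, recall, F1) over sets of relations.

    A match needs both spans and the direction; the label too if labeled=True.
    """
    def key(rel):
        return rel if labeled else (rel.target, rel.source)

    gold_set = {key(r) for r in gold}
    pred_set = {key(r) for r in pred}
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# The nine gold relations listed above (same-unit is deliberately left out).
gold = [
    Relation((16, 16), (17, 26), "example"),
    Relation((17, 21), (22, 26), "elaboration-additional"),
    Relation((17, 18), (19, 21), "explanation-argumentative"),
    Relation((22, 25), (26, 26), "consequence-s"),
    Relation((17, 17), (18, 18), "attribution"),
    Relation((19, 20), (21, 21), "attribution"),
    Relation((19, 19), (20, 20), "elaboration-object-attribute-embedded"),
    Relation((22, 22), (23, 23), "attribution-embedded"),
    Relation((24, 24), (25, 25), "purpose"),
]

# Hypothetical parser output: one wrong label and one reversed arrow.
pred = list(gold)
pred[8] = Relation((24, 24), (25, 25), "elaboration-additional")
pred[4] = Relation((18, 18), (17, 17), "attribution")

labeled_p, labeled_r, labeled_f = relation_scores(gold, pred, labeled=True)
unlabeled_p, unlabeled_r, unlabeled_f = relation_scores(gold, pred, labeled=False)
```

Here the mislabeled arc still counts in the unlabeled scores, while the reversed arc counts in neither, which matches the requirement that direction be correct. Multinuclear relations have no single arrow direction and would need their own convention; this sketch does not handle them.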
