Description
We need some methods/scripts to evaluate parsing performance. We probably want to do two things: a) replicate previous work that uses parseval so that we can easily compare against previously reported results (see Table 3 in http://www.cc.gatech.edu/~jeisenst/papers/ji-acl-2014.pdf), and b) implement a more appropriate metric based on precision/recall of relations between spans, not just precision/recall of (labeled or unlabeled) spans as in parseval. See the discussion from @sagae below.
- The metrics should report unlabeled and labeled performance
- The metrics should use the 18 coarse relations from Carlson et al.'s (2001) "Building a Discourse-tagged Corpus in the Framework of Rhetorical Structure Theory." (see the label-collapsing sketch after this list)
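
A minimal sketch of how fine-grained labels could be collapsed to coarse classes before scoring. The `COARSE_MAP` entries below are illustrative guesses covering only the labels in @sagae's example, not the actual Carlson et al. (2001) class inventory, which should be taken from the annotation manual:

```python
# Illustrative only: collapse fine-grained RST relation labels to coarse classes.
# The real mapping should come from Carlson et al. (2001); entries here are
# assumptions made for the sake of the example.
COARSE_MAP = {
    "elaboration-additional": "Elaboration",
    "elaboration-object-attribute-embedded": "Elaboration",
    "explanation-argumentative": "Explanation",
    "attribution": "Attribution",
    "attribution-embedded": "Attribution",
    "consequence-s": "Cause",        # assumed class; check the manual
    "purpose": "Enablement",         # assumed class; check the manual
    "example": "Elaboration",        # assumed class; check the manual
}

def coarsen(label):
    """Map a fine-grained relation label to its coarse class (fallback: the label itself)."""
    return COARSE_MAP.get(label.lower(), label)
```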
Discussion from @sagae
Looking at Fig. 1 in http://www.isi.edu/~marcu/papers/sigdialbook2002.pdf, there are nine rhetorical relations, represented by the labeled directed arcs (same-unit is just a side effect of the annotation, and not a discourse relation). We really should be looking at precision and recall of the relations represented in these labeled arcs. So we would be looking for:
16 <- 17-26 : example
17-21 <- 22-26 : elaboration-additional
17-18 <- 19-21 : explanation-argumentative
22-25 <- 26 : consequence-s
17 <- 18 : attribution
19-20 <- 21 : attribution
19 <- 20 : elaboration-object-attribute-embedded
22 <- 23 : attribution-embedded
24 <- 25 : purpose
and precision and recall would be computed in the usual way, and successful identification of a relation requires the correct spans, the correct direction of the arrow, and the correct label. The list doesn't include 22-23 <- 24-25 : same-unit, but the parser does need to get this right to form the 22-25 span, so it's taken into account implicitly, which I think is the right way.
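
A minimal sketch of what such a scorer could look like, assuming each tree has already been decomposed into directed, labeled relations of the form `(dependent_span, head_span, label)`, e.g. `((17, 26), (16, 16), "example")` for `16 <- 17-26 : example`. The function name and the tuple encoding are assumptions for illustration, not an existing API in this repo:

```python
def relation_prf(gold_relations, pred_relations, labeled=True):
    """Precision/recall/F1 over sets of (dependent_span, head_span, label) tuples.

    A predicted relation counts as correct only if both spans, the direction
    (which span points at which), and -- when labeled=True -- the relation
    label all match the gold standard.
    """
    if not labeled:
        # Drop the label to score unlabeled span attachment only.
        gold_relations = {(dep, head) for dep, head, _ in gold_relations}
        pred_relations = {(dep, head) for dep, head, _ in pred_relations}
    else:
        gold_relations = set(gold_relations)
        pred_relations = set(pred_relations)

    correct = len(gold_relations & pred_relations)
    precision = correct / len(pred_relations) if pred_relations else 0.0
    recall = correct / len(gold_relations) if gold_relations else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example using a few relations from the figure above (spans are EDU index pairs):
gold = {
    ((17, 26), (16, 16), "example"),
    ((22, 26), (17, 21), "elaboration-additional"),
    ((18, 18), (17, 17), "attribution"),
}
pred = {
    ((17, 26), (16, 16), "example"),
    ((22, 26), (17, 21), "elaboration-general"),  # wrong label, right spans
    ((18, 18), (17, 17), "attribution"),
}
print(relation_prf(gold, pred, labeled=True))   # label mismatch is penalized
print(relation_prf(gold, pred, labeled=False))  # spans and direction only
```

Because same-unit arcs are not scored directly, they only matter insofar as getting them wrong produces the wrong spans elsewhere, which matches the behavior described above.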