WeSearch_LexicalFiltering
Working with a lattice of lexical hypotheses and an (über)tagger, we seek to develop a filtering function that discards unlikely hypotheses. The formalisation of the lexical filtering process may be found [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/formalisation.pdf here].
One such filter function maps PTB tags output from the TNT tagger onto LE Types. Mappings may be derived intuitively from inspection of a [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/tnt.le.confusion.pdf confusion matrix] detailing the choices of TNT with respect to LE types.
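A tag-to-type filter of this kind can be sketched as follows. This is a minimal illustration, not the project's implementation: the `PTB_TO_LE` table and the function name `filter_hypotheses` are hypothetical, and the mapping entries are placeholder guesses rather than values read off the confusion matrix.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mapping from PTB tags to the coarse LE types they license.
# The entries are illustrative assumptions, not derived from the confusion matrix.
PTB_TO_LE: Dict[str, Set[str]] = {
    "NN": {"n"},
    "NNS": {"n"},
    "VB": {"v"},
    "VBD": {"v"},
    "IN": {"p", "c"},
    "DT": {"d"},
    "JJ": {"aj"},
    "RB": {"av"},
}


def filter_hypotheses(tag: str, le_types: List[str]) -> List[str]:
    """Keep only the lexical hypotheses whose LE type is licensed by the tag.

    A tag with no entry in the mapping filters nothing, so unhandled tags
    cannot cost coverage.
    """
    allowed: Optional[Set[str]] = PTB_TO_LE.get(tag)
    if allowed is None:
        return list(le_types)
    return [le for le in le_types if le in allowed]


# Hypotheses for one token in the lattice, tagged NN by the tagger.
print(filter_hypotheses("NN", ["n", "v", "aj"]))
```

Letting unknown tags pass everything through is a conservative design choice: it trades some efficiency for not discarding hypotheses on the basis of tags the mapping does not cover.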
An alternative approach is to find mappings programmatically, based on the preferred outcomes of lexical filtering (i.e. gains in parser efficiency versus losses in parser accuracy and coverage). These outcomes may be approximated by examining the relations between TNT precision, TNT recall and the lexical ambiguity of LE types.
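One way such a programmatic derivation might look is sketched below, assuming the confusion matrix is available as counts keyed by (tag, LE type). The function names, the `min_precision` cut-off and the selection rule are assumptions made for illustration; the selection criterion actually used in the project is not stated here, and lexical ambiguity is left out of this sketch.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Confusion = Dict[Tuple[str, str], int]  # (tag, gold LE type) -> token count


def precision_recall(confusion: Confusion) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return (precision, recall) for each observed tag / LE type pair.

    Precision is the share of tokens carrying the tag that have the LE type;
    recall is the share of tokens with the LE type that carry the tag.
    """
    tag_totals: Dict[str, int] = defaultdict(int)
    type_totals: Dict[str, int] = defaultdict(int)
    for (tag, le), count in confusion.items():
        tag_totals[tag] += count
        type_totals[le] += count
    return {
        (tag, le): (count / tag_totals[tag], count / type_totals[le])
        for (tag, le), count in confusion.items()
        if count > 0
    }


def derive_mapping(confusion: Confusion, min_precision: float) -> Dict[str, Set[str]]:
    """Map each tag to the LE types it predicts with at least min_precision."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for (tag, le), (precision, _recall) in precision_recall(confusion).items():
        if precision >= min_precision:
            mapping[tag].add(le)
    return dict(mapping)


# Toy counts, invented for the example.
toy = {("NN", "n"): 90, ("NN", "aj"): 10, ("VB", "v"): 50}
print(derive_mapping(toy, min_precision=0.2))
```

Lowering `min_precision` admits more LE types per tag, which keeps more hypotheses (better coverage, less efficiency gain); raising it does the opposite.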
Frequency of LE types, cross-validated across subsets of the WeScience corpus:
type | frequency | std. dev. |
n | 3830 | 450 |
v | 1712 | 237 |
p | 1401 | 132 |
d | 1119 | 124 |
aj | 1073 | 126 |
av | 411 | 58 |
c | 381 | 43 |
cm | 129 | 24 |
pp | 61 | 11 |
pt | 15 | 9 |
x | 1 | 1 |
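Figures like those in the table above could be produced along the following lines. This is only a sketch: the fold representation (one list of LE types per corpus subset) and the function name are assumptions, and it uses the sample standard deviation, whereas the page does not say which estimator was used.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Dict, List, Tuple


def type_frequency_stats(folds: List[List[str]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean count, sample std. dev.) of each LE type across folds."""
    per_fold = [Counter(fold) for fold in folds]
    all_types = sorted({le for counts in per_fold for le in counts})
    stats = {}
    for le in all_types:
        counts = [c[le] for c in per_fold]  # a Counter yields 0 for absent types
        spread = stdev(counts) if len(counts) > 1 else 0.0
        stats[le] = (mean(counts), spread)
    return stats


# Two tiny invented folds, for illustration only.
print(type_frequency_stats([["n", "n", "v"], ["n", "v", "v"]]))
```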
ROC plots of the TNT performance on the most frequent LE types
[http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/roc.png]
Plots of the true-positive, false-negative and filter rates for each handled LE type, where the filter rate is calculated according to LE type (l) and probability threshold (t):
[http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/filter-rate.png]
N [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/n.png]
V [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/v.png]
P [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/p.png]
D [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/d.png]
AJ [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/aj.png]
AV [http://dl.dropbox.com/u/680530/WeSearch/Lexical%20Filtering/av.png]
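The three rates plotted above can be computed per LE type and threshold roughly as follows. The exact definition of the filter rate is given only in the linked image, so this sketch assumes one plausible reading: a hypothesis of LE type l is discarded when the tagger's probability for it falls below t, and the filter rate is the fraction of type-l hypotheses discarded. The function name and input format are likewise assumptions.

```python
from typing import List, Tuple


def rates(hypotheses: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float, float]:
    """Return (true-positive rate, false-negative rate, filter rate).

    hypotheses holds one (tagger probability, is_correct) pair per lexical
    hypothesis of a single LE type. Assumed rule: keep a hypothesis when its
    probability is at least the threshold, discard it otherwise.
    """
    if not hypotheses:
        return 0.0, 0.0, 0.0
    kept = [(p, ok) for p, ok in hypotheses if p >= threshold]
    n_correct = sum(1 for _, ok in hypotheses if ok)
    kept_correct = sum(1 for _, ok in kept if ok)
    tpr = kept_correct / n_correct if n_correct else 0.0
    fnr = (n_correct - kept_correct) / n_correct if n_correct else 0.0
    filter_rate = (len(hypotheses) - len(kept)) / len(hypotheses)
    return tpr, fnr, filter_rate


# Invented probabilities for four hypotheses of one LE type.
sample = [(0.9, True), (0.6, True), (0.4, False), (0.2, True)]
for t in (0.1, 0.5, 0.95):
    print(t, rates(sample, t))
```

Sweeping the threshold as in the loop yields one curve per rate, which is the shape of data the per-type plots display.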
- Rebecca Dridan (2009), [http://www.dridan.com/research/papers/dridan-phdthesis.pdf Using Lexical Statistics to Improve HPSG Parsing], PhD Thesis, Saarland University