ErgSemantics

EmilyBender edited this page May 9, 2016 · 61 revisions

Introduction

The purpose of these pages is to document the semantic representations produced by the English Resource Grammar (ERG; Flickinger 2000, 2011). The ERG is a broad-coverage, linguistically motivated precision grammar for English, associating richly detailed semantic representations with input sentences. These representations, dubbed English Resource Semantics or ERS, are in the formalism of Minimal Recursion Semantics (MRS; Copestake et al 2005). They include not only semantic roles, but also information about the scope of quantifiers and scopal operators including negation, as well as semantic representations of linguistically complex phenomena such as time and date expressions, conditionals, comparatives, and many others. ERS can be expressed in various ways, including a logic-based syntax using predicates and arguments, dependency graphs, and dependency triples. In addition, the representations can be obtained either from existing large collections (> 1.25 million tokens) of manually curated annotations over texts from a wide variety of genres, namely the Redwoods Treebank (Oepen et al 2004) and DeepBank (over the Wall Street Journal corpus; Flickinger et al 2012), or by processing new text with the ERG and its associated parsing and parse selection algorithms.

As an example, below is the ‘Simple MRS’ view of the ERS for the sentence The garden dog tried not to bark.

FIXME: Some nicely formatted version of that here.

Because the ERG combines high parsing accuracy with rich semantic representations, English Resource Semantics is a valuable source of information for many semantically-sensitive NLP tasks. ERS-based systems have achieved state-of-the-art results in various tasks, including the identification of speculative or negated event mentions in biomedical text (MacKinlay et al 2011), question generation (Yao et al 2012), detecting the scope of negation (Packard et al 2014), relating natural language to robot control language (Packard 2014), and recognizing textual entailment (PETE task; Lien & Kouylekov 2015). ERS representations have also been beneficial in semantic transfer-based MT (Oepen et al 2007, Bond et al 2011), ontology acquisition (Herbelot & Copestake 2006), extraction of glossary sentences (Reiplinger et al 2012), sentiment analysis (Kramer & Gordon 2014), and the ACL Anthology Searchbench (Schäfer et al 2011).

The ERG Semantic Documentation (ESD) initiative is an ongoing effort to provide ‘end-user’ documentation on the meaning representations that provide the interface to parsing and generation using the ERG. While ERG meaning representations abstract to a large degree from semantically irrelevant surface variation, it can at times be challenging to interpret (and appreciate) the nuances of particular semantic analyses. The ESD pages seek to provide an ever-growing ‘encyclopedia’ of semantic analyses available from the ERG.

Additional background information is provided by Flickinger et al. (2014). These pages are jointly maintained by Emily M. Bender, Dan Flickinger, and Stephan Oepen, with input and feedback from, among others, Francis Bond, Ann Copestake, and Alex Lascarides.

Structure of the Documentation

The ESD pages are organized as a hyper-linked collection of smaller documents, each typically discussing a specific semantic phenomenon or particular set of higher-level considerations. ERG meaning representations take the form of underspecified logical forms, adopting the framework of Minimal Recursion Semantics (MRS; Copestake et al. 2005). An informal introduction to MRS, its use by the ERG, and basic terminology is provided by the ErgSemantics/Basics page. Beyond these more technical foundations, the ErgSemantics/Design pages discuss broader linguistic design decisions, for example assumptions regarding quantification, or the notion of eventualities assumed. For first-time consumers of ERG meaning representations, these pages aim to establish the ‘scaffolding’ for the core of the ESD pages, viz. a collection of pages that present individual semantic phenomena. The ‘table of contents’ for this collection is available through the ErgSemantics/Inventory page.

Semantic Fingerprints

In capturing semantic phenomena on most ESD pages (and hopefully also in future work on automated regression testing) we invoke a notion of semantic fingerprints, i.e. characteristics of a specific MRS configuration that identify a token phenomenon. We utilize a compact template language for MRS fingerprints (similar in form to the MRS LaTeX style; called ERS fingerprints when specialized for the semantic analyses of the English Resource Grammar) that makes the specification of labels and (characterization) links optional, and further allows wild-carding of predicate symbols and role labels (using ‘_’, i.e. just an underscore). For example, the following is the semantic fingerprint for plain N–N compounding (as in garden dog):

  h:compound[ARG1 x1, ARG2 x2]
  h:[ARG0 x1]
  [ARG0 x2]

In other words, the phenomenon is characterized by the appearance of the two-place compound relation, linking together another two EPs in the configuration indicated by the shared label h (of the compound head and the two-place modifier relation) and the shared referential indices x1 and x2. This fingerprint will match the example given above, as well as all other examples in the collection being searched that are analyzed as involving N–N compounding.
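To make the matching idea concrete, here is a minimal Python sketch of how a fingerprint of this shape could be checked against a set of elementary predications (EPs). It is an illustration under stated assumptions, not the implementation behind the ESD search interface: the class names `EP` and `PatternEP`, the function `match`, and the toy EPs for garden dog are all hypothetical and hand-written for this example (they are not actual ERG output). The sketch supports optional labels and wild-carded predicate symbols only; wild-carded role labels and variable type checking are left out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

WILDCARD = "_"  # matches any predicate symbol


@dataclass
class EP:
    """An elementary predication: label, predicate symbol, role -> variable."""
    label: str
    predicate: str
    args: Dict[str, str]


@dataclass
class PatternEP:
    """One line of a fingerprint; label and predicate may be left open."""
    args: Dict[str, str]
    label: Optional[str] = None   # None: the label is not constrained
    predicate: str = WILDCARD


def _bind(bindings: Dict[str, str], name: str, value: str) -> Optional[Dict[str, str]]:
    """Extend bindings with name -> value; return None on a clash."""
    if bindings.get(name, value) != value:
        return None
    extended = dict(bindings)
    extended[name] = value
    return extended


def _match_ep(pattern: PatternEP, ep: EP, bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Try to match one pattern line against one EP under existing bindings."""
    if pattern.predicate != WILDCARD and pattern.predicate != ep.predicate:
        return None
    current: Optional[Dict[str, str]] = bindings
    if pattern.label is not None:
        current = _bind(current, pattern.label, ep.label)
    for role, var in pattern.args.items():
        if current is None or role not in ep.args:
            return None
        current = _bind(current, var, ep.args[role])
    return current


def match(fingerprint: List[PatternEP], eps: List[EP],
          bindings: Optional[Dict[str, str]] = None,
          used: FrozenSet[int] = frozenset()) -> Optional[Dict[str, str]]:
    """Backtracking search: map each pattern line to a distinct EP.

    Returns the first consistent set of bindings, or None if there is none.
    """
    bindings = {} if bindings is None else bindings
    if not fingerprint:
        return bindings
    head, rest = fingerprint[0], fingerprint[1:]
    for i, ep in enumerate(eps):
        if i in used:
            continue
        extended = _match_ep(head, ep, bindings)
        if extended is not None:
            result = match(rest, eps, extended, used | {i})
            if result is not None:
                return result
    return None


# The N-N compounding fingerprint from the text.
COMPOUND = [
    PatternEP(label="h", predicate="compound", args={"ARG1": "x1", "ARG2": "x2"}),
    PatternEP(label="h", args={"ARG0": "x1"}),
    PatternEP(args={"ARG0": "x2"}),
]

# Hand-written toy EPs for "garden dog" (hypothetical, not real ERG output).
TOY_EPS = [
    EP("h7", "compound", {"ARG0": "e8", "ARG1": "x3", "ARG2": "x9"}),
    EP("h10", "udef_q", {"ARG0": "x9"}),
    EP("h11", "_garden_n_1", {"ARG0": "x9"}),
    EP("h7", "_dog_n_1", {"ARG0": "x3"}),
]

result = match(COMPOUND, TOY_EPS)
print(result)
```

In this sketch the pattern names (h, x1, x2) act as variables that must be bound consistently across all lines of the fingerprint, which is what enforces the shared label and the shared referential indices described above. Requiring each pattern line to map to a distinct EP is a design choice of the sketch, not something the source specifies.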

There is a search interface for ‘fingerprinting’ collections of ERG analyses, i.e. for using patterns provided on ESD pages, or one's own variants, to retrieve instances of semantic phenomena from the comprehensive collections of manually annotated ‘gold-standard’ treebanks that accompany the ERG. In particular, these treebanks include a fresh re-annotation, dubbed DeepBank, of the venerable WSJ Corpus annotated in the Penn Treebank.

Fundamentals

One of our goals in documenting the ERG semantics is to make explicit remaining differences in our degrees of commitment to (or confidence in) individual analyses. In many cases, current semantic analyses reflect a careful design process (possibly building on supporting background literature or revisions of earlier attempts); in other cases, there may be known minor deficiencies; and for yet another, hopefully small group of phenomena, current analyses may be mere placeholders (‘tying things together’ somehow, without a deep commitment to the specifics of the analysis).

Discovery Procedure

We developed a discovery procedure which starts from grammar entities (phrase structure rules, lexical rules, and lexical types) in the current version of the ERG to enable a data-driven exploration of semantic phenomena which have received treatments in the ERG to date. The discovery procedure starts by identifying grammar entities which are likely to contribute to the composition of semantic representations that go beyond the basics. The details of what was considered ‘beyond the basics’ in the discovery of semantic phenomena are summarized on the ErgSemantics/Discovery page, together with some reflections on the effectiveness of the current procedure.

We organize this documentation in terms of what we consider semantic phenomena; the emerging inventory of phenomena is available as the ErgSemantics/Inventory, ordered lexicographically.

ESD Test Suite

One aspect of the documentation produced in this work is a test suite illustrating each identified phenomenon with one or more short, simple sentences, attempting to balance restricted vocabulary size with the clarity of the intended reading of each example. This test suite can be viewed as an extension of the MRS Test Suite.

How to Cite this Work

References

Bond, F., Oepen, S., Nichols, E., Flickinger, D., Velldal, E., & Haugereid, P. (2011). Deep open-source machine translation. Machine Translation, 25, 87-105. doi: 10.1007/s10590-011-9099-4

Copestake, A., Flickinger, D., Pollard, C., & Sag, I. A. (2005). Minimal Recursion Semantics. An introduction. Research on Language and Computation, 3(4), 281-332.

Flickinger, D. (2000). On building a more efficient grammar by exploiting types. Natural Language Engineering, 6(1), 15-28.

Flickinger, D. (2011). Accuracy vs. robustness in grammar engineering. In E. M. Bender & J. E. Arnold (Eds.), Language from a cognitive perspective: Grammar, usage, and processing (pp. 31-50). Stanford: CSLI Publications.

Flickinger, D., Bender, E. M., & Oepen, S. (2014). Towards an encyclopedia of compositional semantics: Documenting the interface of the English Resource Grammar. In N. Calzolari et al. (Eds.), Proceedings of the ninth international conference on language resources and evaluation (LREC'14) (pp. 875-881). Reykjavik, Iceland: European Language Resources Association (ELRA).

Flickinger, D., Zhang, Y., & Kordoni, V. (2012). DeepBank. A dynamically annotated treebank of the Wall Street Journal. In (p. 85-96). Lisbon, Portugal: Edições Colibri.

Herbelot, A., & Copestake, A. (2006). Acquiring Ontological Relationships from Wikipedia Using RMRS. In Proceedings of the ISWC 2006 workshop on web content.

Kramer, J., & Gordon, C. (2014). Improvement of a Naive Bayes sentiment classifier using MRS-based features. In Proceedings of the third joint conference on lexical and computational semantics (*SEM 2014) (pp. 22-29). Dublin, Ireland: Association for Computational Linguistics and Dublin City University.

Lien, E., & Kouylekov, M. (2015). Semantic parsing for textual entailment. In Proceedings of the 14th International Conference on Parsing Technologies (pp. 40-49). Bilbao, Spain.

MacKinlay, A., Martinez, D., & Baldwin, T. (2011). A parser-based approach to detecting modification of biomedical events. In Proceedings of the ACM fifth international workshop on data and text mining in biomedical informatics (pp. 51-58). New York, NY, USA: ACM.

Oepen, S., Flickinger, D., Toutanova, K., & Manning, C. D. (2004). LinGO Redwoods. A rich and dynamic treebank for HPSG. Research on Language and Computation, 2(4), 575-596.

Oepen, S., Velldal, E., Lønning, J. T., Meurer, P., Rosén, V., & Flickinger, D. (2007). Towards hybrid quality-oriented machine translation: On linguistics and probabilities in MT. In Proceedings of the 11th conference on theoretical and methodological issues in machine translation (pp. 144-153). Skövde, Sweden.

Packard, W. (2014). UW-MRS: Leveraging a deep grammar for robotic spatial commands. In Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014) (pp. 812-816). Dublin, Ireland: Association for Computational Linguistics and Dublin City University.

Packard, W., Bender, E. M., Read, J., Oepen, S., & Dridan, R. (2014). Simple negation scope resolution through deep parsing: A semantic solution to a semantic problem. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (volume 1: Long papers) (pp. 69-78). Baltimore, Maryland: Association for Computational Linguistics.

Reiplinger, M., Schäfer, U., & Wolska, M. (2012). Extracting glossary sentences from scholarly articles: A comparative evaluation of pattern bootstrapping and deep analysis. In Proceedings of the ACL-2012 special workshop on rediscovering 50 years of discoveries (pp. 55-65). Jeju Island, Korea.

Schäfer, U., Kiefer, B., Spurk, C., Steffen, J., & Wang, R. (2011). The ACL Anthology Searchbench. In Proceedings of the ACL-HLT 2011 system demonstrations (pp. 7-13). Portland, Oregon: Association for Computational Linguistics.

Yao, X., Bouma, G., & Zhang, Y. (2012). Semantics-based question generation and implementation. Dialogue & Discourse, 3(2), 11-42.
