ErgSemantics
The purpose of these pages is to document the semantic representations produced by the English Resource Grammar (ERG; Flickinger 2000, 2011). The ERG is a broad-coverage, linguistically motivated precision grammar for English, associating richly detailed semantic representations with input sentences. These representations, dubbed English Resource Semantics or ERS, are in the formalism of Minimal Recursion Semantics (MRS; Copestake et al 2005). They include not only semantic roles, but also information about the scope of quantifiers and scopal operators including negation, as well as semantic representations of linguistically complex phenomena such as time and date expressions, conditionals, comparatives, and many others. ERS can be expressed in various ways, including a logic-based syntax using predicates and arguments, dependency graphs, and dependency triples. In addition, the representations can be obtained either from existing manually produced annotations, namely the Redwoods Treebank (texts from a variety of genres; Oepen et al 2004) and DeepBank (Wall Street Journal corpus; Flickinger et al 2012), or by processing new text with the ERG and its associated parsing and parse selection algorithms.
Combining high parsing accuracy with rich semantic representations, English Resource Semantics is a valuable source of information for many semantically-sensitive NLP tasks. ERS-based systems have achieved state-of-the-art results in various tasks, including the identification of speculative or negated event mentions in biomedical text (MacKinlay et al 2011), question generation (Yao et al 2012), detecting the scope of negation (Packard et al 2014), relating natural language to robot control language (Packard 2014), and recognizing textual entailment (PETE task; Lien & Kouylekov 2015). ERS representations have also been beneficial in semantic transfer-based MT (Oepen et al 2007, Bond et al 2011), ontology acquisition (Herbelot 2006), extraction of glossary sentences (Reiplinger et al 2012), sentiment analysis (Kramer & Gordon 2014), and the ACL Anthology Searchbench (Schäfer et al 2011).
The ERG Semantic Documentation (ESD) initiative is an ongoing effort to provide ‘end-user’ documentation on the meaning representations that provide the interface to parsing and generation using the ERG. While ERG meaning representations abstract to a large degree from semantically irrelevant surface variation, it can at times be challenging to interpret (and appreciate) the nuances of particular semantic analyses. The ESD pages seek to provide an ever-growing ‘encyclopedia’ of semantic analyses available from the ERG.
Additional background information is provided by Flickinger et al. (2014). These pages are jointly maintained by Emily M. Bender, Dan Flickinger, and Stephan Oepen, with input and feedback from, among others, Francis Bond, Ann Copestake, and Alex Lascarides.
The ESD pages are organized as a hyper-linked collection of smaller documents, each typically discussing a specific semantic phenomenon or particular set of higher-level considerations. ERG meaning representations take the form of underspecified logical forms, adopting the framework of Minimal Recursion Semantics (MRS; Copestake et al. 2005). An informal introduction to MRS, its use by the ERG, and basic terminology is provided by the ErgSemantics/Basics page. Beyond these more technical foundations, the ErgSemantics/Design pages discuss broader linguistic design decisions, for example assumptions regarding quantification, or the notion of eventualities assumed. For first-time consumers of ERG meaning representations, these pages aim to establish the ‘scaffolding’ for the core of the ESD pages, viz. a collection of pages that present individual semantic phenomena. The ‘table of contents’ for this collection is available through the ErgSemantics/Inventory page.
In capturing semantic phenomena on most ESD pages (and hopefully also in future work on automated regression testing) we invoke a notion of semantic fingerprints, i.e. characteristics of a specific MRS configuration that identify a token phenomenon. We utilize a compact template language for MRS fingerprints (similar in form to the MRS LaTeX style; called ERS fingerprints when specialized for the semantic analyses of the English Resource Grammar) that makes the specification of labels and (characterization) links optional, and further allows wild-carding of predicate symbols and role labels (using ‘_’, i.e. just an underscore). For example, the following is the semantic fingerprint of plain N–N compounding (as in garden dog):
h:compound[ARG1 x1, ARG2 x2]
h:[ARG0 x1]
[ARG0 x2]
In other words, the phenomenon is characterized by the appearance of the two-place compound relation, linking together another two EPs in the configuration indicated by the shared label h (of the compound head and the two-place modifier relation) and the shared referential indices x1 and x2. We do not include the covert quantifier (udef_q) required when the modifier is a non-quantified nominal, or the =q handle constraint holding between that udef_q and the EP introducing x2 (corresponding to garden in our example), because this part of the semantic analysis of the compound construction follows from the analyses of separate phenomena (though ones that are typically co-present with this type of compounding), i.e. general ERG assumptions about the representations of common nouns and quantifiers.
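To make the matching idea concrete, the following is a minimal sketch of how the compound fingerprint above could be checked against a list of EPs. It is not the implementation behind the ESD search interface: the `EP` class, the `unify` and `match` functions, and the hand-written EP list for "The garden dog barked" (including its labels, variable names, and predicate symbols) are all assumptions made for illustration. A pattern EP with no label or no predicate leaves that position unconstrained, mirroring the optional labels and predicate symbols in the template language.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EP:
    """An elementary predication, or a pattern over one (None = unconstrained)."""
    label: Optional[str] = None
    pred: Optional[str] = None
    args: dict = field(default_factory=dict)


def unify(pat, ep, bindings):
    """Return extended bindings if pattern EP `pat` fits `ep`, else None."""
    if pat.pred is not None and pat.pred != ep.pred:
        return None
    pairs = [(pat.label, ep.label)] if pat.label is not None else []
    for role, var in pat.args.items():
        if role not in ep.args:
            return None
        pairs.append((var, ep.args[role]))
    new = dict(bindings)  # copy, so a failed attempt leaves `bindings` intact
    for pattern_var, value in pairs:
        if new.setdefault(pattern_var, value) != value:
            return None  # same pattern variable would need two values
    return new


def match(pattern, eps, bindings=None, used=()):
    """Yield one bindings dict per way of mapping pattern EPs to distinct EPs."""
    bindings = bindings or {}
    if not pattern:
        yield dict(bindings)
        return
    first, rest = pattern[0], pattern[1:]
    for i, ep in enumerate(eps):
        if i in used:
            continue
        new = unify(first, ep, bindings)
        if new is not None:
            yield from match(rest, eps, new, used + (i,))


# The N-N compound fingerprint from the text.
FINGERPRINT = [
    EP(label="h", pred="compound", args={"ARG1": "x1", "ARG2": "x2"}),
    EP(label="h", args={"ARG0": "x1"}),
    EP(args={"ARG0": "x2"}),
]

# Hand-written toy EP list for "The garden dog barked" (illustrative only).
EPS = [
    EP("h4", "_the_q", {"ARG0": "x3", "RSTR": "h5", "BODY": "h6"}),
    EP("h7", "compound", {"ARG0": "e8", "ARG1": "x3", "ARG2": "x9"}),
    EP("h10", "udef_q", {"ARG0": "x9", "RSTR": "h11", "BODY": "h12"}),
    EP("h13", "_garden_n_1", {"ARG0": "x9"}),
    EP("h7", "_dog_n_1", {"ARG0": "x3"}),
    EP("h1", "_bark_v_1", {"ARG0": "e2", "ARG1": "x3"}),
]

matches = list(match(FINGERPRINT, EPS))
for found in matches:
    print(found)
```

In this toy input the third pattern EP, `[ARG0 x2]`, is satisfied both by the noun EP for garden and by the covert quantifier, since both carry x9 as their ARG0; the sketch therefore reports the same variable bindings more than once. Constraining the predicate of the third pattern EP would be one way to narrow this down.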
There is a search interface for ‘fingerprinting’ collections of ERG analyses, i.e. for using patterns provided on ESD pages, or variants of one's own, to retrieve instances of semantic phenomena from the comprehensive collections of manually annotated ‘gold-standard’ treebanks that accompany the ERG. In particular, these treebanks include a fresh re-annotation, dubbed DeepBank, of the venerable WSJ Corpus annotated in the Penn Treebank.
One of our goals in documenting the ERG semantics is to make explicit remaining differences in our degrees of commitment to (or confidence in) individual analyses. In many cases, current semantic analyses reflect a careful design process (possibly building on supporting background literature or revisions of earlier attempts); in other cases, there may be known minor deficiencies; and for yet another, hopefully small, group of phenomena, current analyses may be mere placeholders (‘tying things together’ somehow, without a deep commitment to the specifics of the analysis).
We developed a discovery procedure which starts from grammar entities (phrase structure rules, lexical rules, and lexical types) in the current version of the ERG to enable a data-driven exploration of semantic phenomena which have received treatments in the ERG to date. The discovery procedure starts by identifying grammar entities which are likely to contribute to the composition of semantic representations that go beyond the basics. The details of what was considered ‘beyond the basics’ in the discovery of semantic phenomena are summarized on the ErgSemantics/Discovery page, together with some reflections on the effectiveness of the current procedure.
We organize this documentation in terms of what we consider semantic phenomena; the emerging inventory of phenomena is available as the ErgSemantics/Inventory, ordered lexicographically.
One aspect of the documentation produced in this work is a test suite illustrating each identified phenomenon with one or more short, simple sentences, attempting to balance restricted vocabulary size with the clarity of the intended reading of each example. This test suite can be viewed as an extension of the MRS Test Suite.