GrammarConfigurationRfc

MichaelGoodman edited this page Jul 14, 2020 · 8 revisions

A variety of processors load TDL grammars, such as agree, ACE, the LKB (including LkbFos), PET, and, to a limited degree, PyDelphin. Each needs some way to specify which files to load and how to load them, and most implement their own configuration format. This arrangement causes several problems: duplication of information, subtle differences among implementations, and an increased mental burden on grammar developers. This RFC therefore proposes a unified configuration file format that works with all processors, possibly with processor-specific sections for unique features.

Survey

Part of loading a grammar is determining whether to load some TDL as types or as instances (including lexicons, rules, etc.). TDL has the ability to mark certain blocks of code as defining types or instances, but the LKB is a notable exception in not using this feature. Other things that need to be loaded include morphological preprocessors, parse selection models, and so on. Configuration also sets the values of variables that can affect processing and output.
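
As a rough illustration of the block-marking mechanism described above, a top-level TDL file might separate types from instances as follows. This is a minimal sketch: the included file names are hypothetical placeholders, not taken from any particular grammar.

```tdl
;; Hypothetical top-level file; the included file names are placeholders.

;; Everything read inside this environment is treated as a type definition.
:begin :type.
:include "types".
:end :type.

;; Instance environments can carry a status that says what kind of
;; entity is being defined (lexical entries, rules, and so on).
:begin :instance :status lex-entry.
:include "lexicon".
:end :instance.

:begin :instance :status rule.
:include "rules".
:end :instance.
```

A processor reading such a file can tell from the enclosing environment whether a definition is a type or an instance, so that information does not have to be repeated in a separate configuration.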

LKB

The LKB uses a script (generally at lkb/script) that directly calls its Lisp functions for loading files:

  • load-irregular-spellings

  • read-tdl-type-files-aux

  • read-cached-leaf-types-if-available

  • load-lexdb-from-script

  • read-cached-sublex-if-available

  • read-cached-lex-if-available

  • read-tdl-grammar-file-aux

  • read-tdl-psort-file-aux

  • read-tdl-lex-rule-file-aux

  • read-tdl-parse-node-file-aux

  • read-tdl-start-file-aux

  • tsdb::read-model

  • mt:read-semi

  • mt:read-vpm

  • mt:read-transfer-rules

ACE and PET

ACE and PET use declarative config files (e.g., ace/config.tdl) in a pseudo-TDL syntax (with slight differences between ACE and PET). They rely on TDL's environments to read TDL as types vs. instances.

For example, from the ERG's ACE config:

grammar-top               := "../english.tdl".
variable-property-mapping := "../semi.vpm".
maxent-model              := "../redwoods.mem".
preprocessor              := "../rpp/tokenizer.rpp".
preprocessor-modules      := ../rpp/xml.rpp ../rpp/ascii.rpp ../rpp/lgt.rpp ../rpp/quotes.rpp ../rpp/wiki.rpp ../rpp/gml.rpp ../rpp/html.rpp.
[...]
orth-path                 := ORTH.
semantics-path            := SYNSEM LOCAL CONT.
lex-rels-path             := SYNSEM LOCAL CONT RELS.
[...]
semarg-type               := semarg.
list-type                 := *list*.
[...]
chart-dependencies :=
  "SYNSEM LKEYS --+COMPKEY" "SYNSEM LOCAL CAT HEAD MINORS MIN"
  "SYNSEM LKEYS --+OCOMPKEY" "SYNSEM LOCAL CAT HEAD MINORS MIN"
  "SYNSEM LKEYS --+ARGIND" "SYNSEM LOCAL CONT HOOK INDEX"
.
[...]
;; part of speech tagging
english-pos-tagger := enabled.
[...]
:begin :type.
:include "../mtr".
:end :type.

Proposal

The TDL-like declarative config seems the most portable, as we can't expect non-Lisp-based processors to understand the Lisp calls of the LKB's script files. However, it would be better if the config files were more compliant with TDL syntax. For this, we suggest using a new config environment in TDL:

:begin :config.
...
:end :config.
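
To make the idea more concrete, one could imagine some of the ACE-style settings shown earlier moving inside the new environment. The following is only a hypothetical sketch, not part of the proposal as written: which settings would belong there, and how their values would be brought in line with TDL syntax, are left open.

```tdl
;; Hypothetical sketch only: a few settings copied from the ERG's ACE
;; config excerpt above, wrapped in the proposed :config environment.
:begin :config.
grammar-top               := "../english.tdl".
variable-property-mapping := "../semi.vpm".
orth-path                 := ORTH.
semarg-type               := semarg.
:end :config.
```

Under this reading, a processor that does not recognize a given setting could skip it while still parsing the file with its ordinary TDL reader.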

Questions
