MatrixTDBProcedures

anonymous edited this page Sep 17, 2009 · 13 revisions

MatrixTDB is the regression test facility for the Grammar Matrix and the Matrix customization system. It allows us to create gold standard tsdb++ profiles on demand for language types defined in choices files.

High Level Overview

There are three main things you might want to do with MatrixTDB: add new strings, add a language type, or extract a profile for a language type. The first two put data in; the third gets data out. Each of these high-level tasks breaks down into smaller sub-tasks. The breakdown into sub-tasks is displayed here, while the Detailed Processes section of this page breaks each sub-task down into smaller steps.

Adding New Strings Breakdown

  • Create a source profile
  • Import the source profile
  • Add permutes
  • Run specific filters

Adding A Language Type Breakdown

  • Add the language type (this task has only one sub-task)

Extracting a Profile Breakdown

  • Import the language type
  • Generate a profile

Detailed Processes

This section describes step-by-step instructions on how to perform various tasks and sub-tasks with MatrixTDB.

Definitions

Old Stuff

Matrix developers should update MatrixTDB as follows:

  • Determine the harvester strings you'll need to illustrate your library.

  • Determine the semantically neutral variants your library allows for each harvester string, at the level of bags of words. For example, the basic lexical library allows for case-marking adpositions. So p-nom can be added to any string with an overt subject to get a semantically equivalent string, provided p-nom is in the right place.

  • Update customize/sql_profiles/stringmods.py to reflect the modifications.

  • Create a harvester grammar to process your harvester strings with. Save the choices file from that grammar.

  • Create a file listing the harvester strings and their mrs_tags (see customize/sql_profiles/harv_str/harv_mrs_1 for an example).

  • Create a file with just the harvester strings.

  • Start the LKB and tsdb++

  • Load the harvester grammar into the LKB.

  • In tsdb++, go to File > Import > Test items to import the harvester strings.

  • Make sure tsdb++ is set to write the mrs field.

  • Process the items you imported with your grammar.

  • The resulting profile will be your source_profile.

  • Next, run customize/sql_profiles/import_from_itsdb.py with your source_profile, choices file and harv_mrs file as arguments.

  • Get the resulting osp_id

  • Then run customize/sql_profiles/add_permutes.py and give it the osp_id

  • Update the universal and specific filters in u_filters.py and s_filters.py

  • Run run_u_filters.py

  • Run the SQL query that separates the universally ungrammatical from universally grammatical results.

  • Run run_specific_filters.py
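The scripted steps in the list above run in a fixed order, which can be summarized as a small Python sketch. This is only a checklist helper, not part of MatrixTDB: the function name `matrixtdb_update_steps` is hypothetical, and passing every argument positionally with no flags is an assumption, so check each script's own usage before running anything. The script names and paths are copied as written above.

```python
def matrixtdb_update_steps(source_profile, choices_file, harv_mrs_file, osp_id):
    """Return the scripted update steps, in order, as (note, argv) pairs.

    ASSUMPTION: all arguments are positional and no flags are needed.
    The osp_id is only known after the import step has run, so in practice
    the later commands are built once that step has finished.
    """
    return [
        ("import the source profile",
         ["python", "customize/sql_profiles/import_from_itsdb.py",
          source_profile, choices_file, harv_mrs_file]),
        ("add permutations for the imported profile",
         ["python", "customize/sql_profiles/add_permutes.py", str(osp_id)]),
        ("run the universal filters",
         ["python", "run_u_filters.py"]),
        # The SQL query that separates universally ungrammatical from
        # universally grammatical results is run by hand between these steps.
        ("run the specific filters",
         ["python", "run_specific_filters.py"]),
    ]


# Print the commands for review instead of executing them.
# All four argument values here are placeholders.
for note, argv in matrixtdb_update_steps("my_profile", "my_choices", "my_harv_mrs", 1):
    print(note + ": " + " ".join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run outside a MatrixTDB checkout.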

At this point, MatrixTDB is up to date. We can also use import_from_tsdb.py to update the mrss we want to have corresponding to particular mrs_tags.
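The bag-of-words idea behind the string modifications, permutations and filters in the steps above can be illustrated with a toy Python sketch. It is not the code in stringmods.py, add_permutes.py or the filter scripts. The tokens `n1` and `iv` are hypothetical stand-ins for a subject noun and an intransitive verb, and treating "the right place" for p-nom as "adjacent to the subject noun" is an assumption made only for this example.

```python
from itertools import permutations


def add_case_adposition(words, adposition="p-nom"):
    """Return a new bag of words with the case-marking adposition added."""
    return list(words) + [adposition]


def permute(words):
    """Return every distinct ordering of a bag of words, as sorted strings."""
    return sorted({" ".join(p) for p in permutations(words)})


def adposition_adjacent_to(string, adposition, noun):
    """Toy filter: true if the adposition is directly next to the noun."""
    tokens = string.split()
    return abs(tokens.index(adposition) - tokens.index(noun)) == 1


# Hypothetical harvester string: overt subject plus intransitive verb.
harvester = ["n1", "iv"]

# Semantically neutral variant at the bag-of-words level.
variant = add_case_adposition(harvester)

# All word orders of the variant, then only those that pass the toy filter.
candidates = permute(variant)
kept = [s for s in candidates if adposition_adjacent_to(s, "p-nom", "n1")]

for s in kept:
    print(s)
```

Here the filter stands in for the kind of check that decides whether p-nom is in an acceptable position; which positions are actually acceptable depends on the language type.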

To export a profile corresponding to a given choices file:

  • _FIX_ME_ instructions here.

TODO:

  • Work out how to run filters recursively for coordination et al
  • Update filters for coordination
  • Update filters for inflection version of question particles