Skip to content

MatrixDevBlueprints

MichaelGoodman edited this page May 16, 2011 · 4 revisions

This page is a place to document ideas and proposals for improvements/changes to the Grammar Matrix codebase before they are implemented. If they get implemented, the blueprints should be removed or deprecated somewhere.

Choices File Data Structures

Currently we have a system where choices are filled into a data structure of nested dictionaries and lists. For example, with the following choices:

noun1_name=common-noun
noun1_stem1_orth=dog
noun1_stem3_orth=cat

the following structure would be created:

   1 {'noun': [{'name': 'common-noun',
   2            'stem': [{'orth': 'dog'},
   3                     None,
   4                     {'orth': 'cat'}]}]}

These data structures are very close to the JSON format, and a goal might be to convert the Questionnaire to use JSON instead of its own format.

Interaction with these data structures is managed so users can refer to objects by their choice key (e.g. "noun1_stem" is the list of stems for noun1), or as Python objects. For instance:

   1 >>> choices['noun1_name']
   2 common-noun
   3 >>> choices['noun1']
   4 {'name': 'common-noun',
   5          'stem': [{'orth': 'dog'},
   6                   None,
   7                   {'orth': 'cat'}]}
   8 >>> for noun in choices['noun']:
   9 ...   stem1 = noun['stem1_orth']
  10 ...   for stem in noun['stem']:
  11 ...     if stem1 == stem['orth']:
  12 ...       print stem1, '==', stem['orth']
  13 ...     else:
  14 ...       print stem1, '!=', stem['orth']
  15 dog == dog
  16 dog != cat

Note how sub structures can still refer to nested objects by choices key (relative to the current object), or by iterating through them, etc. Also note that empty items (None) are skipped.

The problem with this approach is that it is inefficient to get substructures, because each time the key must be split into its components (noun, 1, stem, 1, orth), and tests are run to look for empty list items, etc. This proposal is for an alternative backend data structure that allows the same kind of interaction.

Proposal 1: Simulated substructures

One possibility is to use a single dictionary that holds all full keys (similar to the original choices file, and in some ways similar to the original implementation, but without the headache), but use objects that simulate substructures for the complex interactions. Some (incomplete) code might clear things up:

   1 class ChoiceStruct(dict):
   2   def __init__(self, primary_key, choices):
   3     self.primary_key = primary_key
   4     self.choices = choices
   5 
   6   def __getitem__(self, key):
   7     try:
   8       return self.choices[self.primary_key + key]
   9     except KeyError:
  10       return ChoiceStruct(key, self.choices)

If a user gives a full key, like 'noun1_name', it will return the value from the choices dictionary. If it gives a partial key, such as 'noun1', it will return a new ChoiceStruct with the primary_key set to 'noun1', so subsequent retrievals from the new structure would be relative to that primary_key. There are a few concerns:

  • How to allow iteration over the numbered items (noun1_stem1, noun1_stem3, etc)
  • How to calculate length of lists of numbered items
  • How to deal with bogus keys (e.g. 'noun1_this_is_not_a_real_key')
  • How to deal with 'incomplete partial keys', such as 'nou'
  • How to ensure setting values affects the original dictionary

Proposal 2: Just use JSON module, reduce functionality

Since choices files can be represented in the JSON format, we could significantly reduce the code we have to maintain by using Python's json module to decode them. This would result in structures similar to what we have now, but would not allow complex key retrievals. Also, lists would use 0-indexing. For instance:

   1 >>> import json
   2 >>> choices = json.load(open('test/choices.json'))
   3 >>> print choices['noun'][0]['stem'][0]['orth']
   4 dog

Proposal 3: A mix of Proposals 1 & 2

I don't see any reason why we couldn't do both Proposals 1 & 2. Use the json module to load JSON-formatted choices files, then have some kind of wrapper that simulates the complex interactions. The implementer would still need to be mindful of performance, though.

Clone this wiki locally