Project presentation GF Summer School 2017

How to boot-strap GF development for non-resourced languages

Alternative title: Boot-strap GF morphology and start directly on concrete syntax

Kristian Kankainen, MTÜ Keeleleek

Project background

Q: How to start with GF when the only thing I have is full inflection tables for 5 nouns?

A: Automatically generate all GF code by showing the tables to the machine!

So my project was to make the converter from inflection tables into GF code.

General work-flow:

Illustration of my general work-flow:

      LMF
  (extensional
   morphology)
       |
   inflection
     table --> pextract --> paradigm
                          description
                               |
                              DFDL
                               |
                              XML ---> GF
                               |
                              LMF
                         (intensional
                          morphology)

Practical implementation of the above work-flow:

  1. Take my inflection tables and extract the paradigms using the paradigm extraction tool pextract
  2. Make a DFDL schema that describes the output of pextract (see dfdl-pextract-schema)
  3. Make a converter that generates the GF code from the pextract output (see pextract2gf)
  4. Make sure the converted code runs
  5. Ask Inari for the rest of the code for a mini resource grammar
  6. Use pextract2gf and copy-paste the generated code into my mini grammar
  7. Have dinner and go home for further work
  8. ... resurrect the Votic language by showing cool GF application grammars to the grandchildren of the last 10 Votic people
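To make step 3 more concrete, here is a minimal Python sketch of turning one paradigm into GF source text. It is not the actual pextract2gf code. The paradigm representation (a list of tag and parts, with integers as variable slots), the function names, the `Form` parameter type, the oper name `mkN1` and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical, simplified paradigm description (not pextract's real format):
# each form is a tag plus a sequence of parts, where an int is a variable
# slot and a str is a constant substring.
Part = Union[int, str]
Form = Tuple[str, List[Part]]

# Toy data for illustration only.
TOY_PARADIGM: List[Form] = [
    ("SgNom", [1, "a"]),
    ("SgGen", [1, "an"]),
    ("PlNom", [1, "at"]),
]


def render_parts(parts: List[Part]) -> str:
    """Render one form as a GF string expression, e.g. x1 + "a"."""
    rendered = [f"x{p}" if isinstance(p, int) else f'"{p}"' for p in parts]
    return " + ".join(rendered)


def paradigm_to_gf(name: str, forms: List[Form]) -> str:
    """Emit a GF-style oper: match the first form, then build a table."""
    lemma_pattern = render_parts(forms[0][1])
    branches = " ;\n".join(
        f"      {tag} => {render_parts(parts)}" for tag, parts in forms
    )
    return (
        f"oper {name} : Str -> {{s : Form => Str}} = \\lemma ->\n"
        f"  case lemma of {{\n"
        f"    {lemma_pattern} => {{s = table {{\n"
        f"{branches}\n"
        f"    }}}}\n"
        f"  }} ;"
    )


if __name__ == "__main__":
    print(paradigm_to_gf("mkN1", TOY_PARADIGM))
```

The sketch always uses the first form in the list as the lemma pattern. Because every form is described with the same variable slots, any other form could be matched instead, which is what makes the approach lemma-agnostic.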

Show automatically generated GF code here

pextract2gf

Good things

  • lemma-agnostic (probably Sg Par is best but I need to check statistics)
  • further lexicon building à la Morfologilabbet
  • new GF users could boot-strap the morphology module and move directly to concrete syntax of a language

Problems

  • GF pattern matching is unexpectedly non-greedy (see Issue #3)
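Why greediness matters for generated patterns can be illustrated with a small analogy. When a pattern such as `x + "a" + y` can split a word in more than one way, a greedy matcher and a non-greedy matcher bind the variables differently. The sketch below uses Python regular expressions only as an analogy. It does not reproduce GF's matcher or the contents of Issue #3, and the helper names and the toy word are assumptions.

```python
import re
from typing import Optional, Tuple


def split_greedy(word: str, const: str) -> Optional[Tuple[str, str]]:
    """Split word as x + const + y with x as long as possible."""
    m = re.fullmatch(f"(.*){re.escape(const)}(.*)", word)
    return (m.group(1), m.group(2)) if m else None


def split_non_greedy(word: str, const: str) -> Optional[Tuple[str, str]]:
    """Split word as x + const + y with x as short as possible."""
    m = re.fullmatch(f"(.*?){re.escape(const)}(.*)", word)
    return (m.group(1), m.group(2)) if m else None


if __name__ == "__main__":
    # The toy word contains the constant "a" twice, so two splits are possible.
    print(split_greedy("kala", "a"))      # ('kal', '')
    print(split_non_greedy("kala", "a"))  # ('k', 'la')
```

A paradigm that was extracted with one split in mind will produce wrong forms if the matcher picks the other split, so a generator has to know which convention the target matcher follows.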

To-do

  • Make pextract2gf code readable and understandable
  • Add link and short explanation in GF-contrib repository
  • Handle variant forms, which are currently ignored in effect (but still included in the serialization)
  • Use a function for switching between bind and string concatenation
  • Add verbs and find more problems
    • ... mostly in the type system, which seems semi-fixed
  • Work on paradigm prediction
    • serialize a Smart Paradigm from pextract paradigm description
    • make the Smart Paradigm smarter
  • Work on paradigm minimization (to minimize GF memory)
  • ... move on to Votic syntax
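
The to-do item about switching between bind and string concatenation could look roughly like the sketch below, written in Python because the generator emits GF source as text. It assumes that "bind" refers to GF's `BIND` token joined with the token-level `++` operator, and that "string concatenation" refers to the `+` operator. The function name and its arguments are hypothetical.

```python
from typing import List


def join_parts(parts: List[str], use_bind: bool) -> str:
    """Join already-rendered GF expressions with either BIND or plain +.

    use_bind=True  emits token-level gluing:  x1 ++ BIND ++ "a"
    use_bind=False emits string concatenation: x1 + "a"
    """
    separator = " ++ BIND ++ " if use_bind else " + "
    return separator.join(parts)


if __name__ == "__main__":
    rendered = ["x1", '"a"', "x2"]
    print(join_parts(rendered, use_bind=False))
    print(join_parts(rendered, use_bind=True))
```

Keeping the choice in one function means the rest of the generator does not need to know which of the two styles the emitted grammar uses.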