How to boot-strap GF development for non-resourced languages
Alternative title: Boot-strap GF morphology and start directly on concrete syntax
Kristian Kankainen, MTÜ Keeleleek
Q: How do I start with GF when all I have is full inflection tables for 5 nouns?
A: Automatically generate all GF code by showing the tables to the machine!
So my project was to build a converter from inflection tables to GF code.
Illustration of my general work-flow:
LMF (extensional morphology)
  |
inflection table --> pextract --> paradigm description
                                   |
                                  DFDL
                                   |
                                  XML ---> GF
                                   |
                                  LMF (intensional morphology)
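To make the "paradigm description" box concrete, here is a much-simplified Python stand-in for pextract. As I understand it, pextract works with longest common subsequences and can find several variables per paradigm; this sketch only takes the longest common prefix of all forms as a single variable x1 and keeps the rest of each form as a literal suffix. The cell labels (SgNom, SgGen, ...) are hypothetical, and the tables are Finnish examples, not Votic data.

```python
import os
from collections import defaultdict

# A table maps a (hypothetical) cell label to a word form.
Table = dict[str, str]
# An abstracted paradigm: sorted (cell, suffix) pairs; each form is x1 + suffix.
Pattern = tuple[tuple[str, str], ...]


def abstract(table: Table) -> tuple[str, Pattern]:
    """Simplified stand-in for pextract: the longest common prefix of all
    forms becomes the single variable x1; what follows it stays literal."""
    stem = os.path.commonprefix(list(table.values()))
    pattern = tuple(sorted((cell, form[len(stem):]) for cell, form in table.items()))
    return stem, pattern


def extract_paradigms(tables: list[Table]) -> dict[Pattern, list[str]]:
    """Group tables whose abstracted patterns coincide into one paradigm,
    remembering which x1 values (stems) were seen for it."""
    paradigms: dict[Pattern, list[str]] = defaultdict(list)
    for table in tables:
        stem, pattern = abstract(table)
        paradigms[pattern].append(stem)
    return dict(paradigms)


# Illustrative Finnish tables (not Votic data).
tables = [
    {"SgNom": "talo", "SgGen": "talon", "SgPar": "taloa", "PlNom": "talot"},
    {"SgNom": "kala", "SgGen": "kalan", "SgPar": "kalaa", "PlNom": "kalat"},
    {"SgNom": "käsi", "SgGen": "käden", "SgPar": "kättä", "PlNom": "kädet"},
]

for pattern, stems in extract_paradigms(tables).items():
    print(stems, pattern)
```

Here talo and kala share a pattern and collapse into one paradigm, while käsi, with its stem alternation, gets a paradigm of its own.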
Practical implementation of the above work-flow:
- Take my inflection tables and extract the paradigms using the paradigm extraction tool pextract
- Make a DFDL schema that describes the output of pextract (see dfdl-pextract-schema)
- Make a converter that generates the GF code from the pextract output (see pextract2gf)
- Make sure the converted code runs
- Ask Inari for the rest of the code for a mini resource grammar
- Use pextract2gf and copy-paste the generated code into my mini grammar
- Have dinner and go home for further work
- ... resurrect the Votic language by showing cool GF application grammars to the grandchildren of the last 10 Votic people
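The DFDL step turns pextract's plain-text output into XML. DFDL itself is declarative, so the following is only a procedural analogy in Python: it assumes a made-up line format (pattern, a tab, then the cell label, with digits standing for variables and + between parts), not pextract's real output, and the element names paradigm, form, var and lit are hypothetical. It builds the XML with the standard-library ElementTree.

```python
import xml.etree.ElementTree as ET


def to_xml(name: str, lines: list[str]) -> ET.Element:
    """Convert lines like "1+n<TAB>SgGen" (assumed format) into XML."""
    root = ET.Element("paradigm", name=name)
    for line in lines:
        pattern, cell = line.split("\t")
        form = ET.SubElement(root, "form", cell=cell)
        for part in pattern.split("+"):
            if part.isdigit():
                # A digit marks a variable slot, e.g. 1 -> x1.
                ET.SubElement(form, "var", n=part)
            else:
                # Anything else is literal material of the form.
                ET.SubElement(form, "lit").text = part
    return root


root = to_xml("p1", ["1\tSgNom", "1+n\tSgGen", "1+a\tSgPar", "1+t\tPlNom"])
print(ET.tostring(root, encoding="unicode"))
```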
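The core of a converter like pextract2gf can be sketched as a function from a paradigm description to a GF oper. This is not pextract2gf's actual code: the parameter type NForm, its constructors, and the encoding of parts (an int i for variable xi, a str for literal material) are assumptions for illustration.

```python
from typing import Union

# Assumed encoding: int i = variable x<i>, str = literal string material.
Part = Union[int, str]


def gf_expr(parts: list[Part]) -> str:
    """Render one form pattern as a GF string expression glued with +."""
    terms = [f"x{p}" if isinstance(p, int) else f'"{p}"' for p in parts if p != ""]
    return " + ".join(terms) if terms else '""'


def gf_param(cells: list[str]) -> str:
    """Hypothetical parameter type with one constructor per table cell."""
    return f"param NForm = {' | '.join(cells)} ;"


def gf_oper(name: str, paradigm: dict[str, list[Part]]) -> str:
    """Emit a GF oper taking the paradigm variables and returning the table."""
    nvars = max((p for parts in paradigm.values() for p in parts if isinstance(p, int)), default=0)
    args = [f"x{i}" for i in range(1, nvars + 1)]
    sig = " -> ".join(["Str"] * nvars + ["{s : NForm => Str}"])
    lam = f"\\{','.join(args)} -> " if args else ""
    branches = " ;\n    ".join(f"{cell} => {gf_expr(parts)}" for cell, parts in paradigm.items())
    return f"oper {name} : {sig} =\n  {lam}{{s = table {{\n    {branches}\n  }}}} ;"


talo = {"SgNom": [1], "SgGen": [1, "n"], "SgPar": [1, "a"], "PlNom": [1, "t"]}
print(gf_param(list(talo)))
print(gf_oper("mkN_talo", talo))
```

For the talo paradigm this prints a param declaration with one constructor per cell and an oper mkN_talo : Str -> {s : NForm => Str} whose table rebuilds each form as x1 plus a literal suffix, ready to be pasted into a mini grammar that defines NForm the same way.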
[Demo: automatically generated GF code]
- lemma-agnostic (Sg Par is probably the best lemma form, but I need to check the statistics)
- further lexicon building à la Morfologilabbet
- new GF users could boot-strap the morphology module and move directly to the concrete syntax of a language
- GF pattern matching is unexpectedly non-greedy (see Issue #3)
- Make pextract2gf code readable and understandable
- Add a link and a short explanation to the GF-contrib repository
- Variant forms are now effectively ignored (but still included in the serialization)
- Use a function to switch between bind and string concatenation
- Add verbs and find more problems
  - ... mostly in the type system, which seems semi-fixed
- Work on paradigm prediction
  - serialize a Smart Paradigm from the pextract paradigm description
  - make the Smart Paradigm smarter
  - but on the more abstract level, using the Lexical Markup Framework
- work on paradigm minimization (to reduce GF memory usage)
- ... move on to Votic syntax
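About the non-greedy pattern matching: Issue #3 is not reproduced here, but Python regular expressions (Python's re module, not GF) give an analogy for why greediness matters when a paradigm pattern such as x1 + "a" + x2 meets a word that contains the literal "a" more than once. The function name and the non-empty-x1 constraint are assumptions of this sketch.

```python
import re


def split_x1_a_x2(word: str, greedy: bool):
    """Match word against x1 + "a" + x2 (x1 non-empty, x2 may be empty).
    Returns (x1, x2), or None if the word does not fit the pattern."""
    x1 = "(.+)" if greedy else "(.+?)"
    m = re.fullmatch(x1 + "a(.*)", word)
    return m.groups() if m else None


print(split_x1_a_x2("kassa", greedy=True))   # ('kass', '')
print(split_x1_a_x2("kassa", greedy=False))  # ('k', 'ssa')
```

Both splits are valid matches of the same pattern; which one a system picks changes the variable bindings and therefore what the generated paradigm does with the word.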
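The switch between bind and string concatenation could live in the code generator as a single flag. As I understand GF, + glues tokens and needs the strings to be known at compile time, whereas ++ BIND ++ leaves two tokens that are glued in the linearization output; the function name and flag below are hypothetical.

```python
def gf_concat(terms: list[str], use_bind: bool) -> str:
    """Join already-rendered GF string terms (e.g. x1 and "n") either
    with + (compile-time gluing) or with ++ BIND ++ (runtime binding)."""
    sep = " ++ BIND ++ " if use_bind else " + "
    return sep.join(terms)


print(gf_concat(["x1", '"n"'], use_bind=False))  # x1 + "n"
print(gf_concat(["x1", '"n"'], use_bind=True))   # x1 ++ BIND ++ "n"
```

Keeping the choice in one function means the whole generated morphology can be switched at once instead of editing each oper by hand.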
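Paradigm prediction and the lemma-agnostic idea can be illustrated together: given one form of a new word and the cell it fills (for example Sg Par), pick a paradigm and regenerate the rest of the table. The two paradigms, the cell names and the longest-suffix heuristic are assumptions for this sketch; a GF Smart Paradigm would express a similar choice as pattern matching on the lemma ending, and which cell predicts best is the statistics question raised above.

```python
from typing import Optional

# Hypothetical paradigms: each form is x1 + suffix (single-variable simplification).
PARADIGMS: dict[str, dict[str, str]] = {
    "talo": {"SgNom": "", "SgGen": "n", "SgPar": "a", "PlNom": "t"},
    "käsi": {"SgNom": "si", "SgGen": "den", "SgPar": "ttä", "PlNom": "det"},
}


def predict(word: str, cell: str) -> Optional[tuple[str, dict[str, str]]]:
    """Guess the paradigm of `word`, given which cell it fills (the lemma cell).
    Heuristic: among paradigms whose suffix for that cell ends the word,
    prefer the longest suffix; then rebuild the whole table from x1."""
    candidates = [
        (name, p) for name, p in PARADIGMS.items()
        if word.endswith(p[cell]) and len(word) > len(p[cell])
    ]
    if not candidates:
        return None
    name, p = max(candidates, key=lambda c: len(c[1][cell]))
    x1 = word[: len(word) - len(p[cell])]
    return name, {c: x1 + suffix for c, suffix in p.items()}


print(predict("vettä", "SgPar"))
print(predict("kalaa", "SgPar"))
```

With Sg Par as the lemma cell, vettä is analysed with the käsi paradigm and expanded to vesi, veden, vettä, vedet, and kalaa falls into the talo paradigm.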