An example input:
1 Sapiteed sapi_tee PROPN S Case=Par|Number=Sing 0 root _ _
2 tavalaiusega tavalaius NOUN S Case=Com|Number=Sing 1 nmod _ _
I would like ud2gf to try to parse sapi_tee in the following order:
a. Merge the lemma into sapitee and try to parse it. If it is found in the lexicon, return sapitee_N.
b. If sapitee is not in the lexicon, then try parsing both sapi and tee. If they are both nouns, return CompoundN sapi_N tee_N.
c. If only tee is found in the lexicon, return StrCompoundN "sapi" tee_N.
d. If neither sapi nor tee is in the lexicon, then proceed to morpho_analyse the wordform, i.e. "sapiteed", because the lemma may have been wrongly analysed.
e. If ma "sapiteed" doesn't return anything either, as a last resort we return StrN <something>. That something can be either
the lemma without the underscore, so StrN "sapitee", or
the wordform as is, so StrN "sapiteed".
The same applies to compound adjectives, verbs etc. This assumes that the grammar has the backup functions StrC and StrCompoundC for the category C in question (this may become a command-line option, see #24). For now, while it is not a command-line option, we can just introduce those functions in ud2gf and leave it to the grammarian to add them to the grammar.
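The fallback order proposed above can be sketched roughly as follows. This is only an illustration in Python, not ud2gf's actual code: the function name `resolve_compound_noun`, the `noun_lexicon` set and the `morpho_analyse` callback are all hypothetical stand-ins. The issue does not say what should happen when only the first part is in the lexicon, or when the lemma has more than two parts. The sketch assumes both cases fall through to the morphological analysis. It also assumes the wordform is lowercased before analysis, and for the last resort it arbitrarily picks the first of the two options (the lemma without the underscore).

```python
from typing import Callable, Optional, Set


def resolve_compound_noun(
    lemma: str,
    wordform: str,
    noun_lexicon: Set[str],
    morpho_analyse: Callable[[str], Optional[str]],
) -> str:
    """Return a GF tree (as a string) for a noun whose UD lemma may contain '_'.

    All names here are hypothetical; the lexicon is modelled as a plain set of
    noun lemmas and morpho_analyse as a callback returning a tree or None.
    """
    merged = lemma.replace("_", "")

    # First try the merged lemma as a single lexicon entry.
    if merged in noun_lexicon:
        return f"{merged}_N"

    parts = lemma.split("_")
    if len(parts) == 2:
        modifier, head = parts
        # Both parts are known nouns: build a proper compound.
        if modifier in noun_lexicon and head in noun_lexicon:
            return f"CompoundN {modifier}_N {head}_N"
        # Only the head is known: keep the modifier as a string.
        if head in noun_lexicon:
            return f'StrCompoundN "{modifier}" {head}_N'

    # The lemma may be wrong, so analyse the (lowercased, by assumption) wordform.
    analysis = morpho_analyse(wordform.lower())
    if analysis is not None:
        return analysis

    # Last resort: wrap the lemma without the underscore as a string noun.
    return f'StrN "{merged}"'


if __name__ == "__main__":
    no_analysis = lambda wordform: None  # stand-in for an ma call that finds nothing
    print(resolve_compound_noun("sapi_tee", "Sapiteed", {"sapi", "tee"}, no_analysis))
    # prints: CompoundN sapi_N tee_N
```

For other categories, the same shape would apply with the corresponding StrC and StrCompoundC functions, provided the grammar defines them.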
Interaction with morpho_analyse
As of April 2022, ud2gf first tries to parse the lemma, and only secondarily does ma on the word form. If the default behaviour changes, this proposed algorithm should be reconsidered too.