Feature request: handle compounds in lemma #37

Open
inariksit opened this issue Apr 22, 2022 · 0 comments
Labels
enhancement New feature or request

An example input:

1       Sapiteed        sapi_tee        PROPN   S       Case=Par|Number=Sing    0       root    _       _
2       tavalaiusega    tavalaius       NOUN    S       Case=Com|Number=Sing    1       nmod    _       _

I would like ud2gf to try to parse sapi_tee in the following order:

a. Merge the lemma into sapitee and try to parse it. If it is found in the lexicon, return sapitee_N.
b. If sapitee is not in the lexicon, try parsing sapi and tee separately. If both are nouns, return CompoundN sapi_N tee_N.
c. If only tee is found in the lexicon, return StrCompoundN "sapi" tee_N.
d. If neither sapi nor tee is in the lexicon, proceed to morpho_analyse the wordform, i.e. "sapiteed". That's because the lemma may have been wrongly analysed.
e. If ma "sapiteed" doesn't return anything either, as a last resort return StrN <something>. That something can be either

  • lemma without the underscore, so StrN "sapitee"
  • wordform as is, so StrN "sapiteed".
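
The fallback order above could be sketched roughly as follows. This is only an illustration in Python, not ud2gf code: the function `resolve_compound`, the representation of the lexicon as a set of function names such as `tee_N`, and the `morpho_analyse` callable (a stand-in for ma that returns a tree string or `None`) are all assumptions made for the example. The case where only the first part (sapi) is in the lexicon is not covered by the list above; in this sketch it simply falls through to morphological analysis.

```python
from typing import Callable, Optional, Set


def resolve_compound(
    lemma: str,
    wordform: str,
    cat: str,
    lexicon: Set[str],
    morpho_analyse: Callable[[str], Optional[str]],
    fallback_to_wordform: bool = False,
) -> str:
    """Return a GF-style tree (as a string) for a lemma such as 'sapi_tee'.

    Hypothetical sketch: `lexicon` holds function names like 'tee_N', and
    `morpho_analyse` stands in for ma on the wordform.
    """
    parts = lemma.split("_")
    merged = "".join(parts)

    # Step a: try the merged lemma, e.g. sapitee_N.
    merged_fun = f"{merged}_{cat}"
    if merged_fun in lexicon:
        return merged_fun

    # Steps b and c: only handled for two-part compounds in this sketch.
    if len(parts) == 2:
        modifier, head = parts
        modifier_fun = f"{modifier}_{cat}"
        head_fun = f"{head}_{cat}"
        if head_fun in lexicon:
            if modifier_fun in lexicon:
                # Step b: both parts are known, e.g. CompoundN sapi_N tee_N.
                return f"Compound{cat} {modifier_fun} {head_fun}"
            # Step c: only the head is known, e.g. StrCompoundN "sapi" tee_N.
            return f'StrCompound{cat} "{modifier}" {head_fun}'

    # Step d: the lemma may be wrong, so analyse the wordform instead.
    analysis = morpho_analyse(wordform)
    if analysis is not None:
        return analysis

    # Last resort: a string-based backup function, e.g. StrN "sapitee".
    backup = wordform if fallback_to_wordform else merged
    return f'Str{cat} "{backup}"'
```

For example, with a lexicon containing only `tee_N`, calling `resolve_compound("sapi_tee", "sapiteed", "N", {"tee_N"}, lambda w: None)` yields `StrCompoundN "sapi" tee_N`. The `fallback_to_wordform` flag is just one way to choose between the two last-resort options listed above.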

The same applies to compound adjectives, verbs, etc. This assumes that the grammar has the backup functions StrC and StrCompoundC, which may become a command-line option (see #24). For now, while it is not a command-line option, we can just introduce those functions in ud2gf and leave it to the grammarian to add them to the grammar.

Interaction with morpho_analyse

As of April 2022, ud2gf first tries to parse the lemma, and only secondarily runs ma on the wordform. If this default behaviour changes, the proposed algorithm should be reconsidered too.

@inariksit inariksit added the enhancement New feature or request label Apr 22, 2022
anka-213 added a commit to anka-213/gf-ud that referenced this issue May 4, 2022
Feature request: handle compounds in lemma