Description
I'm running ud2gf with ShallowParse on the sentence "the cat sleeps". The input below was produced by parsing "the cat sleeps" with UDPipe and using this code to output the CoNLL-U format.
$ cat /tmp/cat.conllu
1 the the DET _ _ 2 det _ _
2 cat cat NOUN _ _ 3 nsubj _ _
3 sleeps sleep VERB _ _ 0 root _ _
I run ud2gf as follows.
$ cat /tmp/cat.conllu | stack run gf-ud ud2gf grammars/ShallowParse Eng Text at
Infinite loop
First, ud2gf ran for 30 minutes until I stopped it.
Uncommenting the "beam size" of 123 trees
Next, I uncommented this line to restore the limit of at most 123 candidate trees. This works in the sense that ud2gf no longer gets stuck in an infinite loop, but the best tree still contains multiple applications of ProgrVP, even though the original sentence contains none. Here's the output:
# bt0, the best (most complete) tree, without backups:
[3] sleeps 3 (2) VERB root (ImpVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (UseV sleep_V))))))))))))))))))))) : Imp[3]) 1
*[1,2] cat 2 (1) NOUN nsubj (UseN cat_N : CN[2]) 1
*[1] the 1 (2) DET det (the_Det : Det[1]) 1
# at, final GF tree, macros expanded:
AddBackupImp (ConsBackup (CNBackup (AddBackupCN (ConsBackup (DetBackup the_Det) BaseBackup) (UseN cat_N))) BaseBackup) (ImpVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (UseV sleep_V))))))))))))))))))))))
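For anyone who wants to turn this into a regression check, here is a minimal sketch that measures how often a function is applied directly to itself in a tree printed by ud2gf. The helper name `longest_nested_run` is hypothetical and not part of gf-ud; it only inspects the printed tree as a string.

```python
import re

# An identifier followed by zero or more "(<same identifier>" repetitions,
# e.g. "ProgrVP (ProgrVP (ProgrVP".
RUN = re.compile(r"\b([A-Za-z_]\w*)\b(?:\s*\(\s*\1\b)*")

def longest_nested_run(tree):
    """Return (function, n) for the longest chain of one function applied
    directly to itself as its first argument, e.g. 'F (F (F x))' -> ('F', 3)."""
    best = ("", 0)
    for match in RUN.finditer(tree):
        # Each repetition in the match contributes exactly one "(".
        n = 1 + match.group(0).count("(")
        if n > best[1]:
            best = (match.group(1), n)
    return best

print(longest_nested_run("ImpVP (ProgrVP (ProgrVP (UseV sleep_V)))"))
# ('ProgrVP', 2)
```

A test could then assert that the run length stays at 1 for sentences like "the cat sleeps".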
Adding annotations to the CoNLL-U file
I have noticed before that I get weird trees if the file is missing morphological annotations, so I added them manually to the CoNLL-U file:
$ cat /tmp/cat-annotated.conllu
1 the the DET Det FORM=0 2 det _ _
2 cat cat NOUN N Number=Sing 3 nsubj _ _
3 sleeps sleep VERB V Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _
With this file, we now get a correct tree with MiniLang:
# MiniLang with cat.conllu (which is missing annotations)
AddBackupImp (ConsBackup (CNBackup (AddBackupCN (ConsBackup (TheBackup the_The) BaseBackup) (UseN cat_N))) BaseBackup) (ImpVP (UseV sleep_V))
# MiniLang with cat-annotated.conllu
PredVP (DetCN the_Det (UseN cat_N)) (UseV sleep_V)
But with ShallowParse, the tree is as wrong as ever, with multiple ProgrVPs.
# ShallowParse with cat-annotated.conllu
AddBackupImp (ConsBackup (CNBackup (AddBackupCN (ConsBackup (DetBackup thePl_Det) BaseBackup) (UseN cat_N))) BaseBackup) (ImpVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (UseV sleep_V))))))))))))))))))))))
So it seems unlikely that the ProgrVP loop is due to user error or insufficiently annotated CoNLL-U files.
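To rule out missing annotations mechanically, a small script can list the tokens whose XPOS or FEATS column is still `_`. This is only a sketch: the function name `missing_annotations` is hypothetical, and it splits on any whitespace (as in the listings above) rather than strictly on tabs, so it assumes no field contains a space.

```python
# The ten CoNLL-U columns, in order.
CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

def missing_annotations(conllu_text, columns=("XPOS", "FEATS")):
    """Return (ID, FORM, [empty columns]) for each token where any of the
    given columns is the placeholder '_'."""
    missing = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        fields = line.split()
        if len(fields) != len(CONLLU_COLUMNS):
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        row = dict(zip(CONLLU_COLUMNS, fields))
        empty = [c for c in columns if row[c] == "_"]
        if empty:
            missing.append((row["ID"], row["FORM"], empty))
    return missing

if __name__ == "__main__":
    import sys
    for token_id, form, empty in missing_annotations(sys.stdin.read()):
        print(token_id, form, "missing:", ", ".join(empty))
```

Saved under a name of your choice, it can read a file on standard input; it prints nothing when every token has both columns filled in.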
Workaround
ProgrVP is the only function in ShallowParse of type a -> a, so I can just comment it out in the GF grammar. But of course, sometimes such functions are actually needed, so this is not a real solution.
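To illustrate why a function of type a -> a is a problem for exhaustive tree building, and what a less drastic fix than deleting it might look like, here is a toy sketch. It is not gf-ud's algorithm: the `Fun` type, the three-function grammar and the `max_repeat` parameter are all assumptions made for the example, and the signatures are assumed to mirror the relevant part of ShallowParse.

```python
from typing import NamedTuple

class Fun(NamedTuple):
    name: str
    args: tuple   # argument categories
    result: str   # result category

# Toy signatures (assumed, not read from the actual grammar).
GRAMMAR = [
    Fun("UseV", ("V",), "VP"),
    Fun("ProgrVP", ("VP",), "VP"),
    Fun("ImpVP", ("VP",), "Imp"),
]

def endofunctions(grammar):
    """Names of functions of type a -> a."""
    return [f.name for f in grammar
            if len(f.args) == 1 and f.args[0] == f.result]

def apply_fun(name, tree):
    """Print an application GF-style, parenthesising complex arguments."""
    return f"{name} ({tree})" if " " in tree else f"{name} {tree}"

def unary_closure(tree, cat, grammar, max_repeat):
    """All trees reachable from (tree, cat) by applying unary functions,
    with at most max_repeat consecutive a -> a applications.
    Without that cap, the loop below would never run out of work."""
    results = []
    todo = [(tree, cat, 0)]
    while todo:
        t, c, repeats = todo.pop()
        results.append((t, c))
        for f in grammar:
            if f.args != (c,):
                continue
            if f.result == c:
                if repeats < max_repeat:
                    todo.append((apply_fun(f.name, t), c, repeats + 1))
            else:
                todo.append((apply_fun(f.name, t), f.result, 0))
    return results

print(endofunctions(GRAMMAR))
for t, c in unary_closure("sleep_V", "V", GRAMMAR, max_repeat=1):
    print(f"{t} : {c}")
```

With `max_repeat=0` the candidates contain no ProgrVP at all, which has the same effect as commenting the function out; a small positive cap keeps the function available while bounding the search. The cap only handles direct a -> a functions; a cycle through several categories (a -> b, b -> a) would still need a general depth limit.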