Infinite applications of ProgrVP by ud2gf #12

Open
@inariksit

I'm running ud2gf with ShallowParse on the sentence "the cat sleeps". The input below was produced by parsing "the cat sleeps" with UDPipe and using this code to output the CoNLL-U format.

$ cat /tmp/cat.conllu
1       the     the     DET     _       _       2       det     _       _
2       cat     cat     NOUN    _       _       3       nsubj   _       _
3       sleeps  sleep   VERB    _       _       0       root    _       _

I run ud2gf as follows.

$ cat /tmp/cat.conllu | stack run gf-ud ud2gf grammars/ShallowParse Eng Text at

Infinite loop

First, ud2gf ran for 30 minutes until I stopped it.

Uncomment "beam size" of 123 trees

Next, I uncommented this line to restore the limit of at most 123 candidate trees. This works in the sense that ud2gf no longer gets stuck in an infinite loop, but the best tree still contains multiple applications of ProgrVP, even though the original sentence has none. Here's the output:

# bt0, the best (most complete) tree, without backups:
[3] sleeps 3 (2) VERB root (ImpVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (UseV sleep_V))))))))))))))))))))) : Imp[3]) 1
    *[1,2] cat 2 (1) NOUN nsubj (UseN cat_N : CN[2]) 1
        *[1] the 1 (2) DET det (the_Det : Det[1]) 1

# at, final GF tree, macros expanded:
AddBackupImp (ConsBackup (CNBackup (AddBackupCN (ConsBackup (DetBackup the_Det) BaseBackup) (UseN cat_N))) BaseBackup) (ImpVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (UseV sleep_V))))))))))))))))))))))
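To quantify the repetition in output like the above, a small helper can measure the longest run of a function applied directly on top of itself. This is only a diagnostic sketch: the name `longest_chain` is hypothetical and not part of gf-ud, and it simply counts consecutive occurrences of the function name, which matches direct nesting for a one-argument function such as ProgrVP.

```python
import re

def longest_chain(tree, fun="ProgrVP"):
    """Longest run of consecutive `fun` names in a bracketed GF tree string.

    Assumption: `fun` takes a single argument, so consecutive names in the
    flattened tree correspond to `fun` nested directly inside itself.
    """
    # Drop parentheses and whitespace, keeping only function/constant names.
    names = re.findall(r"[^\s()]+", tree)
    best = run = 0
    for name in names:
        run = run + 1 if name == fun else 0
        best = max(best, run)
    return best

if __name__ == "__main__":
    print(longest_chain("ImpVP (ProgrVP (ProgrVP (UseV sleep_V)))"))
```

Any value above 0 for a sentence without a progressive verb form, as here, indicates a spurious application.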

Adding annotations to the conllu file

I have noticed before that I get weird trees if the file is missing morphological annotations. So I added them manually to the CoNLLU file:

$ cat /tmp/cat-annotated.conllu
1	the	the	DET	Det	FORM=0	2	det	_	_
2	cat	cat	NOUN	N	Number=Sing	3	nsubj	_	_
3	sleeps	sleep	VERB	V	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	0	root	_	_

With this file, MiniLang now produces a correct tree. For comparison, here is the MiniLang output for both files:

# MiniLang with cat.conllu (which is missing annotations)
AddBackupImp (ConsBackup (CNBackup (AddBackupCN (ConsBackup (TheBackup the_The) BaseBackup) (UseN cat_N))) BaseBackup) (ImpVP (UseV sleep_V))

# MiniLang with cat-annotated.conllu
PredVP (DetCN the_Det (UseN cat_N)) (UseV sleep_V)

But with ShallowParse, the tree is as wrong as ever, with multiple ProgrVPs.

# ShallowParse with cat-annotated.conllu
AddBackupImp (ConsBackup (CNBackup (AddBackupCN (ConsBackup (DetBackup thePl_Det) BaseBackup) (UseN cat_N))) BaseBackup) (ImpVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (ProgrVP (UseV sleep_V))))))))))))))))))))))

So it seems unlikely that the ProgrVP loop is due to user error, such as insufficiently annotated CoNLL-U files.
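To rule out the missing-annotation explanation mechanically rather than by eye, a check along these lines could list the tokens whose XPOS or FEATS column is still `_`. It is a minimal sketch, not part of gf-ud: the function name is hypothetical, and it assumes 10 columns per token line, separated by tabs or, as in the first listing, by runs of spaces.

```python
def missing_annotations(conllu_text):
    """Return (id, form, missing_columns) for tokens lacking XPOS or FEATS."""
    report = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        # Prefer tabs (standard CoNLL-U); fall back to whitespace otherwise.
        cols = line.split("\t") if "\t" in line else line.split()
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Column 5 (index 4) is XPOS, column 6 (index 5) is FEATS.
        missing = [name for name, idx in (("XPOS", 4), ("FEATS", 5))
                   if cols[idx] == "_"]
        if missing:
            report.append((cols[0], cols[1], missing))
    return report

if __name__ == "__main__":
    sample = "1 the the DET _ _ 2 det _ _\n2 cat cat NOUN _ _ 3 nsubj _ _"
    for token in missing_annotations(sample):
        print(token)
```

An empty result for cat-annotated.conllu would confirm that the remaining ProgrVP chain is not caused by unannotated tokens.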

Workaround

ProgrVP is the only function in ShallowParse of type a -> a, so I can just comment it out in the GF grammar. But of course, sometimes such functions are actually needed, so this is not a real solution.
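The following toy model illustrates why a function of type a -> a is a problem for exhaustive candidate generation, and what a less drastic guard than commenting it out might look like. It is not how gf-ud builds or ranks trees: the three-function grammar, the `expand` loop, and the `max_same` cap are all assumptions made for illustration, and `max_same` is not an existing gf-ud option. Only the function names and their types (UseV : V -> VP, ProgrVP : VP -> VP, ImpVP : VP -> Imp) come from the grammar.

```python
# Toy unary grammar: function name -> (argument category, result category).
FUNS = {
    "UseV": ("V", "VP"),
    "ProgrVP": ("VP", "VP"),   # type a -> a: can always be applied again
    "ImpVP": ("VP", "Imp"),
}

def chain_len(tree, fun):
    """How many times `fun` is stacked directly on itself at the root."""
    n = 0
    while isinstance(tree, tuple) and tree[0] == fun:
        n += 1
        tree = tree[1]
    return n

def expand(start, rounds, max_same=None):
    """Apply every type-compatible function for `rounds` steps.

    `start` is a set of (tree, category) pairs. With max_same=None nothing
    stops ProgrVP from adding one more layer in every round; with a cap,
    a function is skipped once it is already stacked `max_same` times.
    """
    seen = set(start)
    frontier = set(start)
    for _ in range(rounds):
        new = set()
        for tree, cat in frontier:
            for fun, (arg, res) in FUNS.items():
                if arg != cat:
                    continue
                if max_same is not None and chain_len(tree, fun) >= max_same:
                    continue  # hypothetical guard against a -> a stacking
                new.add(((fun, tree), res))
        frontier = new - seen
        seen |= frontier
    return seen

if __name__ == "__main__":
    start = {("sleep_V", "V")}
    print(len(expand(start, rounds=6)))               # keeps growing with rounds
    print(len(expand(start, rounds=6, max_same=1)))   # reaches a fixed point
```

In this model the uncapped candidate set grows with every round, while the capped one stops changing after a few rounds. A cap of this kind would still allow a single genuine ProgrVP, but whether it fits gf-ud's actual search and scoring is an open question.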
