Improve MWE annotation in Czech #313

@dan-zeman

As suggested by @e-bej, the tectogrammatical layer of PDT (although it is not available for all sentences) may contain additional information about MWEs and related phenomena. Groups of nodes that form multi-word expressions are described in the t-root nodes. Some of them are idioms like natáhnout bačkory “to kick the bucket”, which we do not want to annotate in the current surface layer of UD. Others are inherently reflexive verbs such as smát se “to laugh”, which Czech grammar treats as compound units; in UD, however, we follow the current guideline that reflexive pronouns should be attached as expl rather than compound. There may also be other, more frozen expressions such as křížem krážem “crisscross”, which we will want to annotate using mwe.
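To make the intended treatment of frozen expressions concrete, here is a minimal Python sketch of how a group of nodes could be reattached head-initially with the mwe relation. The token dictionaries, the function name `annotate_mwe`, and the input tree for the example sentence are all hypothetical; how the group is actually read from the t-root nodes is not shown, and this is not the conversion code used for the Czech treebanks.

```python
from typing import Dict, List, Set

Token = Dict[str, object]  # hypothetical token record: id, form, head, deprel


def annotate_mwe(tokens: List[Token], mwe_ids: Set[int]) -> List[Token]:
    """Return a copy of the tree in which the group is headed by its first
    word and all other members attach to it as 'mwe'."""
    ids = sorted(mwe_ids)
    if len(ids) < 2:
        return [dict(t) for t in tokens]
    by_id = {t["id"]: t for t in tokens}
    # The group must hang from the rest of the tree at exactly one point.
    external = [by_id[i] for i in ids if by_id[i]["head"] not in mwe_ids]
    if len(external) != 1:
        return [dict(t) for t in tokens]  # not a single subtree: leave as is
    old_top = external[0]
    first = ids[0]
    result = []
    for t in tokens:
        new = dict(t)
        if t["id"] == first:
            # The first word inherits the group's outside attachment.
            new["head"], new["deprel"] = old_top["head"], old_top["deprel"]
        elif t["id"] in mwe_ids:
            new["head"], new["deprel"] = first, "mwe"
        elif t["head"] in mwe_ids:
            # Outside dependents of any member move to the new head.
            new["head"] = first
        result.append(new)
    return result


# Assumed input analysis of "Procestoval zemi křížem krážem ."
sentence = [
    {"id": 1, "form": "Procestoval", "head": 0, "deprel": "root"},
    {"id": 2, "form": "zemi", "head": 1, "deprel": "dobj"},
    {"id": 3, "form": "křížem", "head": 4, "deprel": "advmod"},
    {"id": 4, "form": "krážem", "head": 1, "deprel": "advmod"},
    {"id": 5, "form": ".", "head": 1, "deprel": "punct"},
]
result = annotate_mwe(sentence, {3, 4})
```

In this sketch křížem takes over the advmod attachment to the verb and krážem becomes its mwe dependent. Idioms and inherently reflexive verbs would simply not be passed to the function, so their existing attachments (for example expl) stay untouched.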

Furthermore, the is_personal_name attribute may be more reliable than the lemma tags we currently read to distinguish PROPN from NOUN.
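One possible way to combine the two sources is sketched below in Python: trust is_personal_name when it is set, and fall back to the lemma tags otherwise. The function name `choose_upos`, the representation of the attribute as an optional boolean, and the particular set of lemma-tag letters treated as proper-name markers are assumptions made for illustration; the function is meant to be applied to nominal tokens only.

```python
from typing import Optional

# Assumed set of lemma-tag letters (the part after "_;") that mark proper names.
PROPER_NAME_TAGS = {"Y", "S", "G", "K", "R", "m"}


def choose_upos(lemma: str, is_personal_name: Optional[bool]) -> str:
    """Pick PROPN or NOUN for a nominal token."""
    # Prefer the t-layer attribute when it is present and true.
    if is_personal_name:
        return "PROPN"
    # The attribute is missing (or false), e.g. no t-layer for this sentence
    # or a proper name that is not a personal name: fall back to lemma tags.
    tags = {part[0] for part in lemma.split("_;")[1:] if part}
    return "PROPN" if tags & PROPER_NAME_TAGS else "NOUN"


examples = [
    choose_upos("Zeman", True),      # attribute set
    choose_upos("Praha_;G", None),   # no attribute, geographical lemma tag
    choose_upos("bačkora", None),    # no attribute, no lemma tag
]
```

The fallback matters because the tectogrammatical layer does not cover every sentence, so the lemma tags remain the only evidence for part of the data.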
