Spanish Lemmatizer doesn't handle vosotros (2pl) #11607

I think it's an issue with the morphologizer rather than the rules in the lemmatizer. You can see that the MORPH tags are incorrect, and the lemmatizer uses the POS+MORPH tags to pick which rules to apply. (Also, there is not a single occurrence of "vosotros" in the training data.) In all the cases above, it looks like the verbs are tagged as Person=1 or Person=3, so the lemmatizer isn't applying the intended 2nd person plural rules.
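To make the mechanism concrete, here is a minimal sketch of how a wrong MORPH tag can keep the right rule group from being applied. It is only a toy: the `RULES` table, the suffix rules in it, and the `lemmatize` helper are hypothetical and are not the contents or format of `es_lemma_rules.json` or spaCy's actual implementation.

```python
# Toy illustration only: this rule table and helper are hypothetical,
# not spaCy's lemmatizer or the real es_lemma_rules.json data.
RULES = {
    # (POS, MORPH) selects a group of (suffix, replacement) rules.
    ("VERB", "Number=Plur|Person=2"): [("áis", "ar"), ("éis", "er"), ("ís", "ir")],
    ("VERB", "Number=Plur|Person=3"): [("an", "ar"), ("en", "er")],
}


def lemmatize(form, pos, morph):
    """Apply the first matching suffix rule from the group chosen by POS+MORPH."""
    for suffix, replacement in RULES.get((pos, morph), []):
        if form.endswith(suffix):
            return form[: -len(suffix)] + replacement
    # No rule group for these tags, or no suffix matched: return the form unchanged.
    return form


# Correct tag: the 2nd person plural rules are selected and the suffix matches.
correct = lemmatize("habláis", "VERB", "Number=Plur|Person=2")

# Wrong tag (as the morphologizer might predict): the 3rd person rules are
# selected, none of them match, and the surface form is returned as the lemma.
mistagged = lemmatize("habláis", "VERB", "Number=Plur|Person=3")

print(correct, mistagged)
```

In this sketch the rules themselves are fine; the lemma only goes wrong because the tag used to look them up is wrong, which is the situation described above.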

You can double-check the rules here:

https://github.com/explosion/spacy-lookups-data/blob/master/spacy_lookups_data/data/es_lemma_rules.json

Because there are very few 2nd person pronouns or verbs in the training data, the morphologizer does not learn how to tag them well. The ge…

Replies: 1 comment, 3 replies (from @killia15, @adrianeboyd, @killia15)

Answer selected by adrianeboyd
Labels: lang / es (Spanish language data and models); feat / lemmatizer (Feature: Rule-based and lookup lemmatization); feat / morphologizer (Feature: Morphologizer)
2 participants
This discussion was converted from issue #11606 on October 11, 2022 07:11.