Incorrect inflections of special adjectives like beautiful and handsome #6

Open
OscarWang114 opened this issue Sep 19, 2020 · 2 comments
Labels
enhancement New feature or request

Comments

@OscarWang114

Hi, thanks for building this amazing tool!
Currently, it doesn't seem to handle inflections of special adjectives like beautiful and handsome correctly.

Example:

from lemminflect import getLemma, getInflection

lemma = getLemma('beautiful', upos='ADJ')
inflection1 = getInflection(lemma[0], tag='JJR')
inflection2 = getInflection(lemma[0], tag='JJS')
print(inflection1, inflection2)

This gives ('beautifuler',) and ('beautifulest',). It'd be great if lemminflect could output something like ('more', 'beautiful') or ('more beautiful',) instead!

@bjascob
Owner

bjascob commented Sep 19, 2020

Thanks for pointing this out.

What's happening is that it doesn't have an inflection in its dictionary for JJR/JJS, so it's using the out-of-vocabulary (OOV) rules to create one.
You can see this if you do...

lemminflect.Inflections().getAllInflections(lemma[0])
{'JJ': ('beautiful',)}

Essentially, you're asking it to do something that isn't correct for English and it doesn't know that this isn't allowed, or at least isn't going to try to stop you.

I could probably add a rule to prevent it from creating an inflection if it has the base lemma but not the specific inflection (or at least log a warning). However, I'm a little concerned that there might be instances where it only has the base form, and falling back to the OOV rules for inflection allows things to work correctly for the user.
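A minimal sketch of the guard described above, assuming the dictionary lookup returns a mapping shaped like the `{'JJ': ('beautiful',)}` output shown earlier. The names `guarded_inflection`, `known_forms`, and `toy_oov_inflect` are hypothetical and not part of lemminflect; `toy_oov_inflect` only appends a suffix to stand in for the real OOV rules.

```python
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger("inflection_guard")

Forms = Dict[str, Tuple[str, ...]]


def toy_oov_inflect(lemma: str, tag: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for OOV rules: blindly append -er / -est."""
    suffix = {"JJR": "er", "JJS": "est"}.get(tag, "")
    return (lemma + suffix,)


def guarded_inflection(
    lemma: str,
    tag: str,
    known_forms: Forms,
    oov_inflect: Callable[[str, str], Tuple[str, ...]] = toy_oov_inflect,
    strict: bool = False,
) -> Tuple[str, ...]:
    """Return dictionary forms if present; otherwise warn before using OOV rules.

    known_forms is the dictionary entry for the lemma (empty if the lemma
    is not in the dictionary at all).
    """
    if tag in known_forms:
        return known_forms[tag]
    if known_forms:
        # The lemma is in the dictionary, but this tag is missing, so the
        # form may simply not exist in English.
        logger.warning("No dictionary form for %r with tag %s", lemma, tag)
        if strict:
            return ()
    # Either the lemma is truly unknown, or strict mode is off.
    return oov_inflect(lemma, tag)
```

With `strict=False` the behavior stays as it is today and only a warning is logged, which keeps the cases where the OOV fallback happens to produce the right answer. With `strict=True` the call returns an empty tuple instead of a made-up form.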

The right way to do this would be to have a defined list or set of rules for these exceptions and implement a lookup for them. I can look in the base NIH lexicon to see if there's anything that would help with that. If you're aware of any resource that details this behavior, let me know. I'll have to look into this some more.
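As a rough illustration of such a lookup, here is a sketch that checks an exception table before any rule-based inflection runs. Everything in it is an assumption for illustration: the names `IRREGULAR`, `PERIPHRASTIC`, and `lookup_exception` are hypothetical, the word lists are tiny placeholders rather than real lexicon data, and returning a single string like 'more beautiful' is just one of the output shapes suggested above.

```python
from typing import Dict, Optional, Tuple

# Placeholder exception data; a real list would come from a lexicon.
IRREGULAR: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "good": {"JJR": ("better",), "JJS": ("best",)},
    "bad": {"JJR": ("worse",), "JJS": ("worst",)},
}
# Adjectives that form the comparative/superlative with more/most.
PERIPHRASTIC = {"beautiful"}

_MARKER = {"JJR": "more", "JJS": "most"}


def lookup_exception(lemma: str, tag: str) -> Optional[Tuple[str, ...]]:
    """Return exception forms for JJR/JJS, or None if no exception applies."""
    if tag not in _MARKER:
        return None
    key = lemma.lower()
    if key in IRREGULAR:
        return IRREGULAR[key][tag]
    if key in PERIPHRASTIC:
        return (f"{_MARKER[tag]} {lemma}",)
    return None


print(lookup_exception("beautiful", "JJR"))  # ('more beautiful',)
print(lookup_exception("good", "JJS"))       # ('best',)
print(lookup_exception("tall", "JJR"))       # None
```

A caller would try `lookup_exception` first and fall back to the existing dictionary and OOV path only when it returns `None`.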

bjascob added the enhancement (New feature or request) label Sep 19, 2020
@nihil-admirari

nihil-admirari commented May 27, 2023

At least one exception is handled incorrectly:

In [1]: getInflection('little', 'JJR')
Out[1]: ('littler',)  # should be less

In [2]: getInflection('little', 'JJS')
Out[2]: ('littlest',)  # should be least

Some adjectives don't have comparative or superlative forms at all, not even more/most:

In [3]: getInflection('alphanumeric', 'JJR')
Out[3]: ('alphanumericer',)

In [4]: getInflection('alphanumeric', 'JJS')
Out[4]: ('alphanumericest',)

Simple Wiktionary has a list of them: https://simple.wiktionary.org/wiki/Category:Non-comparable_adjectives; not sure whether it's exhaustive.
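One way such a list could be used is sketched below, assuming the words have already been collected from that category by some other means (no download code is shown, and the single entry here is only the example from this comment). `make_comparability_filter` is a hypothetical helper, not part of lemminflect.

```python
from typing import Callable, Iterable


def make_comparability_filter(
    non_comparable: Iterable[str],
) -> Callable[[str, str], bool]:
    """Build a predicate that rejects JJR/JJS for non-comparable adjectives."""
    blocked = frozenset(word.lower() for word in non_comparable)

    def allow(lemma: str, tag: str) -> bool:
        # Only comparative and superlative tags are ever blocked.
        return not (tag in ("JJR", "JJS") and lemma.lower() in blocked)

    return allow


# Placeholder list; a real one would be much longer.
allow = make_comparability_filter(["alphanumeric"])

print(allow("alphanumeric", "JJR"))  # False
print(allow("alphanumeric", "JJ"))   # True
print(allow("little", "JJS"))        # True
```

Checking `allow(lemma, tag)` before calling `getInflection` would let a caller skip the request entirely rather than receive a form like 'alphanumericer'.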
