Overwhelming number of synonyms #13
Comments
I originally had a way to filter by source, but I removed it from the code because it caused confusion for some of my beta users. Synonyms are enriched from 4 sources, all of which have been validated in testing. Concerning the word "bully": it is a synonym of "good" and is being pulled from synonym.com. Here is a reference from the Oxford Languages dictionary for that word, and here is another for the word "spanking," which is also a synonym of "good."
Thank you for the detailed and quick response. I am not sure why filtering by source was confusing. In my experience, giving developers a wide range of options, if implemented correctly, should not negatively affect the developer experience. With such a wide range of synonyms, having no control over them is a double-edged sword. I would suggest developing a solution that indicates whether a word's source marks it as informal/slang. In my opinion, filtering out slang/informal cases is common practice.
Do you know how difficult it would be to develop a solution that tries to classify words as formal or informal/slang? I'm not sure this is even possible without creating a backend data source that contains these relationships. Do you have any suggestions on how to develop this solution for the English language?
The data can be found on the source websites themselves. As part of the crawling, you can collect the tags that each website attaches to its words (both "bully" and "spanking" have an informal tag). I implemented this method while crawling Wiktionary, filtering out slang/informal definitions of idioms.
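The tag-based filtering suggested above could be sketched roughly as follows. This is a minimal illustration, not WordHoard code: the `TaggedSynonym` record, the `filter_informal` helper, the tag vocabulary in `INFORMAL_TAGS`, and the sample data are all hypothetical, and it assumes the crawler has already captured whatever register labels a source page exposes.

```python
from dataclasses import dataclass, field

# Hypothetical register labels; each real site uses its own tag vocabulary.
INFORMAL_TAGS = frozenset({"informal", "slang", "colloquial"})


@dataclass(frozen=True)
class TaggedSynonym:
    """A synonym plus the source it came from and any register tags scraped with it."""
    word: str
    source: str
    tags: frozenset = field(default_factory=frozenset)


def filter_informal(entries, excluded_tags=INFORMAL_TAGS):
    """Keep only entries whose tags do not intersect the excluded set."""
    return [entry for entry in entries if not (entry.tags & excluded_tags)]


# Illustrative data only, not actual crawler output.
sample = [
    TaggedSynonym("excellent", "source_a"),
    TaggedSynonym("bully", "source_a", frozenset({"informal"})),
    TaggedSynonym("spanking", "source_b", frozenset({"informal"})),
    TaggedSynonym("fine", "source_b"),
]

formal_words = [entry.word for entry in filter_informal(sample)]
print(formal_words)
```

Entries without any tag are kept, so a source that publishes no register labels would pass through unfiltered; that is a design choice of this sketch, not a recommendation.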
I'm sorry, but your statement is not correct, because WordHoard does not pull from the Oxford Languages dictionary. I checked the sources that WordHoard queries and none of them provide this data. I could add a module to query the Oxford dictionary, but this source requires a subscription to use, and querying Google search for this information would lead to captchas being thrown.
@Lampent I recently spent some time redesigning WordHoard to allow searching by individual sources. Please let me know if this redesign works better for you.
Hello,
Thank you for publishing this package. It is a highly beneficial resource.
When searching for synonyms, I noticed unexpected behavior (possibly a bug).
For the word "good", the function find_synonyms() returns a list of 104 unique words. Among them are words that are not synonyms of "good", for example "bully", "cracking", "bad", "boss", "hard", and "spanking", plus a few others that I am not sure about. The same behavior occurs with other words as well. I am unsure whether a specific website enriches the synonyms with such words or whether it is a bug in the crawling process. A possible solution may be to allow selecting the websites on which the crawling process takes place.
I would highly recommend this option since I am unsure about the legitimacy of the other sources except for "merriam-webster" and "wordnet".
For now, I have decided to take the synonyms directly from "wordnet", as I cannot guarantee that the words returned by the other sources are actually synonyms.
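As a rough sketch of the source-selection idea, assuming the per-source results are available as separate lists before they are merged: the `merge_by_source` helper and the sample results below are hypothetical and do not reflect WordHoard's actual API or real query output.

```python
from collections import defaultdict


def merge_by_source(results_by_source, allowed_sources=None):
    """Merge per-source synonym lists, optionally keeping only allowed sources.

    Returns a dict mapping each word to the sorted list of sources that reported it.
    """
    provenance = defaultdict(set)
    for source, words in results_by_source.items():
        if allowed_sources is not None and source not in allowed_sources:
            continue  # skip sources the caller does not trust
        for word in words:
            provenance[word.lower()].add(source)
    return {word: sorted(sources) for word, sources in sorted(provenance.items())}


# Illustrative data only, not real query results.
results = {
    "merriam-webster": ["fine", "excellent"],
    "wordnet": ["fine", "beneficial"],
    "synonym.com": ["bully", "fine"],
}

trusted = merge_by_source(results, allowed_sources={"merriam-webster", "wordnet"})
print(trusted)
```

Keeping the provenance of each word also makes it possible to see which source contributed a questionable entry, which would help separate a noisy source from a crawling bug.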