Single-word greeting detection issue #222
I know about this problem. The current, purely statistical approach does not produce good results for such short words. My plan is to include word lists for each language that contain greetings, among other things. However, greetings such as "Hi" are surely used in many languages. Even if the library classifies it as English, that is not necessarily an indicator of an English-speaking customer in your chat. Please keep this in mind.
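The word-list idea can be sketched roughly as follows. This is only an illustration, not Lingua's implementation or API: the `GREETINGS` table, `greeting_candidates` and `detect_short_text` are hypothetical names, and the word lists are tiny placeholders. The key design choice is that a greeting found in several lists (such as "hi") is reported as ambiguous rather than forced onto one language, which reflects the caveat above.

```python
import re
from typing import Callable, Optional

# Hypothetical, deliberately tiny greeting lists; a real library would ship curated lists.
GREETINGS: dict[str, set[str]] = {
    "English": {"hi", "hello", "hey"},
    "Dutch": {"hoi", "hallo", "hi"},
    "German": {"hallo", "hi"},
    "French": {"bonjour", "salut"},
    "Spanish": {"hola"},
}


def greeting_candidates(text: str) -> set[str]:
    """Return every language whose greeting list contains the single word in `text`."""
    tokens = re.findall(r"\w+", text.casefold())
    if len(tokens) != 1:
        return set()  # only single-word inputs are handled by the lookup
    word = tokens[0]
    return {lang for lang, words in GREETINGS.items() if word in words}


def detect_short_text(text: str, fallback: Callable[[str], Optional[str]]) -> Optional[str]:
    """Use the greeting lists first; defer to a statistical detector (`fallback`) otherwise."""
    candidates = greeting_candidates(text)
    if len(candidates) == 1:
        return next(iter(candidates))
    if candidates:
        return None  # ambiguous greeting such as "hi": do not guess a language
    return fallback(text)
```

With these placeholder lists, `detect_short_text("Hola", ...)` yields `"Spanish"`, while `"Hi!"` and `"Hallo"` yield `None` because they appear in more than one list.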
Thanks @pemistahl
@pemistahl I don't know if this is related, but we process both short and long texts, and with short texts we observed weird behaviour with some language combinations. I don't recall the exact sentence we stumbled upon, but it behaved like this:
Based on the reported graphs, I was expecting high single-word detection accuracy. However, when I tested some simple greetings, the results were quite poor.
I might have done something wrong, so please let me know if that is the case, or whether this is indeed a bug.
I was expecting [English, English, Dutch (although questionable), French, Spanish].
Looking at the list of confidence values, the correct answer is not even close to the top.
I think this matters in general because it makes the detector impossible to use in a multilingual chatbot, where you have to determine the language at the beginning of the chat and change behaviour based on that detection (e.g. tell the user whether the language is supported, change the available intents, etc.).
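One way a chatbot might guard against unreliable short-text results is to commit to a language only when the top confidence value is both high enough and clearly ahead of the runner-up, and otherwise ask the user. The sketch below assumes the detector's output has already been turned into a list of `(language, score)` pairs. `Decision`, `choose_chat_language` and both threshold defaults are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Decision:
    language: Optional[str]  # None means "undetermined: ask the user"
    supported: bool


def choose_chat_language(
    confidences: Sequence[tuple[str, float]],
    supported: set[str],
    min_confidence: float = 0.6,  # hypothetical threshold
    min_margin: float = 0.2,      # hypothetical gap over the runner-up
) -> Decision:
    """Commit to a language only when the top score is high and clearly ahead."""
    ranked = sorted(confidences, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return Decision(None, False)
    top_lang, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score < min_confidence or top_score - runner_up < min_margin:
        return Decision(None, False)
    return Decision(top_lang, top_lang in supported)
```

Returning "undetermined" instead of the top guess lets the bot fall back to a language-selection prompt rather than silently switching to the wrong intents.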