Add support for numeric strings without spaces #19
@xiaolitang I have a few questions to try to make sure that my proposed solution will work. Can you give me an example of the following math statements in Mandarin? I'd like to see what they look like with unary operators.
Negative two plus two: -2 + 2
Two raised to an exponent of two: 2^2
Hello gunthercox,
Yes! I would love to help!
-2 + 2 could be stated in Mandarin as ‘负二加二’.
‘负’ means ‘negative’, ‘二’ means ‘two’ and ‘加’ means ‘plus’. Mandarin uses almost the same word order as English for mathematical statements.
For the four basic operators (plus, minus, multiplication and division), a Mandarin statement always takes the form ‘a (positive/negative) number’ + ‘operator’ + ‘a (positive/negative) number’, just as in English.
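Because the word order matches English, a statement of this form can be translated one character at a time without reordering. The sketch below assumes single-character numerals (零 to 九) and the four basic operators; `ENGLISH_WORDS` and `to_english` are hypothetical names for illustration, not part of the project.

```python
# Hypothetical character-to-word table (assumption: single-character numerals only).
ENGLISH_WORDS = {
    '零': 'zero', '一': 'one', '二': 'two', '三': 'three', '四': 'four',
    '五': 'five', '六': 'six', '七': 'seven', '八': 'eight', '九': 'nine',
    '负': 'negative', '加': 'plus', '减': 'minus', '乘': 'times', '除': 'divided by',
}


def to_english(statement: str) -> str:
    """Translate a Mandarin arithmetic statement word by word.

    Since the order is '(negative) number, operator, (negative) number',
    exactly as in English, no reordering is needed.
    """
    try:
        return ' '.join(ENGLISH_WORDS[ch] for ch in statement)
    except KeyError as err:
        raise ValueError(f'unrecognized character: {err.args[0]!r}') from None


print(to_english('负二加二'))  # negative two plus two
```

This only holds while the order really is the same as in English; the exponent forms described below do need their own handling.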
2^2 in Mandarin could be ‘二的平方’.
‘平方’ means ‘square’, and ‘的’ is a necessary particle that can be interpreted as ‘whose’.
For exponential statements in Mandarin, there is a general rule:
2^2 -> ‘二的平方’
3^2 -> ‘三的平方’
4^2 -> ‘四的平方’
Just replace the base number.
Similarly, for an exponent of three:
2^3 -> ‘二的立方’
3^3 -> ‘三的立方’
4^3 -> ‘四的立方’
where ‘立方’ is the dedicated word for an exponent of three (a cube).
For exponents greater than or equal to four, the pattern becomes more regular:
2^4 -> ‘二的四次方’ where ‘四’ means 4
3^5 -> ‘三的五次方’ where ‘五’ means 5
4^10 -> ‘四的十次方’ where ‘十’ means 10
In general, the rule is ‘base 的 n 次方’, i.e. ‘the base number’s n-th power’.
So only the exponents 2 and 3 have their own names (‘平方’ and ‘立方’, respectively).
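These rules can be sketched as a small parser that returns the base and the exponent. It assumes single-character numerals from 零 to 十 (compound numerals such as 十二 are not handled); `NUMERALS`, `SPECIAL_POWERS` and `parse_power` are hypothetical names, not part of the project.

```python
# Hypothetical numeral table for this sketch: single characters 零 through 十.
NUMERALS = {'零': 0, '一': 1, '二': 2, '三': 3, '四': 4, '五': 5,
            '六': 6, '七': 7, '八': 8, '九': 9, '十': 10}

# Only exponents 2 and 3 have dedicated words.
SPECIAL_POWERS = {'平方': 2, '立方': 3}


def parse_power(statement: str) -> tuple[int, int]:
    """Return (base, exponent) for statements like '二的平方' or '二的四次方'."""
    base_text, sep, power_text = statement.partition('的')
    if not sep:
        raise ValueError('expected the particle 的 between base and power')
    base = NUMERALS[base_text]
    if power_text in SPECIAL_POWERS:
        # '平方' (square) or '立方' (cube)
        return base, SPECIAL_POWERS[power_text]
    if power_text.endswith('次方'):
        # Regular form: '<n>次方' means the n-th power
        return base, NUMERALS[power_text[:-2]]
    raise ValueError(f'unrecognized power: {power_text!r}')


base, exponent = parse_power('四的十次方')
print(base ** exponent)  # 1048576
```

Handling the two special words first and then falling back to the regular ‘n 次方’ form mirrors the rule above: everything except exponents 2 and 3 follows the same pattern.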
Hope that helps!
If there's anything unclear, please feel free to email me!
By the way, your work on ChatterBot is awesome!
Thank you very much for sharing the code and keeping it updated!
Thanks,
xiaolitang
(Reply to Gunther Cox's comment of 20 Dec 2017, 11:49 pm, quoted above.)
The tokenizer currently splits math tokens based on whitespace.
In Mandarin, 二加二 means "two plus two", but no spaces are used between the characters. The tokenizer needs to be modified so that it splits these values correctly.
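To see the problem concretely, the snippet below uses Python's built-in `str.split()` as a stand-in for a whitespace-based tokenizer (an assumption; the project's actual tokenizer code is not shown here).

```python
english = 'two plus two'
mandarin = '二加二'

# Splitting on whitespace separates the English words...
print(english.split())   # ['two', 'plus', 'two']
# ...but leaves the whole Mandarin statement as one token.
print(mandarin.split())  # ['二加二']
```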
I believe the optimal solution might be to have the tokenizer ignore whitespace and traverse the string, classifying each value by the type it is recognized as. In this case, the first value read would be 二, which is added to the stack of numbers; the next value read would be 加, which is added to the binary operator stack; and so on. There is no need to split on whitespace.
Previously reported as gunthercox/ChatterBot#1115
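A minimal sketch of this approach, assuming single-character numerals and the four basic operators. Characters are classified one at a time: numbers go on a number stack and binary operators on an operator stack, and precedence is resolved in the style of Dijkstra's two-stack (shunting-yard) evaluation, a technique swapped in here for illustration. `evaluate`, `apply_top` and the lookup tables are hypothetical names, not the project's API.

```python
import operator

# Hypothetical vocabulary for this sketch (single-character numerals only).
NUMBERS = {'零': 0, '一': 1, '二': 2, '三': 3, '四': 4,
           '五': 5, '六': 6, '七': 7, '八': 8, '九': 9}
# Each binary operator maps to (precedence, function).
BINARY_OPERATORS = {
    '加': (1, operator.add),
    '减': (1, operator.sub),
    '乘': (2, operator.mul),
    '除': (2, operator.truediv),
}
UNARY_NEGATIVE = '负'


def apply_top(numbers, operators):
    """Pop one operator and two numbers, then push the result."""
    _, func = BINARY_OPERATORS[operators.pop()]
    right = numbers.pop()
    left = numbers.pop()
    numbers.append(func(left, right))


def evaluate(text):
    numbers = []    # stack of numbers read so far
    operators = []  # stack of pending binary operators
    negate = False  # set when 负 precedes the next number
    for ch in text:
        if ch.isspace():
            continue  # whitespace is ignored, not used to split tokens
        if ch == UNARY_NEGATIVE:
            negate = True
        elif ch in NUMBERS:
            value = NUMBERS[ch]
            numbers.append(-value if negate else value)
            negate = False
        elif ch in BINARY_OPERATORS:
            precedence = BINARY_OPERATORS[ch][0]
            # Resolve pending operators of equal or higher precedence first
            # (left-associative).
            while operators and BINARY_OPERATORS[operators[-1]][0] >= precedence:
                apply_top(numbers, operators)
            operators.append(ch)
        else:
            raise ValueError(f'unrecognized character: {ch!r}')
    while operators:
        apply_top(numbers, operators)
    if len(numbers) != 1:
        raise ValueError('malformed expression')
    return numbers[0]


print(evaluate('二加二'))      # 4
print(evaluate('负二加二'))    # 0
print(evaluate('二加三乘四'))  # 14
```

Because classification depends only on which table a character belongs to, the same input with or without spaces (二加二 or 二 加 二) produces the same tokens.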