Add support for numeric strings without spaces #19

gunthercox · 2017-12-20T12:37:53Z

The tokenizer currently splits math tokens based on whitespace.

In Mandarin, 二加二 means "two plus two", but no spaces are used between the characters.

The tokenizer needs to be modified so that it will split these values correctly.

I believe the optimal solution might be to have the tokenizer ignore whitespace and instead traverse the string, separating values based on the type each one is recognized as. In this case, 二 would be read first and added to the stack of numbers, then 加 would be read and added to the binary operator stack, and so on. There would be no need to split on whitespace.
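The traversal proposed above can be sketched as a greedy longest-match scanner. This is a minimal sketch and not the project's actual tokenizer: the `VOCABULARY` table, the `tokenize` function, and the token kinds are hypothetical. The vocabulary holds only a few words; besides 二, 加 and 负 from this thread, it adds 三, 四, 减 ("minus"), 乘 ("times") and 除以 ("divided by") purely for illustration. Tokens are returned in reading order; pushing each one onto the number stack or an operator stack would be left to the evaluator.

```python
# Hypothetical vocabulary mapping each recognized word to (kind, value).
# Only a few Mandarin entries are included for illustration.
VOCABULARY = {
    "二": ("number", 2),
    "三": ("number", 3),
    "四": ("number", 4),
    "加": ("binary_operator", "+"),
    "减": ("binary_operator", "-"),
    "乘": ("binary_operator", "*"),
    "除以": ("binary_operator", "/"),
    "负": ("unary_operator", "neg"),
}

# Length of the longest known word, so the scanner knows how far to look ahead.
MAX_WORD_LENGTH = max(len(word) for word in VOCABULARY)


def tokenize(text):
    """Scan text left to right, matching the longest known word at each position.

    Whitespace is skipped rather than used as a delimiter, so "二加二" and
    "二 加 二" produce the same tokens.
    """
    tokens = []
    position = 0
    while position < len(text):
        if text[position].isspace():
            position += 1
            continue
        # Try the longest candidate first so multi-character words like 除以 win.
        longest = min(MAX_WORD_LENGTH, len(text) - position)
        for length in range(longest, 0, -1):
            candidate = text[position:position + length]
            if candidate in VOCABULARY:
                tokens.append(VOCABULARY[candidate])
                position += length
                break
        else:
            # No known word starts here, so the input cannot be tokenized.
            raise ValueError(
                f"Unrecognized text at position {position}: {text[position]!r}"
            )
    return tokens


print(tokenize("负二加二"))
# [('unary_operator', 'neg'), ('number', 2), ('binary_operator', '+'), ('number', 2)]
```

Trying longer words first means a multi-character word is never broken apart when one of its prefixes is also a word in the vocabulary.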

Previously reported as gunthercox/ChatterBot#1115


gunthercox · 2017-12-20T12:49:36Z

@xiaolitang I have a few questions to try to make sure that my proposed solution will work.

Can you give me an example of the following math statements in Mandarin? I'd like to see what they look like with unary operators.

Negative two plus two:

-2 + 2

Two raised to an exponent of two:

2²

xiaolitang · 2017-12-20T13:14:34Z

Hello gunthercox, yes! I would love to help!


-2 + 2 could be stated in Mandarin as ‘负二加二’, where ‘负’ means ‘negative’, ‘二’ means 2, and ‘加’ means ‘plus’. Mandarin uses almost the same grammar and word order as English when stating a mathematical statement. For the four basic operators (plus, minus, multiplication, and division), a statement in Mandarin always takes the form ‘a (positive/negative) number’ + ‘operator’ + ‘a (positive/negative) number’, just as in English.

2^2 in Mandarin could be ‘二的平方’. ‘平方’ means ‘square’, and ‘的’ is a necessary character that can be interpreted as ‘whose’. Exponential statements in Mandarin follow a general rule; just replace the base number:

2^2 → ‘二的平方’
3^2 → ‘三的平方’
4^2 → ‘四的平方’

Similarly for an exponent of three, where ‘立方’ is a specific word for the exponent of three:

2^3 → ‘二的立方’
3^3 → ‘三的立方’
4^3 → ‘四的立方’

For exponents greater than or equal to four, the pattern becomes more regular:

2^4 → ‘二的四次方’, where ‘四’ means 4
3^5 → ‘三的五次方’, where ‘五’ means 5
4^10 → ‘四的十次方’, where ‘十’ means 10

Basically, the rule becomes ‘base number’s exponent of n’ (‘base 的 n 次方’). So only the exponents 2 and 3 have their own names (‘平方’ and ‘立方’, respectively).

Hope that helps! If there's anything unclear, please feel free to email me! By the way, your work on ChatterBot is awesome! Thank you very much for sharing the code and keeping it updated!

Thanks,
xiaolitang
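The exponent rules described above are regular enough to match with a single pattern. This is a minimal sketch, assuming a hypothetical `parse_power` helper and a numeral table limited to the single characters 一 through 十. Compound numerals such as 十二 are not handled, and neither is a leading 负.

```python
import re

# Single-character Mandarin numerals from 1 to 10 (assumed sufficient for this sketch).
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9, "十": 10}

# Regex character class matching any one of the numerals above.
NUMERAL = "[" + "".join(DIGITS) + "]"

# Matches "<base>的平方", "<base>的立方", or "<base>的<n>次方".
POWER_PATTERN = re.compile(
    rf"^(?P<base>{NUMERAL})的(?:(?P<named>平方|立方)|(?P<exponent>{NUMERAL})次方)$"
)

# Only the exponents 2 and 3 have their own names.
NAMED_EXPONENTS = {"平方": 2, "立方": 3}


def parse_power(phrase):
    """Return (base, exponent) for a Mandarin power phrase, or raise ValueError."""
    match = POWER_PATTERN.match(phrase)
    if match is None:
        raise ValueError(f"Not a recognized power phrase: {phrase!r}")
    base = DIGITS[match.group("base")]
    if match.group("named"):
        exponent = NAMED_EXPONENTS[match.group("named")]
    else:
        exponent = DIGITS[match.group("exponent")]
    return base, exponent


base, exponent = parse_power("三的五次方")
print(base ** exponent)  # 243
```

Because 平方 and 立方 are the only named exponents, they are checked as a fixed alternation, and every other exponent falls through to the `<n>次方` form.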

gunthercox mentioned this issue on Dec 20, 2017 in gunthercox/ChatterBot#1115, "Changing default language of the math evaluation logic adapter" (closed).
