Rake.split_sentences(text) uses 'u' as separator #30
Comments
That just means the strings are represented as Unicode strings: in Python 2.7, '' is a byte (ASCII) string and u'' is a Unicode string. They work the same as normal strings, details here: That idiosyncrasy is one of the things cleaned up in Python 3.x, by the way, and one major reason it's recommended to use Python 3 instead of 2.7. I used Unicode strings specifically because they're more robust and, notably, support more languages, and this is a multilingual library. Tell me if they actually cause problems for you, but they shouldn't.
Closed.
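The two literal types do behave alike for ordinary text; where they differ is in how a `\u` escape is read, which is what the regex discussion later in this thread hinges on. Here is a minimal sketch, run on Python 3. The `r''` raw literal is only a stand-in: it leaves the backslash sequence untouched, which is what a Python 2.7 byte-string literal did with `\u`, because that escape is recognized only in Unicode literals there.

```python
# In Python 3, the u prefix is accepted but redundant: both literals are str.
assert type(u'abc') is type('abc') is str

# In a Unicode literal, \u2019 becomes one character: U+2019, the right single quote.
unicode_literal = u'\u2019'

# A raw literal keeps six characters: backslash, 'u', '2', '0', '1', '9'.
# This mirrors what a Python 2.7 byte-string literal kept (stand-in only).
escape_left_alone = r'\u2019'

print(len(unicode_literal), len(escape_left_alone))  # prints "1 6"
```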
Thank you for your reply.
Got the same problem: 'restaurant' is being split into 'resta' and 'rant'...
The regex string is not a Unicode literal, so the \u... escape sequence behaves unexpectedly. Just try split_sentences("restaurant"): it returns ["resta", "rant"], which is obviously wrong. Adding a simple u prefix to the regex forces Python to interpret it as Unicode and fixes this issue. Tested with Python 2.7.
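A minimal sketch of the behaviour described above, runnable on Python 3. The delimiter set is a hypothetical stand-in, since the library's actual pattern is not quoted in this thread. `BUGGY_PY2_EQUIVALENT` spells out by hand what a Python 2.7 byte-string pattern containing `\u2019` effectively matched: Python 2.7's `re` did not support `\u` escapes and read the unknown escape as a literal `u`, so `u` and the following digits all ended up in the character class. `split_sentences` here is an illustrative function, not the library's code.

```python
import re

# Hypothetical delimiter set; the library's real pattern may contain more characters.
# Hand-written equivalent of what a Python 2.7 byte-string pattern '[...\u2019\u2013]'
# matched: "\u" was read as a literal "u", followed by the literal digits.
BUGGY_PY2_EQUIVALENT = re.compile('[.!?,;:\tu2019u2013]')

# With a Unicode literal, the escapes become the real characters
# U+2019 (right single quotation mark) and U+2013 (en dash).
FIXED = re.compile(u'[.!?,;:\t\u2019\u2013]')


def split_sentences(text, pattern=FIXED):
    """Split text wherever a character from the delimiter class occurs."""
    return pattern.split(text)


print(split_sentences("restaurant", BUGGY_PY2_EQUIVALENT))  # ['resta', 'rant']
print(split_sentences("restaurant"))                        # ['restaurant']
```

Under Python 3 the `u` prefix is optional, because every `str` literal is Unicode and `re` has understood `\uXXXX` escapes in `str` patterns since Python 3.3. The prefix matters for Python 2.7 code like the reporter's.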
Hello,
I ran into an issue: the split_sentences(text) function uses 'u' as a separator. For instance:
text: "is an incredibly popular library and for good reason it s powerful fast"
sentences list: [u'is an incredibly pop', u'lar library and for good reason it s powerf', u'l fast']
I can certainly fix it in my environment, but I wonder what I did wrong, and why nobody has run into this issue before.
My environment is Python 2.7, with python-rake installed via pip.