Rake.split_sentences(text) uses 'u' as separator #30

xyutech · 2017-11-10T12:03:58Z

Hello,
I ran into an issue where the split_sentences(text) function uses 'u' as a separator. For instance:
text: "is an incredibly popular library and for good reason it s powerful fast"
sentences list: [u'is an incredibly pop', u'lar library and for good reason it s powerf', u'l fast']
I can certainly fix it in my environment, but I wonder what I did wrong and why nobody has hit this issue before.
My environment is Python 2.7; python-rake was installed with pip.

jkterry1 · 2017-11-11T04:04:24Z

That just means the strings are being represented as unicode strings. In Python 2.7, '' is a byte string (str) and u'' is a unicode string. For most purposes they behave the same as normal strings; details here:
https://docs.python.org/2/howto/unicode.html

That idiosyncrasy is one of the things cleaned up in Python 3.x, by the way, and one major reason Python 3 is recommended over Python 2.7. I used unicode strings specifically because they're more robust and notably support more languages, and this is a multilingual library. Tell me if these are actually causing problems for you, but they shouldn't. Closed.

xyutech · 2017-11-11T19:02:04Z

Thank you for your reply.
Just let me add some more info to make sure that we are on the same page. I was not asking about the notation
u'is an incredibly pop'
That is clear. My issue is that the input string was split on the letter 'u'. So the input is:
is an incredibly popular library and for good reason it s powerful fast
and the result of the split is
is an incredibly pop | lar library and for good reason it s powerf | l fast

klockeph · 2017-11-12T00:05:46Z

Got the same problem: 'restaurant' is being split into 'resta' and 'rant'.

The regex string is not a unicode string, so the \u... escape sequences behave unexpectedly. Just try split_sentences("restaurant"); it returns ["resta", "rant"], which is obviously wrong. Adding a u prefix to the regex string forces Python to interpret it as unicode and fixes the issue. Tested with Python 2.7.
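
For anyone reading this on Python 3, where the original bug cannot be triggered directly, here is a rough sketch of what seems to be happening. The delimiter set below is hypothetical (the library's actual pattern may differ). The "broken" pattern simply spells out the character class that a Python 2 byte-string pattern would effectively produce if the \u escapes were not decoded and their characters were taken literally.

```python
import re

# Hypothetical delimiter set for illustration; not the library's exact pattern.
# Intended behaviour: \u2019 and \u2013 are single punctuation characters.
intended = re.compile(u'[.!?,;:\u2019\u2013]')

# Assumed effect of the same pattern written as a Python 2 byte string:
# the escapes are not decoded, so the class contains the literal
# characters 'u', '2', '0', '1', '9' and '3' instead.
broken = re.compile('[.!?,;:u20193]')

text = "is an incredibly popular library and for good reason it s powerful fast"

# The broken pattern splits on every 'u', matching the report above.
print(broken.split(text))
print(broken.split("restaurant"))

# The intended pattern leaves both inputs in one piece.
print(intended.split(text))
print(intended.split("restaurant"))
```

If this reading is right, digits such as 2, 0, 1, 9 and 3 would also act as separators under Python 2, not just the letter u.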

jkterry1 closed this as completed Nov 11, 2017
