Open
Description
I am trying to extract keywords from the amazon_reviews dataset. When I run it on Spanish text, I get the following error, which I have been unable to resolve:
STACK TRACE
/python3.8/site-packages/multi_rake/algorithm.py in apply(self, text, text_for_stopwords)
60
61 else:
---> 62 language_code = detect_language(text, self.lang_detect_threshold)
63
64 if language_code is not None and language_code in STOPWORDS:
/opt/conda/lib/python3.8/site-packages/multi_rake/utils.py in detect_language(text, proba_threshold)
12
13 def detect_language(text, proba_threshold):
---> 14 _, _, details = pycld2.detect(text)
15
16 language_code = details[0][1]
error: input contains invalid UTF-8 around byte 2094 (of 5341)
Is there a workaround, for example manually specifying the language code?
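The error is raised inside `pycld2.detect`, before any keyword extraction happens. A commonly suggested workaround, which is an assumption here and not something confirmed by multi_rake, is to strip the characters that CLD2's UTF-8 check tends to reject before calling `apply`. These include control characters, format characters and unpaired surrogates left over from badly decoded review text. The helper name `clean_for_cld2` is hypothetical. Here is a minimal standard-library sketch:

```python
import unicodedata

# Hypothetical helper: removes characters assumed to trigger pycld2's
# "input contains invalid UTF-8" error, then normalizes whitespace.
def clean_for_cld2(text: str) -> str:
    # Round-trip through UTF-8; errors="ignore" drops unpaired surrogates
    # (e.g. "\ud83d"), which cannot be encoded at all.
    text = text.encode("utf-8", errors="ignore").decode("utf-8")
    kept = []
    for ch in text:
        if ch.isspace():
            # Tabs, newlines, NEL (\x85) etc. become a plain space.
            kept.append(" ")
        elif unicodedata.category(ch) in ("Cc", "Cf", "Cs", "Co", "Cn"):
            # Skip control, format, surrogate, private-use and unassigned chars.
            continue
        else:
            kept.append(ch)
    # Collapse runs of whitespace introduced by the replacements above.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    review = "Muy buen producto\x00, llegó rápido\x85 y funciona \ud83d bien"
    print(clean_for_cld2(review))  # prints "Muy buen producto, llegó rápido y funciona bien"
```

You would then pass the cleaned string to `Rake().apply(...)`. There is also a second option. The traceback shows that `detect_language` is only reached in the `else:` branch of `apply`, so supplying the language up front should skip detection entirely. This assumes that your installed multi_rake version's `Rake` accepts a `language_code` argument, e.g. `Rake(language_code='es')`. Check this against the version you have installed.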
7homasSutter