forked from saffsd/langid.py
Open
Labels
bug: Something isn't working
Description
Hi guys!
I use py3langid==0.2.2 and found that in some cases Chinese gets a higher probability than it should. For example:
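from py3langid.langid import LanguageIdentifier, MODEL_FILE
# norm_probs=True returns normalized probabilities instead of raw scores
identifier = LanguageIdentifier.from_pickled_model(MODEL_FILE, norm_probs=True)
identifier.rank("Al furjan")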
outputs:
[('zh', 0.24405981600284576), ('fi', 0.16715779900550842), ('mt', 0.1392195224761963), ('et', 0.10675894469022751), ('sl', 0.07787516713142395), ('en', 0.05285739526152611)......]
I understand that the text is quite short and that it may return languages other than English, but Chinese?
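For context, here is a minimal sketch of how a ranking like the one above could be treated as "undetermined" instead of being trusted as Chinese. The helper `confident_top` and both thresholds are hypothetical and not part of py3langid; the ranking is the truncated output quoted above, so only its first six entries appear.

```python
from typing import List, Optional, Tuple

# Ranking copied from the output above (truncated there, so only six entries).
RANKED: List[Tuple[str, float]] = [
    ("zh", 0.24405981600284576),
    ("fi", 0.16715779900550842),
    ("mt", 0.1392195224761963),
    ("et", 0.10675894469022751),
    ("sl", 0.07787516713142395),
    ("en", 0.05285739526152611),
]


def confident_top(
    ranked: List[Tuple[str, float]],
    min_prob: float = 0.5,    # assumed threshold, tune for your data
    min_margin: float = 0.2,  # assumed gap required over the runner-up
) -> Optional[str]:
    """Return the top language only if it clearly wins, otherwise None."""
    if not ranked:
        return None
    # Sort defensively so the helper does not rely on the input order.
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    top_lang, top_prob = ordered[0]
    runner_up_prob = ordered[1][1] if len(ordered) > 1 else 0.0
    if top_prob < min_prob or top_prob - runner_up_prob < min_margin:
        return None
    return top_lang


print(confident_top(RANKED))  # None: 'zh' at ~0.24 is below min_prob
```

With these assumed thresholds the result for "Al furjan" is discarded as too uncertain, but that only works around the issue; it does not explain why 'zh' ranks first.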