Incorrect detection #17

@debuggio

Hi guys!
I use py3langid==0.2.2 and found that in some cases Chinese gets a higher probability than it should. For example:

from py3langid.langid import LanguageIdentifier, MODEL_FILE
identifier = LanguageIdentifier.from_pickled_model(MODEL_FILE, norm_probs=True)
identifier.rank("Al furjan")

outputs:
[('zh', 0.24405981600284576), ('fi', 0.16715779900550842), ('mt', 0.1392195224761963), ('et', 0.10675894469022751), ('sl', 0.07787516713142395), ('en', 0.05285739526152611)......]

I understand that the text is quite short and that it may return languages other than English, but Chinese?
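Note that the top score here is only about 0.24, so the ranking is close to flat. As a possible guard against acting on such low-confidence results, here is a minimal sketch that treats the detection as undetermined when the top probability falls below a threshold. The helper name `top_or_none` and the 0.5 cutoff are hypothetical and not part of py3langid; the ranking is copied from the truncated output above.

```python
# Ranking copied from the output above (only the entries that were shown).
ranking = [
    ("zh", 0.24405981600284576),
    ("fi", 0.16715779900550842),
    ("mt", 0.1392195224761963),
    ("et", 0.10675894469022751),
    ("sl", 0.07787516713142395),
    ("en", 0.05285739526152611),
]


def top_or_none(ranked, min_prob=0.5):
    """Return the top language code, or None if it is not confident enough.

    `ranked` is a list of (language, probability) pairs sorted by
    descending probability. This helper and its default threshold are
    hypothetical; tune `min_prob` for your own data.
    """
    if not ranked:
        return None
    lang, prob = ranked[0]
    return lang if prob >= min_prob else None


print(top_or_none(ranking))        # None: 0.244 is below the 0.5 cutoff
print(top_or_none(ranking, 0.2))   # zh
```

This does not explain why Chinese ranks first; it only avoids trusting a result whose probability is this low.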

Labels: bug (Something isn't working)
