Open
Description
If we are going to use FastText, we should lowercase the text before language identification. At least with the official lid.176 model, uppercased text completely breaks identification for mid- and low-resource languages: they are always identified as the highest-resource language of the same script (Russian for Cyrillic; English, Spanish, or French for Latin).
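A minimal sketch of what this could look like, assuming the pipeline calls the fastText Python bindings directly. The helper names `normalize_for_lid` and `identify_language` are hypothetical, and the predictor is passed in as a callable so the normalization step can be exercised without loading a model:

```python
from typing import Callable, Sequence, Tuple

# Shape of fastText's `model.predict(text, k=1)` result: (labels, probabilities).
Predict = Callable[[str], Tuple[Sequence[str], Sequence[float]]]

LABEL_PREFIX = "__label__"  # prefix fastText puts on predicted labels


def normalize_for_lid(text: str) -> str:
    """Lowercase the text and collapse all whitespace to single spaces.

    Lowercasing is the fix proposed in this issue. Collapsing whitespace
    also removes newlines, which fastText's `predict` rejects.
    """
    return " ".join(text.lower().split())


def identify_language(text: str, predict: Predict) -> Tuple[str, float]:
    """Return (language code, score) for `text`, lowercased before prediction."""
    labels, probs = predict(normalize_for_lid(text))
    label = labels[0]
    if label.startswith(LABEL_PREFIX):
        label = label[len(LABEL_PREFIX):]
    return label, float(probs[0])


# Hypothetical usage with the real model (the file path is an assumption):
#   import fasttext
#   model = fasttext.load_model("lid.176.bin")
#   lang, score = identify_language("ПРИВІТ, СВІТ", lambda t: model.predict(t, k=1))
```

Only the text handed to the identifier is lowercased here; the original casing can still be kept for whatever is stored downstream.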