Luxembourgish #155

Open
astuanax opened this issue Jul 28, 2023 · 7 comments

@astuanax

astuanax commented Jul 28, 2023

Would it be possible to include Luxembourgish?

I believe two EU languages are missing from the list: Maltese and Luxembourgish.

It seems Thierry Goeckel has already built luxdetect, so maybe we can integrate it?
https://github.com/rotzbouw/luxdetect

Would be happy to discuss how to go forward and help out.

@pemistahl
Owner

Hi @astuanax, thanks for your request.

I'm planning to add 25 more languages to Lingua, which will bring it to a total of 100 supported languages. I'm pretty sure that Maltese and Luxembourgish will be among those new languages. It may take a while, however.

Before starting that, I will first evaluate whether it's possible to use the Rust port of Lingua within Python because the pure Python port is actually very slow. The Rust port is significantly faster.

@astuanax
Author

astuanax commented Aug 9, 2023

Sure, I understand, let me know if I can help with testing.

@TomLucidor

@pemistahl can ML libraries accelerate Python's performance?

@pemistahl
Owner

@TomLucidor I'm currently writing Python bindings for the Rust implementation which will eventually replace the pure Python implementation. This will solve most performance issues.

@Mejans

Mejans commented Nov 6, 2023

Hello @pemistahl,
will Occitan and Kabyle be among your 100 new supported languages?
Best regards

@pemistahl
Owner

Hi @Mejans, I won't add a set of 100 new languages. I was talking about 25 new languages. That's more than enough work for now.

I haven't decided yet which languages to include but I'm in favor of including some minority languages as well. So thank you for proposing Occitan and Kabyle. I will keep them in mind.

@TomLucidor

@Mejans do you have data on high resource vs low resource languages? There is this paper "Language Resource Distribution in NLP" that did not list the level 5, level 4, and level 3 languages... would like to know which ones are good enough to note
