
Lingua 1.4.0

@github-actions github-actions released this 29 Oct 13:27

Features

  • This release introduces an absolute confidence metric based on unique and most common ngrams for each supported language. It allows you to build a language detector from a single language only. Such a detector serves as a binary classifier, telling you whether some text is written in your selected language or not. (#235)
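
To make the idea of a single-language binary classifier concrete, here is a minimal toy sketch in plain Python. It is not Lingua's implementation or API: the function names, the tiny trigram profile, and the 0.5 threshold are all hypothetical and chosen only for illustration. It scores a text by the share of its character trigrams that appear in one language's profile, then compares that score against a fixed threshold.

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return the character n-grams of each lowercased, whitespace-separated word."""
    grams: list[str] = []
    for word in text.lower().split():
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def absolute_confidence(text: str, profile: set[str], n: int = 3) -> float:
    """Fraction of the text's n-grams that occur in the language profile."""
    grams = char_ngrams(text, n)
    if not grams:
        return 0.0
    return sum(gram in profile for gram in grams) / len(grams)


def is_language(text: str, profile: set[str], threshold: float = 0.5) -> bool:
    """Binary decision: does the text look like the profiled language?"""
    return absolute_confidence(text, profile) >= threshold


# Hypothetical, hand-picked profile of frequent English trigrams (an assumption,
# far smaller than any real language model).
ENGLISH_PROFILE = {"the", "ing", "and", "ion", "ent", "her", "tha", "hat", "thi", "his"}

print(is_language("the thing", ENGLISH_PROFILE))  # True (3 of 4 trigrams match)
print(is_language("der hund", ENGLISH_PROFILE))   # False (0 of 3 trigrams match)
```

Because the score depends only on the one profiled language, no second language is needed for comparison, which is what makes a yes/no answer possible.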

Improvements

  • The new absolute confidence metric also improves accuracy in low accuracy mode. The mean of the average detection accuracies (single words, word pairs and sentences combined) increases from 77% to 80%.

Bug Fixes

  • The tokenization of texts written in the Devanagari alphabet was flawed. This has been fixed, leading to better detection accuracy for Hindi and Marathi.

Compatibility

  • Python 3.13, the newest version, is now officially supported.
  • Support for Python 3.8 and 3.9 has been dropped. The lowest supported Python version is now 3.10.

Please note: All new features and bug fixes will also be part of the next Rust-based Python extension release 2.1.0.