- When trying to detect multiple languages in a text consisting of only a single word, a panic occurred. This has been fixed. (#41)
- For long input texts, a panic occurred while computing the confidence values due to an accidental division by zero. This has been fixed. (#27)
- After applying some internal optimizations, language detection is now roughly 20% to 30% faster. The speed improvement is greater for long input texts than for short ones.
- For long input texts, an error occurred while computing the confidence values due to numerical underflow when converting probabilities. This has been fixed.
- The min-max normalization method for the confidence values has been replaced with applying the softmax function. This gives more realistic probabilities. (#25)
- Under certain circumstances, calling the method `LanguageDetector.DetectMultipleLanguagesOf()` caused an index error. This has been fixed.
- A misconfiguration in a `go.mod` file caused errors when trying to download the library via the `go get` command. This has been fixed. (#23)
- The new method `LanguageDetector.DetectMultipleLanguagesOf()` has been introduced. It makes it possible to detect multiple languages in mixed-language text. (#9)
- Some documentation mistakes have been fixed and missing information has been added.
- The new method `LanguageDetectorBuilder.WithLowAccuracyMode()` has been introduced. Activating it reduces detection accuracy for short text in favor of a smaller memory footprint and faster detection performance. (#17)
- The new method `LanguageDetector.ComputeLanguageConfidence()` has been introduced. It makes it possible to retrieve the confidence value for one specific language only, given the input text. (#19)
- The computation of the confidence values has been revised. The min-max normalization algorithm is now applied to the values, making them more easily comparable by behaving more like real probabilities. (#16)
- The language models are now serialized as protocol buffers instead of JSON. Thanks to this change, they are loaded into memory twice as fast as before. (#22)
- The unigram counts in the statistics engine were not retrieved correctly. This has been fixed, yielding more accurate detection results. (#14)
- The lowest supported Go version is now 1.18. Older versions are no longer compatible with this library.
- The library now has a fresh and colorful new logo. Why? Well, why not? (-:
- The character â was erroneously not treated as a possible indicator for French.
- The dependencies on the other language detectors used for the accuracy comparisons were always downloaded together with the main library. They are only needed when updating the accuracy reports, so the `cmd/` subdirectory now contains its own Go module that defines those dependencies, and they have been removed from the main library. Thanks to @dim and @BoeingX for identifying this problem. (#8)
- It was possible to include `lingua.Unknown` in the set of input languages when building the language detector. It is only meant as a return value, so it is now automatically removed from the set of input languages. Thanks to @marians for identifying this problem. (#7)
- By replacing `sync.Once` with `sync.Map` for storing the language models at runtime, a large amount of code could be removed while preserving the same functionality. This improves code maintenance significantly.
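The pattern of lazily storing per-language models in a `sync.Map` can be sketched as follows. This is a simplified stand-in, not the library's actual code; the `model` type and `loadModel` function are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// model is a hypothetical stand-in for a loaded language model.
type model struct {
	language string
}

// modelCache stores one model per language. sync.Map fits this access
// pattern well: each key is written once and then read many times from
// many goroutines, which previously required one sync.Once per language.
var modelCache sync.Map

func loadModel(language string) *model {
	if cached, ok := modelCache.Load(language); ok {
		return cached.(*model)
	}
	// LoadOrStore guarantees that concurrent callers agree on a single
	// instance even if several of them build the model at the same time.
	created, _ := modelCache.LoadOrStore(language, &model{language: language})
	return created.(*model)
}

func main() {
	// Simulate several goroutines requesting the same model concurrently.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			loadModel("French")
		}()
	}
	wg.Wait()
	fmt.Println(loadModel("French").language)
}
```

The trade-off is that two goroutines may occasionally build the same model in parallel, with one result discarded, but the per-language `sync.Once` bookkeeping disappears entirely.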
- In very rare cases, the language returned by the detector was non-deterministic. This has been fixed. Big thanks to @FilipAlexander for identifying this problem. (#6)
- The language models were not embedded into the compiled binary. This resulted in problems when trying to use Lingua within a Docker container, for instance. Big thanks to @dsxack for identifying this problem and providing a fix. (#2 #3)
This is the very first release of the Go implementation of Lingua. Enjoy! :-)