
Meaningful confidence scores #110

Answered by pemistahl
n-splv asked this question in Q&A
Jan 9, 2023 · 1 comment · 2 replies

Hi @nick-maykr, thank you for your questions and your interest in my library.

  1. You are not using the latest Lingua release, 1.3, in which I have reworked the confidence score calculation. For your code above, the score for English is now 0.72 and for German 0.28. If you build the detector from these two languages only, it does not know that any other languages exist. That's why the probability for English is not as low as you would expect. When you build the detector from all languages, the probability for English drops to 0.003 and for German to 0.001, while Italian gets 0.78. So the confidence score for a single language is always calculated relative to the score…
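To illustrate why a score is only meaningful relative to the other candidate languages, here is a toy sketch. It assumes that raw per-language scores (hypothetical log-likelihoods with made-up values) are normalized so the confidences of the candidate languages sum to 1, which matches the 0.72 + 0.28 split above. It is not Lingua's actual n-gram scoring, and the names `LOG_LIKELIHOODS` and `relative_confidences` are hypothetical.

```python
import math

# Hypothetical log-likelihoods for a single input text (illustrative values only,
# not produced by Lingua).
LOG_LIKELIHOODS = {
    "English": -40.0,
    "German": -41.0,
    "Italian": -34.0,
}


def relative_confidences(log_likelihoods, candidates):
    """Normalize scores over the candidate languages so that they sum to 1."""
    # Subtract the best score before exponentiating to avoid underflow.
    best = max(log_likelihoods[lang] for lang in candidates)
    weights = {lang: math.exp(log_likelihoods[lang] - best) for lang in candidates}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


# A detector that only knows English and German must split all confidence between them.
two_languages = relative_confidences(LOG_LIKELIHOODS, ["English", "German"])

# Adding Italian as a candidate takes most of the confidence away from English and German.
all_languages = relative_confidences(LOG_LIKELIHOODS, ["English", "German", "Italian"])

for name, result in [("two languages", two_languages), ("all languages", all_languages)]:
    print(name, {lang: round(score, 3) for lang, score in result.items()})
```

With only English and German as candidates, English ends up around 0.73 even though Italian fits the text far better. Once Italian is included, English drops to a fraction of a percent. The raw evidence for English never changed; only the set of languages it is compared against did.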

Replies: 1 comment, 2 replies

2 replies
@n-splv
@pemistahl

Answer selected by n-splv
Category: Q&A · Labels: none yet · 2 participants