Commit

update reports
Casxt committed Dec 12, 2023
1 parent b168501 commit 60a0ebc
Showing 44 changed files with 190 additions and 125 deletions.
11 changes: 6 additions & 5 deletions cmd/accuracy-reports/aggregated-accuracy-values.csv
@@ -3,7 +3,7 @@ Afrikaans,51,21,39,92,55,22,46,98,64,38,62,93,79,58,81,97
Albanian,NaN,NaN,NaN,NaN,55,18,48,98,80,54,86,99,88,69,95,100
Arabic,89,77,91,99,90,79,92,100,94,88,96,99,98,96,99,100
Armenian,NaN,NaN,NaN,NaN,99,100,100,97,100,100,100,100,100,100,100,100
Azerbaijani,64,45,58,91,81,62,82,99,82,71,78,96,90,77,92,99
Azerbaijani,65,45,58,91,81,62,82,99,82,71,78,96,90,77,92,99
Basque,NaN,NaN,NaN,NaN,62,33,62,92,75,56,76,92,84,71,87,93
Belarusian,81,64,80,98,84,67,86,100,92,80,95,100,97,92,99,100
Bengali,100,100,100,100,99,98,99,99,100,100,100,100,100,100,100,100
@@ -16,7 +16,7 @@ Croatian,55,28,44,91,42,26,42,58,60,36,57,86,73,53,74,90
Czech,50,31,46,71,64,39,65,88,71,54,72,87,80,66,84,91
Danish,47,24,38,79,58,26,54,95,70,45,70,95,81,61,84,98
Dutch,47,22,36,82,58,29,47,97,64,36,61,94,77,55,81,96
English,49,17,35,94,54,22,44,97,63,29,62,97,81,55,89,99
English,49,18,35,94,54,22,44,97,63,29,62,97,81,55,89,99
Esperanto,52,25,45,88,57,22,51,98,66,44,61,93,84,67,85,98
Estonian,61,36,53,94,70,41,69,99,83,62,88,99,92,80,96,100
Finnish,71,45,70,98,80,58,84,99,91,77,95,100,96,90,98,100
@@ -27,7 +27,7 @@ German,65,38,60,97,66,40,62,98,80,57,84,99,89,74,94,100
Greek,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
Gujarati,100,100,100,100,100,99,100,100,100,100,100,100,100,100,100,100
Hebrew,90,76,94,99,NaN,NaN,NaN,NaN,100,100,100,100,100,100,100,100
Hindi,52,27,40,88,58,34,45,95,33,11,20,67,73,61,64,95
Hindi,52,26,40,88,58,34,45,95,33,11,20,67,73,61,64,95
Hungarian,62,37,53,95,76,53,76,99,90,77,94,100,95,87,98,100
Icelandic,NaN,NaN,NaN,NaN,71,42,70,99,88,72,92,99,93,83,97,100
Indonesian,67,39,66,95,46,26,45,66,47,25,46,71,61,39,61,83
@@ -39,7 +39,7 @@ Korean,100,100,100,100,99,100,100,98,100,100,100,100,100,100,100,100
Latin,NaN,NaN,NaN,NaN,62,44,58,83,73,49,76,94,87,72,93,97
Latvian,59,36,54,87,75,51,77,98,87,75,90,97,93,85,97,99
Lithuanian,62,38,56,92,72,42,75,99,87,76,89,98,95,86,98,100
Macedonian,62,39,55,94,60,30,54,97,72,52,70,95,84,66,86,99
Macedonian,62,39,54,94,60,30,54,97,72,52,70,95,84,66,86,99
Malay,NaN,NaN,NaN,NaN,22,11,22,34,31,22,36,35,31,26,38,28
Maori,NaN,NaN,NaN,NaN,52,22,43,91,82,62,87,98,91,82,92,99
Marathi,73,52,74,93,84,69,84,98,39,16,30,72,85,74,85,96
@@ -49,7 +49,7 @@ Persian,70,46,66,99,76,57,70,99,80,62,80,98,90,78,94,100
Polish,66,45,59,94,77,51,80,99,90,77,93,99,95,85,98,100
Portuguese,57,26,48,96,53,21,40,97,69,42,70,95,81,59,85,99
Punjabi,100,100,100,100,100,99,100,100,100,100,100,100,100,100,100,100
Romanian,59,34,52,90,53,24,48,88,72,49,74,94,87,69,92,99
Romanian,59,35,52,90,53,24,48,88,72,49,74,94,87,69,92,99
Russian,53,40,52,68,71,48,72,93,78,59,84,92,90,76,95,98
Serbian,57,34,51,86,78,63,75,95,78,62,80,91,88,74,90,99
Shona,68,44,65,95,76,51,79,99,81,56,86,100,91,78,96,100
@@ -74,3 +74,4 @@ Welsh,NaN,NaN,NaN,NaN,69,43,66,98,82,61,87,99,91,78,96,99
Xhosa,NaN,NaN,NaN,NaN,66,40,65,92,69,45,67,94,82,64,85,98
Yoruba,22,11,14,41,15,5,11,28,62,33,61,92,74,50,77,96
Zulu,70,44,68,98,63,35,63,92,70,45,72,94,81,62,83,97
Malayalam,100,100,100,100,99,99,100,100,43,23,38,69,100,100,100,99
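
Each CSV row above carries a language name followed by 16 values. Comparing the rows with the per-detector reports in this commit suggests they form four groups of (average, single words, word pairs, sentences), one group per detector. The following is a minimal sketch of splitting a row under that reading; the detector order and the category labels are assumptions inferred from the numbers, since the CSV header is not part of this diff.

```python
# Assumed column order, inferred by matching rows against the reports
# in cmd/accuracy-reports/<detector>/; the real CSV header is not shown here.
DETECTORS = ["whatlang", "cld3", "lingua-low-accuracy", "lingua-high-accuracy"]
# Hypothetical labels for the four values in each group.
CATEGORIES = ["average", "single-words", "word-pairs", "sentences"]


def split_row(row):
    """Split one CSV row into {detector: {category: value-or-None}}."""
    name, *cells = row.strip().split(",")
    if len(cells) != len(DETECTORS) * len(CATEGORIES):
        raise ValueError(f"expected 16 values, got {len(cells)}")
    # "NaN" marks a language the detector does not support.
    values = [None if c == "NaN" else int(c) for c in cells]
    groups = {}
    for i, detector in enumerate(DETECTORS):
        chunk = values[i * 4:(i + 1) * 4]
        groups[detector] = dict(zip(CATEGORIES, chunk))
    return name, groups


name, groups = split_row(
    "Malayalam,100,100,100,100,99,99,100,100,43,23,38,69,100,100,100,99"
)
print(name, groups["lingua-low-accuracy"])
```

Under this reading, the third group of the Malayalam row (43, 23, 38, 69) lines up with the lingua-low-accuracy report added in this commit (43.33%, 22.70%, 37.90%, 69.40%).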
16 changes: 16 additions & 0 deletions cmd/accuracy-reports/cld3/Malayalam.txt
@@ -0,0 +1,16 @@
##### Malayalam #####

>>> Accuracy on average: 99.47%

>> Detection of 1000 single words (average length: 10 chars)
Accuracy: 99.10%
Erroneously classified as Unknown: 0.40%, Yoruba: 0.30%, Finnish: 0.10%, Hungarian: 0.10%

>> Detection of 1000 word pairs (average length: 20 chars)
Accuracy: 99.80%
Erroneously classified as Marathi: 0.10%, Vietnamese: 0.10%

>> Detection of 1000 sentences (average length: 127 chars)
Accuracy: 99.50%
Erroneously classified as Bengali: 0.20%, Japanese: 0.10%, Marathi: 0.10%, Vietnamese: 0.10%
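
The headline "Accuracy on average" figure in this report appears to be the arithmetic mean of the three per-category accuracies, rounded to two decimals. That is an inference from the numbers in this diff, not something the commit states. A small sketch that re-derives it from the report text; the function names are hypothetical.

```python
import re

# Report text copied from cmd/accuracy-reports/cld3/Malayalam.txt in this commit
# (the "Erroneously classified as" lines are omitted for brevity).
REPORT = """\
##### Malayalam #####

>>> Accuracy on average: 99.47%

>> Detection of 1000 single words (average length: 10 chars)
Accuracy: 99.10%

>> Detection of 1000 word pairs (average length: 20 chars)
Accuracy: 99.80%

>> Detection of 1000 sentences (average length: 127 chars)
Accuracy: 99.50%
"""


def parse_report(text):
    """Return (reported_average, [per-category accuracies]) from a report."""
    reported = float(
        re.search(r"^>>> Accuracy on average: ([\d.]+)%", text, re.M).group(1)
    )
    parts = [float(v) for v in re.findall(r"^Accuracy: ([\d.]+)%", text, re.M)]
    return reported, parts


def mean_accuracy(parts):
    """Assumed rule: plain arithmetic mean, rounded to two decimals."""
    return round(sum(parts) / len(parts), 2)


reported, parts = parse_report(REPORT)
recomputed = mean_accuracy(parts)
print(reported, parts, recomputed)
```

The same rule reproduces the other two Malayalam averages in this commit: 100.00, 100.00 and 99.40 give 99.80, and 22.70, 37.90 and 69.40 give 43.33.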

16 changes: 16 additions & 0 deletions cmd/accuracy-reports/lingua-high-accuracy/Malayalam.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
##### Malayalam #####

>>> Accuracy on average: 99.80%

>> Detection of 1000 single words (average length: 10 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 1000 word pairs (average length: 20 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 1000 sentences (average length: 127 chars)
Accuracy: 99.40%
Erroneously classified as Unknown: 0.30%, Bengali: 0.20%, Arabic: 0.10%

16 changes: 16 additions & 0 deletions cmd/accuracy-reports/lingua-low-accuracy/Malayalam.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
##### Malayalam #####

>>> Accuracy on average: 43.33%

>> Detection of 1000 single words (average length: 10 chars)
Accuracy: 22.70%
Erroneously classified as Unknown: 77.30%

>> Detection of 1000 word pairs (average length: 20 chars)
Accuracy: 37.90%
Erroneously classified as Unknown: 62.10%

>> Detection of 1000 sentences (average length: 127 chars)
Accuracy: 69.40%
Erroneously classified as Unknown: 30.30%, Bengali: 0.20%, Arabic: 0.10%

2 changes: 1 addition & 1 deletion cmd/accuracy-reports/whatlang/Afrikaans.txt
@@ -4,7 +4,7 @@

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 21.00%
Erroneously classified as Unknown: 16.00%, Dutch: 10.30%, German: 7.00%, Danish: 5.70%, Bokmal: 4.20%, Estonian: 4.20%, Nynorsk: 3.40%, French: 3.20%, Swedish: 1.90%, Finnish: 1.80%, Turkish: 1.70%, Italian: 1.50%, Latvian: 1.50%, Romanian: 1.50%, Spanish: 1.50%, Portuguese: 1.40%, Somali: 1.30%, English: 1.20%, Hungarian: 1.20%, Indonesian: 1.20%, Shona: 1.00%, Slovene: 1.00%, Zulu: 0.90%, Esperanto: 0.80%, Lithuanian: 0.80%, Polish: 0.80%, Czech: 0.60%, Tagalog: 0.50%, Croatian: 0.40%, Vietnamese: 0.30%, Azerbaijani: 0.20%
Erroneously classified as Unknown: 16.00%, Dutch: 10.30%, German: 7.00%, Danish: 5.60%, Bokmal: 4.20%, Estonian: 4.20%, Nynorsk: 3.40%, French: 3.20%, Swedish: 1.90%, Finnish: 1.80%, Turkish: 1.70%, Italian: 1.50%, Latvian: 1.50%, Romanian: 1.50%, Spanish: 1.50%, Portuguese: 1.40%, Hungarian: 1.30%, Somali: 1.30%, English: 1.20%, Indonesian: 1.20%, Shona: 1.00%, Slovene: 1.00%, Zulu: 0.90%, Esperanto: 0.80%, Lithuanian: 0.80%, Polish: 0.80%, Czech: 0.60%, Tagalog: 0.50%, Croatian: 0.40%, Vietnamese: 0.30%, Azerbaijani: 0.20%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 39.30%
2 changes: 1 addition & 1 deletion cmd/accuracy-reports/whatlang/Arabic.txt
@@ -4,7 +4,7 @@

>> Detection of 1000 single words (average length: 6 chars)
Accuracy: 77.30%
Erroneously classified as Unknown: 12.50%, Persian: 7.10%, Urdu: 3.10%
Erroneously classified as Unknown: 12.60%, Persian: 7.10%, Urdu: 3.00%

>> Detection of 1000 word pairs (average length: 14 chars)
Accuracy: 91.20%
10 changes: 5 additions & 5 deletions cmd/accuracy-reports/whatlang/Azerbaijani.txt
@@ -1,16 +1,16 @@
##### Azerbaijani #####

>>> Accuracy on average: 64.50%
>>> Accuracy on average: 64.57%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 44.60%
Erroneously classified as Unknown: 24.00%, Turkish: 8.80%, Somali: 2.10%, Tagalog: 2.00%, Indonesian: 1.80%, Italian: 1.50%, Finnish: 1.40%, Croatian: 1.00%, French: 1.00%, Estonian: 0.90%, German: 0.90%, Lithuanian: 0.90%, Portuguese: 0.90%, Spanish: 0.90%, Afrikaans: 0.70%, English: 0.70%, Shona: 0.70%, Romanian: 0.60%, Zulu: 0.60%, Hungarian: 0.50%, Nynorsk: 0.50%, Swedish: 0.50%, Danish: 0.40%, Latvian: 0.40%, Slovene: 0.40%, Esperanto: 0.30%, Bokmal: 0.20%, Czech: 0.20%, Polish: 0.20%, Yoruba: 0.20%, Dutch: 0.10%, Vietnamese: 0.10%
Accuracy: 44.70%
Erroneously classified as Unknown: 23.80%, Turkish: 8.80%, Somali: 2.10%, Tagalog: 2.00%, Indonesian: 1.80%, Italian: 1.50%, Finnish: 1.40%, Croatian: 1.00%, French: 1.00%, Estonian: 0.90%, German: 0.90%, Lithuanian: 0.90%, Portuguese: 0.90%, Spanish: 0.90%, Afrikaans: 0.80%, English: 0.70%, Shona: 0.70%, Romanian: 0.60%, Zulu: 0.60%, Danish: 0.50%, Hungarian: 0.50%, Nynorsk: 0.50%, Swedish: 0.50%, Latvian: 0.40%, Slovene: 0.40%, Esperanto: 0.30%, Bokmal: 0.20%, Czech: 0.20%, Polish: 0.20%, Dutch: 0.10%, Vietnamese: 0.10%, Yoruba: 0.10%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 57.70%
Erroneously classified as Unknown: 18.70%, Turkish: 8.30%, Indonesian: 2.20%, Italian: 1.70%, Tagalog: 1.60%, Somali: 1.40%, Swedish: 0.90%, Estonian: 0.70%, Spanish: 0.70%, Finnish: 0.60%, German: 0.50%, Latvian: 0.50%, Lithuanian: 0.50%, Portuguese: 0.50%, Croatian: 0.40%, English: 0.40%, Slovene: 0.40%, Esperanto: 0.30%, Nynorsk: 0.30%, Romanian: 0.30%, Zulu: 0.30%, Afrikaans: 0.20%, Dutch: 0.20%, Hungarian: 0.20%, Shona: 0.20%, Bokmal: 0.10%, Czech: 0.10%, French: 0.10%

>> Detection of 1000 sentences (average length: 107 chars)
Accuracy: 91.20%
Erroneously classified as Turkish: 4.70%, Unknown: 3.20%, Italian: 0.20%, Somali: 0.20%, Croatian: 0.10%, Finnish: 0.10%, Indonesian: 0.10%, Romanian: 0.10%, Swedish: 0.10%
Accuracy: 91.30%
Erroneously classified as Turkish: 4.60%, Unknown: 3.20%, Italian: 0.20%, Somali: 0.20%, Croatian: 0.10%, Finnish: 0.10%, Indonesian: 0.10%, Romanian: 0.10%, Swedish: 0.10%

8 changes: 4 additions & 4 deletions cmd/accuracy-reports/whatlang/Bokmal.txt
@@ -1,14 +1,14 @@
##### Bokmal #####

>>> Accuracy on average: 34.47%
>>> Accuracy on average: 34.43%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 15.00%
Erroneously classified as Danish: 13.30%, Unknown: 12.80%, Nynorsk: 9.80%, Swedish: 6.90%, Dutch: 4.30%, German: 4.10%, Afrikaans: 3.30%, Estonian: 3.30%, French: 3.30%, Spanish: 2.30%, Esperanto: 2.20%, Italian: 2.20%, Romanian: 2.20%, Hungarian: 2.00%, Turkish: 2.00%, English: 1.60%, Portuguese: 1.50%, Indonesian: 1.40%, Croatian: 1.00%, Tagalog: 0.80%, Finnish: 0.70%, Latvian: 0.70%, Czech: 0.60%, Lithuanian: 0.50%, Polish: 0.50%, Slovene: 0.50%, Somali: 0.40%, Vietnamese: 0.30%, Zulu: 0.30%, Shona: 0.20%
Accuracy: 14.90%
Erroneously classified as Danish: 13.50%, Unknown: 12.70%, Nynorsk: 9.80%, Swedish: 6.90%, Dutch: 4.30%, German: 4.10%, Afrikaans: 3.30%, Estonian: 3.30%, French: 3.30%, Spanish: 2.30%, Esperanto: 2.20%, Italian: 2.20%, Romanian: 2.20%, Hungarian: 2.00%, Turkish: 2.00%, English: 1.60%, Portuguese: 1.50%, Indonesian: 1.40%, Croatian: 1.00%, Tagalog: 0.80%, Finnish: 0.70%, Latvian: 0.70%, Czech: 0.60%, Lithuanian: 0.50%, Polish: 0.50%, Slovene: 0.50%, Somali: 0.40%, Vietnamese: 0.30%, Zulu: 0.30%, Shona: 0.20%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 28.50%
Erroneously classified as Danish: 17.70%, Nynorsk: 16.90%, Unknown: 5.00%, Swedish: 4.90%, Afrikaans: 3.40%, French: 3.40%, Dutch: 2.70%, German: 2.30%, Estonian: 1.90%, English: 1.70%, Esperanto: 1.40%, Portuguese: 1.30%, Italian: 1.10%, Spanish: 1.10%, Turkish: 1.10%, Finnish: 0.90%, Hungarian: 0.90%, Tagalog: 0.60%, Czech: 0.50%, Romanian: 0.50%, Zulu: 0.50%, Indonesian: 0.40%, Croatian: 0.30%, Slovene: 0.30%, Latvian: 0.20%, Lithuanian: 0.20%, Polish: 0.20%, Vietnamese: 0.10%
Erroneously classified as Danish: 17.70%, Nynorsk: 16.90%, Swedish: 5.00%, Unknown: 5.00%, Afrikaans: 3.40%, French: 3.40%, Dutch: 2.60%, German: 2.30%, Estonian: 1.90%, English: 1.70%, Esperanto: 1.40%, Portuguese: 1.40%, Spanish: 1.10%, Turkish: 1.10%, Italian: 1.00%, Finnish: 0.90%, Hungarian: 0.90%, Tagalog: 0.60%, Czech: 0.50%, Romanian: 0.50%, Zulu: 0.50%, Indonesian: 0.40%, Croatian: 0.30%, Slovene: 0.30%, Latvian: 0.20%, Lithuanian: 0.20%, Polish: 0.20%, Vietnamese: 0.10%

>> Detection of 1000 sentences (average length: 98 chars)
Accuracy: 59.90%
2 changes: 1 addition & 1 deletion cmd/accuracy-reports/whatlang/Bulgarian.txt
@@ -4,7 +4,7 @@

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 36.80%
Erroneously classified as Macedonian: 20.30%, Russian: 12.70%, Serbian: 8.40%, Unknown: 8.20%, Ukrainian: 7.30%, Belarusian: 4.10%, Azerbaijani: 2.20%
Erroneously classified as Macedonian: 20.30%, Russian: 12.60%, Serbian: 8.40%, Unknown: 8.20%, Ukrainian: 7.40%, Belarusian: 4.10%, Azerbaijani: 2.20%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 56.90%
6 changes: 3 additions & 3 deletions cmd/accuracy-reports/whatlang/Croatian.txt
@@ -1,10 +1,10 @@
##### Croatian #####

>>> Accuracy on average: 54.57%
>>> Accuracy on average: 54.60%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 28.30%
Erroneously classified as Unknown: 19.00%, Slovene: 13.70%, Czech: 3.70%, Romanian: 2.70%, Esperanto: 2.60%, Estonian: 2.40%, Lithuanian: 2.00%, Nynorsk: 1.80%, Polish: 1.80%, Portuguese: 1.80%, Swedish: 1.70%, Zulu: 1.70%, Spanish: 1.60%, Tagalog: 1.40%, Afrikaans: 1.30%, Bokmal: 1.30%, Dutch: 1.20%, Turkish: 1.10%, Italian: 1.00%, Latvian: 1.00%, Shona: 1.00%, Danish: 0.90%, English: 0.90%, Finnish: 0.90%, Indonesian: 0.80%, German: 0.70%, French: 0.60%, Hungarian: 0.50%, Somali: 0.40%, Azerbaijani: 0.10%, Vietnamese: 0.10%
Accuracy: 28.40%
Erroneously classified as Unknown: 19.10%, Slovene: 13.70%, Czech: 3.60%, Romanian: 2.70%, Esperanto: 2.60%, Estonian: 2.40%, Lithuanian: 2.00%, Nynorsk: 1.80%, Polish: 1.80%, Portuguese: 1.80%, Swedish: 1.70%, Zulu: 1.70%, Spanish: 1.60%, Afrikaans: 1.30%, Bokmal: 1.30%, Tagalog: 1.30%, Dutch: 1.20%, Turkish: 1.10%, Italian: 1.00%, Latvian: 1.00%, Shona: 1.00%, Danish: 0.90%, English: 0.90%, Finnish: 0.90%, Indonesian: 0.80%, German: 0.70%, French: 0.60%, Hungarian: 0.50%, Somali: 0.40%, Azerbaijani: 0.10%, Vietnamese: 0.10%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 44.00%
8 changes: 4 additions & 4 deletions cmd/accuracy-reports/whatlang/Czech.txt
@@ -1,14 +1,14 @@
##### Czech #####

>>> Accuracy on average: 49.57%
>>> Accuracy on average: 49.53%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 31.40%
Erroneously classified as Unknown: 17.00%, Croatian: 7.30%, Slovene: 5.70%, Polish: 3.70%, Esperanto: 3.40%, Romanian: 2.80%, English: 2.60%, German: 2.00%, Portuguese: 2.00%, French: 1.90%, Shona: 1.80%, Zulu: 1.80%, Estonian: 1.70%, Nynorsk: 1.40%, Spanish: 1.40%, Italian: 1.20%, Afrikaans: 1.10%, Somali: 1.00%, Turkish: 1.00%, Hungarian: 0.90%, Lithuanian: 0.90%, Tagalog: 0.90%, Indonesian: 0.80%, Swedish: 0.80%, Bokmal: 0.70%, Finnish: 0.70%, Latvian: 0.60%, Yoruba: 0.50%, Danish: 0.40%, Dutch: 0.40%, Vietnamese: 0.20%
Erroneously classified as Unknown: 17.00%, Croatian: 7.30%, Slovene: 5.60%, Polish: 3.70%, Esperanto: 3.40%, Romanian: 2.80%, English: 2.60%, German: 2.00%, Portuguese: 2.00%, French: 1.90%, Shona: 1.80%, Zulu: 1.80%, Estonian: 1.70%, Nynorsk: 1.40%, Spanish: 1.40%, Italian: 1.20%, Afrikaans: 1.10%, Somali: 1.00%, Turkish: 1.00%, Hungarian: 0.90%, Lithuanian: 0.90%, Tagalog: 0.90%, Indonesian: 0.80%, Swedish: 0.80%, Bokmal: 0.70%, Finnish: 0.70%, Latvian: 0.60%, Yoruba: 0.60%, Danish: 0.40%, Dutch: 0.40%, Vietnamese: 0.20%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 46.30%
Erroneously classified as Unknown: 9.10%, Croatian: 8.70%, Slovene: 5.50%, Polish: 2.90%, Esperanto: 2.80%, Portuguese: 2.40%, Spanish: 2.20%, German: 2.10%, Romanian: 1.90%, French: 1.50%, Estonian: 1.40%, Tagalog: 1.30%, Danish: 1.20%, Dutch: 1.20%, Italian: 1.20%, Hungarian: 1.10%, English: 1.00%, Afrikaans: 0.80%, Bokmal: 0.80%, Latvian: 0.80%, Zulu: 0.80%, Indonesian: 0.70%, Finnish: 0.40%, Nynorsk: 0.40%, Shona: 0.40%, Lithuanian: 0.30%, Swedish: 0.30%, Somali: 0.20%, Turkish: 0.20%, Yoruba: 0.10%
Accuracy: 46.20%
Erroneously classified as Unknown: 9.10%, Croatian: 8.70%, Slovene: 5.50%, Polish: 2.90%, Esperanto: 2.80%, Portuguese: 2.40%, Spanish: 2.20%, German: 2.10%, Romanian: 1.90%, French: 1.50%, Estonian: 1.40%, Italian: 1.30%, Tagalog: 1.30%, Danish: 1.20%, Dutch: 1.20%, Hungarian: 1.10%, English: 1.00%, Afrikaans: 0.80%, Bokmal: 0.80%, Latvian: 0.80%, Zulu: 0.80%, Indonesian: 0.70%, Finnish: 0.40%, Nynorsk: 0.40%, Shona: 0.40%, Lithuanian: 0.30%, Swedish: 0.30%, Somali: 0.20%, Turkish: 0.20%, Yoruba: 0.10%

>> Detection of 1000 sentences (average length: 93 chars)
Accuracy: 71.00%
6 changes: 3 additions & 3 deletions cmd/accuracy-reports/whatlang/Danish.txt
@@ -1,10 +1,10 @@
##### Danish #####

>>> Accuracy on average: 46.80%
>>> Accuracy on average: 46.87%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 23.80%
Erroneously classified as Unknown: 13.10%, Bokmal: 9.60%, Nynorsk: 6.10%, Swedish: 5.30%, Dutch: 5.20%, German: 4.00%, French: 3.90%, Estonian: 3.40%, Afrikaans: 3.20%, English: 2.80%, Turkish: 2.40%, Spanish: 2.30%, Hungarian: 2.10%, Italian: 1.80%, Esperanto: 1.30%, Slovene: 1.30%, Romanian: 1.10%, Czech: 1.00%, Lithuanian: 0.90%, Portuguese: 0.90%, Croatian: 0.80%, Indonesian: 0.70%, Latvian: 0.60%, Zulu: 0.60%, Finnish: 0.50%, Shona: 0.40%, Somali: 0.30%, Tagalog: 0.30%, Vietnamese: 0.20%, Polish: 0.10%
Accuracy: 24.00%
Erroneously classified as Unknown: 13.00%, Bokmal: 9.50%, Nynorsk: 6.20%, Swedish: 5.30%, Dutch: 5.20%, German: 4.00%, French: 3.90%, Estonian: 3.40%, Afrikaans: 3.20%, English: 2.80%, Turkish: 2.40%, Spanish: 2.30%, Hungarian: 2.10%, Italian: 1.80%, Esperanto: 1.30%, Slovene: 1.30%, Czech: 1.00%, Romanian: 1.00%, Lithuanian: 0.90%, Portuguese: 0.90%, Croatian: 0.80%, Indonesian: 0.70%, Latvian: 0.60%, Zulu: 0.60%, Finnish: 0.50%, Shona: 0.40%, Somali: 0.30%, Tagalog: 0.30%, Vietnamese: 0.20%, Polish: 0.10%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 37.70%
4 changes: 2 additions & 2 deletions cmd/accuracy-reports/whatlang/Dutch.txt
@@ -4,7 +4,7 @@

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 22.40%
Erroneously classified as Unknown: 14.60%, German: 10.00%, Afrikaans: 9.60%, Danish: 3.70%, French: 3.70%, Estonian: 3.60%, English: 3.40%, Bokmal: 3.30%, Spanish: 3.00%, Finnish: 2.40%, Nynorsk: 2.40%, Swedish: 2.10%, Indonesian: 1.40%, Romanian: 1.40%, Hungarian: 1.30%, Portuguese: 1.30%, Slovene: 1.20%, Lithuanian: 1.10%, Turkish: 1.10%, Zulu: 1.10%, Italian: 1.00%, Polish: 0.90%, Esperanto: 0.80%, Czech: 0.70%, Latvian: 0.70%, Somali: 0.50%, Tagalog: 0.50%, Shona: 0.30%, Vietnamese: 0.30%, Croatian: 0.20%
Erroneously classified as Unknown: 14.60%, German: 10.00%, Afrikaans: 9.50%, Danish: 3.80%, French: 3.80%, Estonian: 3.60%, English: 3.40%, Bokmal: 3.30%, Spanish: 2.90%, Finnish: 2.40%, Nynorsk: 2.40%, Swedish: 2.00%, Indonesian: 1.40%, Romanian: 1.40%, Hungarian: 1.30%, Portuguese: 1.30%, Slovene: 1.20%, Lithuanian: 1.10%, Turkish: 1.10%, Zulu: 1.10%, Italian: 1.00%, Polish: 0.90%, Esperanto: 0.80%, Latvian: 0.80%, Czech: 0.70%, Somali: 0.50%, Tagalog: 0.50%, Shona: 0.30%, Vietnamese: 0.30%, Croatian: 0.20%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 35.70%
Erroneously classified as German: 13.00%, Afrikaans: 12.90%, Unknown: 7.00%, Danish: 3.90%, Bokmal: 3.50%, French: 3.40%, English: 3.10%, Spanish: 2.20%, Nynorsk: 2.10%, Swedish: 2.10%, Estonian: 1.60%, Romanian: 1.40%, Finnish: 1.30%, Italian: 0.90%, Indonesian: 0.80%, Portuguese: 0.80%, Turkish: 0.70%, Somali: 0.60%, Czech: 0.30%, Esperanto: 0.30%, Hungarian: 0.30%, Latvian: 0.30%, Polish: 0.30%, Tagalog: 0.30%, Croatian: 0.20%, Lithuanian: 0.20%, Shona: 0.20%, Slovene: 0.20%, Vietnamese: 0.20%, Zulu: 0.20%
Erroneously classified as Afrikaans: 12.90%, German: 12.90%, Unknown: 7.00%, Danish: 4.00%, Bokmal: 3.50%, French: 3.40%, English: 3.10%, Spanish: 2.20%, Nynorsk: 2.10%, Swedish: 2.10%, Estonian: 1.60%, Romanian: 1.40%, Finnish: 1.30%, Italian: 0.90%, Indonesian: 0.80%, Portuguese: 0.80%, Turkish: 0.70%, Somali: 0.60%, Czech: 0.30%, Esperanto: 0.30%, Hungarian: 0.30%, Latvian: 0.30%, Polish: 0.30%, Tagalog: 0.30%, Croatian: 0.20%, Lithuanian: 0.20%, Shona: 0.20%, Slovene: 0.20%, Vietnamese: 0.20%, Zulu: 0.20%

>> Detection of 1000 sentences (average length: 107 chars)
Accuracy: 82.50%
6 changes: 3 additions & 3 deletions cmd/accuracy-reports/whatlang/English.txt
@@ -1,10 +1,10 @@
##### English #####

>>> Accuracy on average: 49.00%
>>> Accuracy on average: 49.03%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 17.40%
Erroneously classified as Unknown: 17.30%, French: 14.30%, Romanian: 5.00%, Danish: 4.20%, Estonian: 3.90%, German: 3.80%, Portuguese: 3.30%, Italian: 3.00%, Spanish: 3.00%, Dutch: 2.90%, Bokmal: 2.50%, Nynorsk: 2.20%, Swedish: 2.10%, Afrikaans: 1.90%, Esperanto: 1.30%, Latvian: 1.30%, Hungarian: 1.20%, Lithuanian: 1.10%, Slovene: 1.00%, Turkish: 1.00%, Vietnamese: 1.00%, Polish: 0.90%, Tagalog: 0.90%, Zulu: 0.90%, Indonesian: 0.80%, Finnish: 0.50%, Somali: 0.50%, Croatian: 0.30%, Czech: 0.30%, Shona: 0.20%
Accuracy: 17.50%
Erroneously classified as Unknown: 17.30%, French: 14.30%, Romanian: 5.00%, Danish: 4.20%, Estonian: 3.90%, German: 3.80%, Portuguese: 3.40%, Italian: 3.00%, Spanish: 2.90%, Dutch: 2.80%, Bokmal: 2.50%, Nynorsk: 2.20%, Swedish: 2.10%, Afrikaans: 1.90%, Esperanto: 1.30%, Latvian: 1.30%, Hungarian: 1.20%, Lithuanian: 1.10%, Slovene: 1.00%, Turkish: 1.00%, Vietnamese: 1.00%, Polish: 0.90%, Tagalog: 0.90%, Zulu: 0.90%, Indonesian: 0.80%, Finnish: 0.50%, Somali: 0.50%, Croatian: 0.30%, Czech: 0.30%, Shona: 0.20%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 35.40%