gignu edited this page Mar 23, 2021 · 43 revisions

How it works

This wiki is supposed to give you a short overview of how things work under the hood. It is not a detailed description but rather an approximation and simplification, and it does not aim to be 100% accurate.

Language Detection

To detect the language, files are read either as UTF-8 or as ISO-8859-1, depending on whether UTF-8 has been detected previously. Each file is then scanned for specific marker words, each of which occurs in only one of the supported languages. Each language has one to three of these words. If a file contains the word "the", for example, that is a strong indication that the file is in English.
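The scan described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the German and French marker words, and the way UTF-8 is detected (try to decode, fall back on failure) are all assumptions. Only the English marker "the" comes from the description above.

```python
import re

# Hypothetical marker words. Only "the" for English is taken from the text
# above; the other entries are placeholders for illustration.
MARKER_WORDS = {
    "English": ["the"],
    "German": ["und", "nicht"],
    "French": ["est", "dans"],
}


def decode(raw: bytes) -> str:
    """Read the bytes as UTF-8 if they are valid UTF-8, else as ISO-8859-1.

    Assumption: "UTF-8 has been detected" is modelled here as "the bytes
    decode cleanly as UTF-8".
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return raw.decode("iso-8859-1")


def count_markers(text: str) -> dict:
    """Count how often each language's marker words occur as whole words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return {
        lang: sum(words.count(marker) for marker in markers)
        for lang, markers in MARKER_WORDS.items()
    }


def detect_language(raw: bytes):
    """Return the language with the most marker hits, or None if none hit."""
    counts = count_markers(decode(raw))
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

For example, `detect_language(b"The cat sat on the mat")` finds two occurrences of "the" and no other markers, so it returns `"English"`. Matching whole words rather than substrings keeps "the" from being counted inside words such as "other".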

Confidence Score

Lorem Ipsum
