Home
gignu edited this page Mar 23, 2021 · 43 revisions
This wiki is meant to give you a short overview of how things work under the hood. It is not a detailed description but rather an approximation and simplification, with no aim of being 100% accurate.
To detect the language, each file is read either as UTF-8 or as ISO-8859-1, depending on whether UTF-8 has been detected previously. The file is then scanned for specific words that are unique to a single language, and each language has one to three of these words. For example, if a file contains the word "the", that is a strong indication that it is written in English.
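A minimal sketch of this idea might look like the following. It is not the project's actual code. The keyword table is hypothetical: only English "the" comes from the description above, and the German and French entries are placeholders. The source also doesn't say how UTF-8 is detected in the first place, so `looks_like_utf8` assumes a simple "does it decode cleanly" check. The text is split into whole words, so "there" does not count as a hit for "the".

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical keyword table. Only English "the" is taken from the wiki text;
# the other entries are illustrative placeholders, not the real word lists.
KEYWORDS = {
    "English": ["the"],
    "German": ["und", "nicht"],
    "French": ["les", "avec"],
}


def looks_like_utf8(data: bytes) -> bool:
    """Assumed detection step: treat the bytes as UTF-8 if they decode cleanly."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def read_text(data: bytes, utf8_detected: bool) -> str:
    """Decode as UTF-8 if it was detected earlier, otherwise as ISO-8859-1."""
    return data.decode("utf-8" if utf8_detected else "iso-8859-1")


def detect_language(text: str) -> Optional[str]:
    """Count whole-word keyword hits per language and return the best match."""
    words = Counter(re.findall(r"\w+", text.lower()))
    scores = {lang: sum(words[w] for w in kws) for lang, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # No keyword found at all: the language stays unknown.
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    # "Über" encoded as ISO-8859-1 is not valid UTF-8, so the fallback is used.
    raw = "Über und unter".encode("iso-8859-1")
    text = read_text(raw, looks_like_utf8(raw))
    print(detect_language(text))  # German
```

Scoring by hit count (rather than stopping at the first keyword found) is just one way to break ties when a file happens to contain keywords from several languages.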
Lorem Ipsum