GitHub - martholomew/make-dic

Basic Usage:

Concatenate (cat) your corpus into a single file (e.g., copcorp.txt).
Edit the first line of freq.sh so it contains the Unicode characters you need (unicode-table.com is a good reference for this).
Run freq.sh. This will take a while and use a lot of CPU.
Check for errors along the way. Because the run takes a while, freq.sh writes a file at each step so you can inspect the intermediate output.
Run parser.pl on the last output of freq.sh. This normalizes the frequency numbers to get them ready for ASK.
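
To make the two main stages concrete, here is a minimal Python sketch of the idea: count words built from an allowed character set, then scale the counts into a fixed range. It is an illustration only, not a port of freq.sh or parser.pl. The function names, the Coptic character ranges in ALLOWED, and the 1 to 255 target range are all assumptions; the range ASK actually expects is not stated here.

```python
import re
from collections import Counter

# Assumption: Coptic letters (U+2C80-U+2CFF, plus U+03E2-U+03EF in the Greek block).
# Replace with whatever ranges your language needs.
ALLOWED = "\u2c80-\u2cff\u03e2-\u03ef"


def count_words(text, allowed=ALLOWED):
    """Count every run of allowed characters in text (case-folded to lowercase)."""
    pattern = re.compile(f"[{allowed}]+")
    return Counter(pattern.findall(text.lower()))


def normalize(counts, top=255):
    """Scale raw counts so the most frequent word maps to `top`.

    Hypothetical scheme: integer scaling with a floor of 1, so rare
    words are kept rather than dropped to zero.
    """
    if not counts:
        return {}
    max_count = max(counts.values())
    return {
        word: max(1, count * top // max_count)
        for word, count in counts.items()
    }
```

For a quick check on plain ASCII text, call count_words(text, allowed="a-z") and pass the result to normalize. Integer division is used so the scaled values are deterministic and never depend on floating-point rounding.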

parser.pl is taken from this repository, which is licensed under Apache 2.0.
