interscript-khmer

Approaches

Name	Description	CER	Accuracy (%)
Seq2Seq Transformers	A plain Transformer-based seq2seq model, as proposed in the "Attention Is All You Need" paper.	0.3125	50.12
Seq2Seq Transformers + Dictionary Lookup	The same as the first approach, with an added dictionary lookup step. Dictionary lookup is a spell-correction method that searches the Khmer → Transcription dictionary for similar words (Levenshtein distance = 2).	0.4174	23.18
Seq2Seq Transformers + SymSpell	The same as the first approach, with SymSpell added for spell correction. The approach is described in this repository.	-	-
Simple mapping	Instead of ML, we use a dictionary that maps Khmer characters to Latin transcriptions. It was collected via Google Translate and Google Input Tools. Before mapping Khmer to Latin, we segment sentences into words with the khmer-nltk module.	0.42	11.00
Seq2Seq Transformers based on mapping data	The same as the first approach, but trained on the mapping data (described above): we first convert Khmer to Latin with the simple mapping approach, and then train the seq2seq transformer on that output.	0.3387	38.85
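
The dictionary lookup step from the table can be illustrated roughly as follows. This is a minimal sketch, not the code used in this repository: the function names, the `vocabulary` list of known transcriptions, and the reading of "Levenshtein distance = 2" as "at most 2 edits away" are all assumptions made for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


def correct_word(word: str, vocabulary: Sequence[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary entry within max_distance edits.

    If no entry is close enough, the word is returned unchanged.
    `vocabulary` is a hypothetical list of known transcriptions.
    """
    best, best_distance = word, max_distance + 1
    for candidate in vocabulary:
        distance = levenshtein(word, candidate)
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best
```

A model output would be split into words and each word passed through `correct_word`. The linear scan over the vocabulary is fine for small dictionaries; for large ones an index-based method such as SymSpell is the usual choice.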

I also experimented with different spell-correction approaches, but none of them produced useful results. To evaluate them, synthetic data was created by applying transformations to the transcriptions. The approaches checked were Seq2Seq Transformers, SymSpell, and Dictionary Lookup.
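
For reference, the CER and accuracy columns in the table above could be computed along the following lines. This README does not state the exact definitions used, so the sketch assumes corpus-level CER (total character edits divided by total reference characters) and accuracy as the percentage of exact matches; the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between a reference and a predicted string."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        current = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Character error rate: total edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


def accuracy(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Percentage of predictions that match the reference exactly."""
    matches = sum(r == h for r, h in zip(references, hypotheses))
    return 100.0 * matches / len(references)


if __name__ == "__main__":
    refs = ["abc", "abcd"]
    hyps = ["abc", "abxd"]
    print(cer(refs, hyps), accuracy(refs, hyps))
```

Under these definitions a lower CER is better and a higher accuracy is better, and the two can disagree: a model that is nearly right on every sentence has low CER but may still score low on exact-match accuracy.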

How to train the model?

python python/train.py -i examples/khm-latn/input.csv -t examples/khm-latn/target.csv -en NAME_OF_THE_MODEL

This command fits the model with the default parameters. If you need custom parameters, see the secryst repository.

The best model is in models/no-punct-no-bug/.

How to run inference with the Seq2Seq Transformers model?

python python/inference.py -i KHMER_TEXT -en NAME_OF_THE_MODEL

To run inference with the best model:

python python/inference.py -i KHMER_TEXT -en no-punct-no-bug
