NLLB Gronings Tatoeba Translation Machine

This project focuses on building a translation machine for the Gronings language using the Tatoeba dataset and the NLLB (No Language Left Behind) model.

I work on this in my free time, so progress is not always steady.

A first version of the translation model is now on Hugging Face: https://huggingface.co/Tom9358/nllb-tatoeba-gos-nld-v1

Here is a Hugging Face Space where the model can be used: https://huggingface.co/spaces/Tom9358/gos_gronings_translate

(There is also an equivalent Google Colab notebook: https://colab.research.google.com/drive/1b5dn3VT4fvOBKly1CIx4Qwo59GDM1H-M)
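
For anyone who would rather call the model from their own code, here is a minimal sketch. It assumes the checkpoint loads with the standard transformers classes for NLLB models and that `transformers` and `torch` are installed. The function name `translate` is just for illustration, and the Gronings language code `gos_Latn` is an assumption (check the model card for the code actually used); `nld_Latn` is the usual NLLB code for Dutch.

```python
MODEL_NAME = "Tom9358/nllb-tatoeba-gos-nld-v1"


def translate(text, src_lang="gos_Latn", tgt_lang="nld_Latn", model_name=MODEL_NAME):
    """Translate one sentence with the fine-tuned NLLB checkpoint.

    The default `src_lang` code is an assumption, not taken from the model card.
    """
    if not text.strip():
        raise ValueError("text must not be empty")

    # Imported here so the heavy dependency is only needed when translating.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # NLLB tokenizers prepend the source language token to the input.
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt")

    # The target language is selected by forcing its token as the first output token.
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=128,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Calling `translate("Moi!")` downloads the model on first use, so expect it to take a while. Swapping `src_lang` and `tgt_lang` should give the Dutch to Gronings direction, provided the model was trained in both directions.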

Training

The model was trained on about 10,000 Gronings-Dutch sentence pairs from Tatoeba, about half of which I wrote myself.
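
As a rough illustration of what such a parallel corpus looks like in code, the sketch below reads sentence pairs from a tab-separated export. It is not the training pipeline from this repository. It assumes a four-column layout (sentence id, sentence, translation id, translation), and the function name `read_sentence_pairs` and the sample rows are made up for the example.

```python
import csv
import io


def read_sentence_pairs(handle):
    """Yield unique (source, target) pairs from a tab-separated export.

    Assumed columns: source id, source text, target id, target text.
    """
    # QUOTE_NONE keeps quotation marks inside sentences untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    seen = set()
    for row in reader:
        if len(row) < 4:
            continue  # skip malformed lines
        pair = (row[1].strip(), row[3].strip())
        # Drop pairs with an empty side and exact duplicates.
        if all(pair) and pair not in seen:
            seen.add(pair)
            yield pair


# Made-up sample rows standing in for a downloaded file.
sample = io.StringIO(
    "1\tMoi!\t2\tHallo!\n"
    "1\tMoi!\t2\tHallo!\n"
    "3\t\t4\tLeeg\n"
)
pairs = list(read_sentence_pairs(sample))
print(pairs)
```

With a real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) instead of the `io.StringIO` sample.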

I did my best to check naturalness and spelling against the Gronings online dictionary and corpus Woordwaark and the Gronings-language website dideldom.nu. I found the Kreuze Gronings magazines hosted there particularly useful, and I wrote a small search interface to find example sentences in those magazines easily. I never copied sentences; I always formulated analogous ones myself.

Thanks

Heartfelt thanks to the authors in Kreuze, to the team behind Woordwaark, and to the host of dideldom! Without you, I would have gotten nowhere.

Special thanks to CmdCody for the very similar and very inspirational project for North Frisian, and for the link to a useful blogpost.

Thanks to the nice blogpost "How to Fine-Tune a NLLB-200 Model for Translating a New Language" for helping me get started and helping with some parts of the code.

Thanks to Tatoeba for including Gronings as one of the languages on their site, for letting me add and correct sentences there in many languages (I've written hundreds of English, German and Spanish translation equivalents of Gronings sentences as well!), and for letting me download this data as a parallel corpus dataset.
