FalaBrasil Scripts for Kaldi 🇧🇷

This repo contains instructions and scripts to train acoustic models with Kaldi on datasets in Brazilian Portuguese (or Portuguese in general). It also includes some scripts for forced alignment and speaker diarization.

🗣️ Looking for speech datasets in Brazilian Portuguese? Check out our "Speech Datasets" GitHub repo (based on DVC for storage): https://github.com/falabrasil/speech-datasets

📝 Looking for text datasets in Brazilian Portuguese? Check out our "Text Datasets" GitHub repo: https://github.com/falabrasil/text-datasets

🎙️ 🦊 Looking for acoustic models (AM, probably for Vosk)? Check out the following GitLab repo (with LFS storage): https://gitlab.com/fb-resources/kaldi-br

🗒️ 🦊 Looking for language models (LM)? Check out the following GitHub repo (note that there is a companion repo on GitLab for LFS storage): https://github.com/falabrasil/lm-br

📰 🦊 Looking for phonetic dictionaries (lexicon)? Check out the following GitHub repo (note that there is a companion repo on GitLab for LFS storage): https://github.com/falabrasil/dicts-br

🏷️ 🐳 Wanna create your own phonetic dictionary? Check out our annotator tool's GitHub repo (there's also a dockerized version): https://github.com/falabrasil/annotator

☕ Looking for Kaldi installation instructions? Check out our install guide in the INSTALL.md file, or follow the Kaldi documentation directly: https://github.com/kaldi-asr/kaldi

👣 If you're looking for a tutorial on data preparation and a step-by-step guide on how to train your own acoustic models from scratch using Kaldi, the best we can offer is the written tutorial in the TUTORIAL.md file.

Model training for speech recognition (Vosk + LapsBM)

See the fb-lapsbm/ dir. Based on the Mini-librispeech nnet3 recipe (local/chain/tuning/run_tdnn_1j.sh), adapted for a quick training run over LapsBenchmark.

$ ./prep_lapsbm.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run.sh
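
The three commands above follow the same pattern in every recipe in this repo, so they can be wrapped in a small helper. The sketch below is only an illustration: run_recipe is a hypothetical function that is not part of this repo, and the check that the project dir sits under a Kaldi egs/ dir is an assumption based on the example paths used here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): run a prep script, then the
# recipe it generated, stopping if either step fails.
run_recipe() {
  local prep_script="$1"          # e.g. ./prep_lapsbm.sh
  local project_dir="$2"          # e.g. /path/to/kaldi/egs/myproject
  local recipe_subdir="${3:-s5}"  # recipe dir created inside the project
  local run_script="${4:-run.sh}" # entry point inside the recipe dir

  # Assumption: the project must live under a Kaldi egs/ dir, as in the
  # example paths of this README.
  case "$project_dir" in
    */egs/*) ;;
    *) echo "error: $project_dir is not under a Kaldi egs/ dir" >&2
       return 1 ;;
  esac

  # Step 1: copy the recipe files into the project dir.
  "$prep_script" "$project_dir" || return 1

  # Step 2: run the recipe from inside its own dir (subshell keeps the
  # caller's working dir unchanged).
  ( cd "$project_dir/$recipe_subdir" && "./$run_script" )
}
```

Usage would look like `run_recipe ./prep_lapsbm.sh /path/to/kaldi/egs/myproject`. The optional third and fourth arguments cover recipes whose dir or entry point differ, e.g. `v2` for CallHome or `run_all.sh` for UFPAlign.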

For online decoding, please check fb-lapsbm/local/vosk/ dir.

Model training for speech recognition (Vosk + Datasets)

See the fb-falabrasil/ dir. This is expected to become the main recipe for Brazilian Portuguese, as we are planning on releasing the acoustic models as well.

Also based on the Mini-librispeech recipe, same as above, but it runs over all public speech datasets in Portuguese that we are aware of (NOTE: not only "Brazilian" Portuguese!), which have been gathered here: https://github.com/falabrasil/speech-datasets

$ ./prep_falabrasil.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run.sh

For online decoding, please check fb-falabrasil/local/vosk/ dir.

Model training for phonetic alignment (Gentle)

See fb-gentle/ dir. Based on ASpIRE nnet3 recipe.

$ ./prep_gentle.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run.sh

⚠️ This recipe did not work. See the README inside the fb-gentle/ dir.

Model training for phonetic alignment (UFPAlign)

See fb-ufpalign/ dir. Based on LibriSpeech nnet3 recipe, in the hopes of future compatibility with MFA.

$ ./prep_ufpalign.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run_all.sh

Speaker diarization (CallHome)

See the fb-callhome/ dir. Based on the CALLHOME v2 recipe. This uses models pre-trained on English data, for inference only, rather than training one from scratch.

$ ./prep_callhome.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/v2/
$ ./run.sh

A standalone clustering procedure based on the pyannote.audio lib can also be found under the utils/clustering/_diarization dir.

Citation

If you use this code or want to mention one of the papers listed below, please cite us as one of the following:

IberSPEECH 2018

Batista, C., Dias, A.L., Sampaio Neto, N. (2018) Baseline Acoustic Models for Brazilian Portuguese Using Kaldi Tools. Proc. IberSPEECH 2018, 77-81, DOI: 10.21437/IberSPEECH.2018-17.

@inproceedings{Batista18,
  author     = {Cassio Batista and Ana Larissa Dias and Nelson {Sampaio Neto}},
  title      = {{Baseline Acoustic Models for Brazilian Portuguese Using Kaldi Tools}},
  year       = {2018},
  booktitle  = {Proc. IberSPEECH 2018},
  pages      = {77--81},
  doi        = {10.21437/IberSPEECH.2018-17},
  url        = {http://dx.doi.org/10.21437/IberSPEECH.2018-17}
}

⚠️ This paper uses the outdated nnet2 recipes, while this repo has been updated to the chain models' recipe via nnet3 scripts. If you really want nnet2 scripts, you may find them on tag nnet2. Try running git tag.
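
As a minimal sketch of the tag lookup mentioned above, the helper below checks out a tag only if it exists in the current clone and lists the available tags otherwise. checkout_tag is a hypothetical name that is not part of this repo; it relies only on standard git commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): check out a tag if present.
checkout_tag() {
  local tag="$1"   # e.g. nnet2

  # rev-parse --verify succeeds only if the tag ref exists in this clone.
  if git rev-parse -q --verify "refs/tags/$tag" >/dev/null; then
    git checkout -q "$tag"   # leaves the repo in detached HEAD state
  else
    echo "tag '$tag' not found; available tags:" >&2
    git tag --list >&2
    return 1
  fi
}
```

Running `checkout_tag nnet2` from inside a clone of this repo would switch the working tree to the nnet2 scripts. Return to the current recipes with `git checkout` followed by the name of the branch you were on.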

BRACIS 2020

Dias A.L., Batista C., Santana D., Neto N. (2020) Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools. In: Cerri R., Prati R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12319. Springer, Cham. https://doi.org/10.1007/978-3-030-61377-8_44

@inproceedings{Dias20,
  author     = {Dias, Ana Larissa and Batista, Cassio and Santana, Daniel and Neto, Nelson},
  editor     = {Cerri, Ricardo and Prati, Ronaldo C.},
  title      = {Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools},
  booktitle  = {Intelligent Systems},
  year       = {2020},
  publisher  = {Springer International Publishing},
  address    = {Cham},
  pages      = {621--635},
  isbn       = {978-3-030-61377-8}
}

EURASIP 2022

Batista, C., Dias, A.L. & Neto, N. Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit. EURASIP J. Adv. Signal Process. 2022, 11 (2022). https://doi.org/10.1186/s13634-022-00844-9

@article{Batista22,
  author     = {Batista, Cassio and Dias, Ana Larissa and Neto, Nelson},
  title      = {Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit},
  journal    = {EURASIP Journal on Advances in Signal Processing},
  year       = {2022},
  month      = {Feb},
  day        = {19},
  volume     = {2022},
  number     = {1},
  pages      = {11},
  issn       = {1687-6180},
  doi        = {10.1186/s13634-022-00844-9},
  url        = {https://doi.org/10.1186/s13634-022-00844-9}
}

Grupo FalaBrasil (2022) - https://ufpafalabrasil.gitlab.io/
Universidade Federal do Pará (UFPA) - https://portal.ufpa.br/
Cassio Batista - https://cassota.gitlab.io/
