This repo contains instructions and scripts to train acoustic models using Kaldi on datasets in Brazilian Portuguese (or just "general Portuguese"). You may also find some scripts for forced alignment and speaker diarization.
🗣️ Looking for speech datasets in Brazilian Portuguese? Check out our "Speech Datasets" GitHub repo (based on DVC for storage): https://github.com/falabrasil/speech-datasets
📝 Looking for text datasets in Brazilian Portuguese? Check out our "Text Datasets" GitHub repo: https://github.com/falabrasil/text-datasets
🎙️ 🦊 Looking for acoustic models (AM, probably for Vosk)? Check out the following GitLab repo (with LFS storage): https://gitlab.com/fb-resources/kaldi-br
🗒️ 🦊 Looking for language models (LM)? Check out the following GitHub repo (note that there's a companion repo on GitLab for LFS storage): https://github.com/falabrasil/lm-br
📰 🦊 Looking for phonetic dictionaries (lexicon)? Check out the following GitHub repo (note that there's a companion repo on GitLab for LFS storage): https://github.com/falabrasil/dicts-br
🏷️ 🐳 Wanna create your own phonetic dictionary? Check out our annotator tool's GitHub repo (there's also a dockerized version): https://github.com/falabrasil/annotator
☕ Looking for Kaldi installation instructions? Check out our install guide in the INSTALL.md file, or just follow the Kaldi documentation directly: https://github.com/kaldi-asr/kaldi
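Once Kaldi is installed, a quick sanity check before running any of the prep scripts below could look like the following minimal sketch. It assumes a standard Kaldi checkout, where recipes live under egs/ and compiled binaries land under src/ (src/featbin/compute-mfcc-feats is used here only as a marker that the build finished). The check_kaldi function and the KALDI_ROOT variable are illustrative names, not part of this repo.

```shell
# Hypothetical helper (not part of this repo): sanity-check a Kaldi checkout.
# Takes the Kaldi root as $1, falling back to the KALDI_ROOT env var (assumed name).
check_kaldi() {
  local root="${1:-${KALDI_ROOT:-}}"
  if [ -z "$root" ]; then
    echo "usage: check_kaldi /path/to/kaldi (or set KALDI_ROOT)" >&2
    return 2
  fi
  # egs/ is where the /path/to/kaldi/egs/myproject dirs passed to prep_*.sh live
  if [ ! -d "$root/egs" ]; then
    echo "missing $root/egs: is this a Kaldi checkout?" >&2
    return 1
  fi
  # compute-mfcc-feats only exists (as an executable) after src/ has been compiled
  if [ ! -x "$root/src/featbin/compute-mfcc-feats" ]; then
    echo "Kaldi binaries not found: build tools/ and src/ first" >&2
    return 1
  fi
  echo "Kaldi looks ready at $root"
}
```

For example, `check_kaldi /path/to/kaldi` prints a short confirmation and returns 0 only when both the egs/ dir and a compiled binary are present.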
👣 If you're looking for a tutorial on data preparation and a step-by-step guide on how to train your own acoustic models from scratch using Kaldi, the best we can offer is this written tutorial.
See fb-lapsbm/ dir.
Based on the Mini-librispeech nnet3 recipe (local/chain/tuning/run_tdnn_1j.sh), adapted for a quick training run over LapsBenchmark.
$ ./prep_lapsbm.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run.sh
For online decoding, please check the fb-lapsbm/local/vosk/ dir.
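The three commands above follow the same pattern in every recipe of this README: prepare the project, enter the recipe dir, run the entry script. A minimal bash sketch of a wrapper for that pattern follows, assuming it is sourced and called from the directory that holds the prep script (as in the `$ ./prep_*.sh` commands). The run_recipe function and the `<entry>.log` file it writes are hypothetical names, not part of this repo.

```shell
# Hypothetical wrapper (not part of this repo) around: prep, cd, run.
# Args: PREP_SCRIPT PROJECT_DIR [SUBDIR (default s5)] [ENTRY (default run.sh)]
run_recipe() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_recipe PREP_SCRIPT PROJECT_DIR [SUBDIR] [ENTRY]" >&2
    return 2
  fi
  local prep="$1" project="$2" subdir="${3:-s5}" entry="${4:-run.sh}"
  # Step 1: let the repo's prep script set up the Kaldi project dir
  "./$prep" "$project" || return 1
  # Steps 2-3: cd and run inside a subshell so the caller's cwd is untouched;
  # pipefail makes the subshell return the entry script's status, not tee's.
  (
    cd "$project/$subdir" || exit 1
    set -o pipefail
    "./$entry" 2>&1 | tee "$entry.log"
  )
}
```

For example, after sourcing it:
$ run_recipe prep_lapsbm.sh /path/to/kaldi/egs/myproject
$ run_recipe prep_ufpalign.sh /path/to/kaldi/egs/myproject s5 run_all.sh
$ run_recipe prep_callhome.sh /path/to/kaldi/egs/myproject v2
Keeping a log copy is handy because these recipes can run for a long time; the subshell means a failed or interrupted run does not leave your shell sitting inside the recipe dir.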
See fb-falabrasil/ dir.
This is expected to become the main recipe for Brazilian Portuguese, as we are planning to release the acoustic models as well.
Also based on the Mini-librispeech recipe, same as above, but it runs over all the public speech datasets in Portuguese (NOTE: not only "Brazilian" Portuguese!) that we are aware of, which have been gathered here: https://github.com/falabrasil/speech-datasets
$ ./prep_falabrasil.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run.sh
For online decoding, please check the fb-falabrasil/local/vosk/ dir.
See fb-gentle/ dir.
Based on the ASpIRE nnet3 recipe.
$ ./prep_gentle.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run.sh
See fb-ufpalign/ dir.
Based on the LibriSpeech nnet3 recipe, in the hope of future compatibility with MFA.
$ ./prep_ufpalign.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/s5/
$ ./run_all.sh
See fb-callhome/ dir.
Based on the CALLHOME v2 recipe. This uses models pre-trained on English data for inference only, rather than training them from scratch.
$ ./prep_callhome.sh /path/to/kaldi/egs/myproject
$ cd /path/to/kaldi/egs/myproject/v2/
$ ./run.sh
A standalone clustering procedure based on the pyannote.audio library can also be found under the utils/clustering/_diarization dir.
If you use this code or want to mention the papers referred to above, please cite us as one of the following:
Batista, C., Dias, A.L., Sampaio Neto, N. (2018) Baseline Acoustic Models for Brazilian Portuguese Using Kaldi Tools. Proc. IberSPEECH 2018, 77-81, DOI: 10.21437/IberSPEECH.2018-17.
@inproceedings{Batista18,
author = {Cassio Batista and Ana Larissa Dias and Nelson {Sampaio Neto}},
title = {{Baseline Acoustic Models for Brazilian Portuguese Using Kaldi Tools}},
year = {2018},
booktitle = {Proc. IberSPEECH 2018},
pages = {77--81},
doi = {10.21437/IberSPEECH.2018-17},
url = {http://dx.doi.org/10.21437/IberSPEECH.2018-17}
}
nnet2. Try running git tag.
Dias A.L., Batista C., Santana D., Neto N. (2020) Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools. In: Cerri R., Prati R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science, vol 12319. Springer, Cham. https://doi.org/10.1007/978-3-030-61377-8_44
@inproceedings{Dias20,
author = {Dias, Ana Larissa and Batista, Cassio and Santana, Daniel and Neto, Nelson},
editor = {Cerri, Ricardo and Prati, Ronaldo C.},
title = {Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools},
booktitle = {Intelligent Systems},
year = {2020},
publisher = {Springer International Publishing},
address = {Cham},
pages = {621--635},
isbn = {978-3-030-61377-8}
}
Batista, C., Dias, A.L. & Neto, N. Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit. EURASIP J. Adv. Signal Process. 2022, 11 (2022). https://doi.org/10.1186/s13634-022-00844-9
@article{Batista22,
author = {Batista, Cassio and Dias, Ana Larissa and Neto, Nelson},
title = {Free resources for forced phonetic alignment in Brazilian Portuguese based on Kaldi toolkit},
journal = {EURASIP Journal on Advances in Signal Processing},
year = {2022},
month = {Feb},
day = {19},
volume = {2022},
number = {1},
pages = {11},
issn = {1687-6180},
doi = {10.1186/s13634-022-00844-9},
url = {https://doi.org/10.1186/s13634-022-00844-9}
}
Grupo FalaBrasil (2022) - https://ufpafalabrasil.gitlab.io/
Universidade Federal do Pará (UFPA) - https://portal.ufpa.br/
Cassio Batista - https://cassota.gitlab.io/