Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation.
Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.
We provide four recipes at present:
This is the simplest ASR recipe in icefall and can be run on CPU.
Training takes less than 30 seconds and gives you the following WER:
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
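As a quick sanity check on how the result line above should be read, the sketch below recomputes the percentage from the reported counts, assuming the usual definition of WER as (insertions + deletions + substitutions) divided by the number of reference words. The function name `wer_percent` is hypothetical and is not part of icefall.

```python
def wer_percent(num_ref_words: int, ins: int, dels: int, subs: int) -> float:
    """Return the word error rate in percent.

    Assumes the standard definition: all errors (insertions, deletions,
    substitutions) divided by the number of reference words.
    """
    if num_ref_words <= 0:
        raise ValueError("num_ref_words must be positive")
    return 100.0 * (ins + dels + subs) / num_ref_words


# Counts copied from the result line: 1 error out of 240 reference words.
wer = wer_percent(num_ref_words=240, ins=0, dels=1, subs=0)
print(f"%WER {wer:.2f}%")  # prints "%WER 0.42%"
```

The single deletion out of 240 reference words accounts for the whole 0.42% figure.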
We provide a Colab notebook for this recipe.
For the VSR (visual speech recognition) task, we provide two models: a Conv3d Map BiGRU CTC model and a Conv3d ResNet18 BiGRU CTC model.
The WER for the Conv3d Map BiGRU CTC model is:

|     | TEST   |
|-----|--------|
| WER | 15.68% |
We provide a Colab notebook to run a pre-trained Conv3d Map BiGRU CTC model:
The WER for the Conv3d ResNet18 BiGRU CTC model is:

|     | TEST   |
|-----|--------|
| WER | 13.63% |
We provide a Colab notebook to run a pre-trained Conv3d ResNet18 BiGRU CTC model:
For the ASR (automatic speech recognition) task, we provide one model: a Tdnn Lstm CTC model.
The WER for this model is:

|     | TEST  |
|-----|-------|
| WER | 2.35% |
We provide a Colab notebook to run a pre-trained Tdnn Lstm CTC model:
For the AVSR (audio-visual speech recognition) task, we provide one model: a CombineNet CTC model.
The WER for this model is:

|     | TEST  |
|-----|-------|
| WER | 1.71% |
We provide a Colab notebook to run a pre-trained CombineNet CTC model:
Once you have trained a model in icefall, you may want to deploy it with C++, without any Python dependencies.
Please refer to the documentation at https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html#deployment-with-c for how to do this.
We also provide a Colab notebook showing how to run a torch scripted model in k2 with C++. Please see: