Deep Speech is one of the automatic speech recognition (ASR) models that has achieved state-of-the-art (SOTA) results in the speech recognition domain. In this repository, I use Deep Speech with the VIVOS dataset and the VinBigData VLSP 2020 dataset.
- Clone this project into the current directory using these commands:
!git init
!git remote add origin https://github.com/tuanio/deepspeech-ctc
!git pull origin main
- Install the required packages:
!pip install -r requirements.txt
Then install ctcdecode from this repository: https://github.com/parlance/ctcdecode
- Edit the configs.yaml file as appropriate for your setup.
- Train the model using:
python main.py -cp conf -cn configs
- Run the Streamlit web app using:
streamlit run web.py
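The ctcdecode package installed in the steps above provides CTC beam search decoding for PyTorch. As rough intuition for what any CTC decoder has to do, the sketch below shows the simpler greedy variant: given the most likely label id for each audio frame, collapse consecutive repeats, then drop the blank symbol. This is not the decoder used in this repository. The function name `ctc_greedy_decode`, the toy label list, and the choice of blank id 0 are assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, labels, blank_id=0):
    """Greedy CTC decoding: collapse repeated ids, then remove blanks.

    frame_ids: most likely label id for each frame (argmax of the model output).
    labels:    list mapping label id -> character (hypothetical vocabulary).
    blank_id:  id of the CTC blank symbol (assumed to be 0 here).
    """
    # Step 1: merge runs of identical ids, e.g. [1, 1, 0, 1] -> [1, 0, 1].
    collapsed = [idx for idx, _ in groupby(frame_ids)]
    # Step 2: drop blanks; a blank between two equal ids keeps both of them.
    return "".join(labels[idx] for idx in collapsed if idx != blank_id)


labels = ["_", "a", "b", "c"]  # "_" stands for the CTC blank
frame_ids = [0, 1, 1, 0, 1, 2, 2, 0, 3]
decoded = ctc_greedy_decode(frame_ids, labels)
print(decoded)
```

Beam search keeps several candidate transcripts per frame instead of a single argmax path, which is why it usually needs a dedicated package such as ctcdecode.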
Training loss of Deep Speech over 978 epochs
Validation loss of Deep Speech
Validation word error rate (mean WER) of Deep Speech
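For readers unfamiliar with the metric, word error rate is commonly defined as the word-level edit distance (substitutions, deletions, and insertions) between a predicted transcript and its reference, divided by the number of words in the reference. The sketch below assumes that "mean WER" means the average of per-utterance WER values; the repository may compute it differently. The function names `word_error_rate` and `mean_wer` and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs):
    """Average the per-utterance WER over (reference, hypothesis) pairs."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)


pairs = [
    ("the cat sat down", "the cat sat"),  # one deleted word out of four
    ("hello world", "hello word"),        # one substituted word out of two
]
average = mean_wer(pairs)
print(average)
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so a validation curve is not bounded by 100%.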
sox is the audio backend for Linux; PySoundFile is the audio backend for Windows.
Set HYDRA_FULL_ERROR=1 to get the complete stack trace from Hydra when a run fails.
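Hydra checks the `HYDRA_FULL_ERROR` environment variable and, when it is set to `1`, prints the complete stack trace instead of a shortened error message. One way to set it from Python (for example in a notebook cell before launching training) is sketched below. The helper name `hydra_debug_env` is hypothetical, and the commented-out launch line simply reuses the training command from this README.

```python
import os


def hydra_debug_env(base_env=None):
    """Return a copy of the environment with HYDRA_FULL_ERROR=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HYDRA_FULL_ERROR"] = "1"
    return env


env = hydra_debug_env()
print(env["HYDRA_FULL_ERROR"])

# To launch training with this environment, something like the following
# should work from the project root (left commented out so the sketch runs
# without the repository present):
#
# import subprocess
# subprocess.run(
#     ["python", "main.py", "-cp", "conf", "-cn", "configs"],
#     env=env,
#     check=True,
# )
```

Copying the environment rather than mutating `os.environ` keeps the setting local to the launched process, which is convenient when the variable is only needed for debugging a single run.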