UETASR provides various useful tools to speed up and facilitate research on speech technologies:
- A YAML-based hyperparameter specification language that describes all types of hyperparameters, from individual numbers to complete objects.
- Single- and multi-GPU training and inference with TensorFlow 2 Data-Parallel or Distributed Data-Parallel.
- A transparent and fully customizable data input and output pipeline.
- Logging and visualization with WandB and TensorBoard.
- Error analysis tools to help users debug their models.
- CTC and Transducer architectures; any encoder (and decoder) can be plugged into the framework.
- Gradient accumulation for large-batch training.
- Currently supported:
- Conformer (https://arxiv.org/abs/2005.08100)
- Emformer (https://arxiv.org/abs/2010.10759)
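To illustrate the YAML-based hyperparameter style, a hypothetical config fragment might look like the one below. The field names here are invented for illustration only; see the actual `config.yaml` files in the repository for the real schema.

```yaml
# Hypothetical fragment — UETASR's real schema may differ.
model:
  encoder:
    name: conformer        # a complete object described by hyperparameters
    num_blocks: 16
    d_model: 256
    num_heads: 4
optimizer:
  name: adam
  learning_rate: 0.0025    # an individual number
```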
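Gradient accumulation itself is framework-agnostic: it sums the gradients of several micro-batches and applies one averaged update, emulating a larger batch without the extra memory. A minimal NumPy sketch of the idea (not UETASR's actual trainer code):

```python
import numpy as np

def sgd_with_accumulation(w, micro_grads, lr, accum_steps):
    """One SGD step built from `accum_steps` micro-batch gradients.

    Summing gradients over several small batches and applying a single
    averaged update behaves like a batch `accum_steps` times larger.
    """
    accum = np.zeros_like(w)
    for g in micro_grads:          # one gradient per micro-batch
        accum += g
    return w - lr * accum / accum_steps

w = np.array([1.0, 2.0])
micro_grads = [np.array([0.4, 0.0]), np.array([0.0, 0.4]),
               np.array([0.4, 0.4]), np.array([0.0, 0.0])]
w_new = sgd_with_accumulation(w, micro_grads, lr=0.5, accum_steps=4)
# averaged gradient is [0.2, 0.2], so w becomes [0.9, 1.9]
```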
UETASR provides efficient and GPU-friendly on-the-fly speech augmentation pipelines and acoustic feature extraction:
- Augmentation
- Featurization:
  - MFCC, Fbank, Spectrogram, etc.
  - Subword tokenization (BPE, Unigram, etc.)
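UETASR runs feature extraction as TensorFlow ops inside the input pipeline; the computation behind a spectrogram front end can be sketched in plain NumPy. The frame, hop, and FFT sizes below are typical 16 kHz values, not necessarily UETASR's defaults:

```python
import numpy as np

def power_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Frame the signal, apply a Hann window, and take the FFT
    to produce a (frames x frequency-bins) power spectrogram."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.abs(spectrum) ** 2

sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)   # 1 second of a 440 Hz tone
spec = power_spectrogram(sig)       # shape (98, 257)
```

Applying a mel filterbank and a log to this output would give Fbank features; a DCT on top of those gives MFCCs.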
For training and testing, you can use `git clone` to install some optional packages from other authors (`ctc_loss`, `rnnt_loss`, etc.).
- TensorFlow >= 2.9.0
- CuDNN >= 8.1.0
- CUDA >= 11.2
- NVIDIA driver >= 470
Once you have created your Python environment (Python 3.6+), you can simply type:

```bash
git clone https://github.com/thanhtvt/uetasr.git
cd uetasr
pip install -e .
```
Then you can access uetasr with:

```python
import uetasr
```
```bash
git clone https://github.com/thanhtvt/uetasr.git
conda create --name uetasr python=3.8
conda activate uetasr
conda install cudnn=8.1.0
cd uetasr
pip install -e .
```
Build the Docker image from the Dockerfile:

```bash
docker build -t uetasr:v1.0.0 .
```

Run a container from the `uetasr` image:

```bash
docker run -it --name uetasr --gpus all -v <workspace_dir>:/workspace uetasr:v1.0.0 bash
```
1. Define a config YAML file; see the `config.yaml` file in this folder for reference.
2. Download your corpus and create a script to generate the .tsv file (see this file for reference). Check whether our provided `tools` meet your needs.
3. Create `transcript.txt` and `cmvn.tsv` files for your corpus. We implement this script to generate those files, given the .tsv file generated in step 2.
4. For training, check `train.py` in the `egs` folder to see the options.
5. For testing, check `test.py` in the `egs` folder to see the options.
6. For evaluating and error analysis, check `asr_evaluation.py` in the `tools` folder to see the options.
7. [Optional] To publish your model on 🤗, check this space for reference.
- namnv1906 (for the guidance & initial version of this toolkit)
- TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2
- ESPNet: End-to-End Speech Processing Toolkit
- SpeechBrain: A PyTorch-based Speech Toolkit
- Python module for evaluating ASR hypotheses
- Accumulated Gradients for TensorFlow 2