Bengali-ASR provides scripts and notebooks for building an automatic speech recognition (ASR) system for the Bengali language. The project trains Whisper models with JAX on TPUs and also includes a PyTorch implementation based on Wav2Vec2.
- setup.sh – installs Python 3.11 and all required packages (JAX, Transformers, PyTorch, etc.).
- download.sh – downloads training data from Kaggle and other public sources.
- functions.py, functions_infer.py – dataset utilities and dataloaders.
- run_train.py – main JAX training script for the Whisper model.
- run_train_txt.py – optional text-only training of the decoder.
- model_wav2vec_CTC.py – PyTorch approach using Wav2Vec2 with a CTC head.
- *.ipynb – Jupyter notebooks with experiments and evaluations.
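The evaluation notebooks are not reproduced here. For orientation only, ASR output is commonly scored with word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The pure-Python sketch below computes it with a standard Levenshtein dynamic program over whitespace-split words; the name `word_error_rate` is hypothetical, and the notebooks may use a different implementation or text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words (hypothetical helper)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix handled so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Bengali text is split on whitespace like any other language here.
print(word_error_rate("আমি ভাত খাই", "আমি ভাত খাই"))  # 0.0
print(word_error_rate("আমি ভাত খাই", "আমি রুটি খাই"))  # one substitution out of three words
```

Because the score is normalized by the reference length, a hypothesis with many insertions can push WER above 1.0.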
- Install dependencies:
    bash setup.sh
- Download data:
    bash download.sh
  Place your kaggle.json credentials in the project root before running the script.
- Activate the virtual environment:
    source ~/.venv311/bin/activate
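Before launching training, it can help to confirm that the steps above took effect. The script below is a hypothetical preflight check, not part of the repository: it reports the Python version (setup.sh targets 3.11), whether a virtual environment is active (detected by `sys.prefix` differing from `sys.base_prefix`), whether `jax`, `transformers` and `torch` can be imported, and whether kaggle.json in the project root parses as a Kaggle API token with `username` and `key` fields. The function names and the module list are assumptions; extend the list with anything else setup.sh installs.

```python
import importlib.util
import json
import sys
from pathlib import Path

# Import names of the main packages setup.sh installs (assumed list; extend as needed).
REQUIRED_MODULES = ["jax", "transformers", "torch"]


def missing_modules(names):
    """Return the module names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def in_virtualenv():
    """True inside a venv, whose prefix differs from the base interpreter's."""
    return sys.prefix != sys.base_prefix


def token_problems(data):
    """Check a parsed Kaggle API token, which holds 'username' and 'key' fields."""
    if not isinstance(data, dict):
        return ["token should be a JSON object"]
    return [f"missing '{field}'" for field in ("username", "key") if not data.get(field)]


def check_kaggle_credentials(project_root="."):
    """Return problems with <project_root>/kaggle.json (empty list if it looks valid)."""
    path = Path(project_root) / "kaggle.json"
    if not path.is_file():
        return [f"{path} not found"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    return [f"{path}: {problem}" for problem in token_problems(data)]


if __name__ == "__main__":
    version = sys.version_info
    print(f"Python {version.major}.{version.minor} (setup.sh targets 3.11)")
    print("Inside a virtual environment:", in_virtualenv())
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
    print("kaggle.json problems:", check_kaggle_credentials() or "none")
```

The script only reports; it does not exit with an error code, so it can be run at any point without interrupting a shell session.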
Edit hyperparameters in run_train.py as necessary, then execute:
    python run_train.py

For additional decoder training using only text data, run:
    python run_train_txt.py

The file model_wav2vec_CTC.py contains a PyTorch implementation of Wav2Vec2 trained with a CTC loss. Adjust the data paths in the script, then run it directly:
    python model_wav2vec_CTC.py

functions_infer.py shows how to create an inference dataset and collate function. See the notebooks for end-to-end examples.
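Two small pieces make the inference path easier to picture: a collate function that pads variable-length waveforms into a rectangular batch with an attention mask, and greedy CTC decoding, the usual way frame-level outputs of a CTC head (such as the Wav2Vec2 one in model_wav2vec_CTC.py) are turned into text: take the most likely token per frame, merge consecutive repeats, then drop the blank token. This is a pure-Python illustration, not the code in functions_infer.py; the names `pad_batch` and `ctc_greedy_decode`, the blank id 0 and the toy vocabulary are assumptions, and a real pipeline would work on tensors and the model's own tokenizer.

```python
from itertools import groupby


def pad_batch(waveforms, pad_value=0.0):
    """Pad 1-D sample lists to the longest length; return (padded, attention_mask)."""
    max_len = max(len(w) for w in waveforms)
    padded = [list(w) + [pad_value] * (max_len - len(w)) for w in waveforms]
    # 1 marks real samples, 0 marks padding the model should ignore.
    mask = [[1] * len(w) + [0] * (max_len - len(w)) for w in waveforms]
    return padded, mask


def ctc_greedy_decode(frame_logits, id_to_char, blank_id=0):
    """Greedy CTC decoding: argmax per frame, collapse repeats, remove blanks."""
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_logits]
    collapsed = [token for token, _ in groupby(best_ids)]
    return "".join(id_to_char[token] for token in collapsed if token != blank_id)


# Toy vocabulary (assumed): 0 is the CTC blank, 3 is a space.
vocab = {0: "", 1: "আ", 2: "ম", 3: " ", 4: "ি"}
# Per-frame scores over the 5 tokens; the argmax sequence is 1 1 0 2 4 4 0.
logits = [
    [0.1, 0.9, 0.0, 0.0, 0.0],
    [0.1, 0.8, 0.0, 0.0, 0.1],
    [0.9, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.1, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.2, 0.0, 0.8],
    [0.7, 0.1, 0.1, 0.1, 0.0],
]
print(ctc_greedy_decode(logits, vocab))
print(pad_batch([[0.1, 0.2, 0.3], [0.5]]))
```

Repeats are merged before blanks are removed, so a blank between two identical tokens is what lets CTC emit a doubled character; without it, the repeats collapse into one.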
Open the Jupyter notebooks for exploration and evaluation:
    jupyter lab

This repository is provided for research purposes. Please review the licenses of the datasets used before redistribution.