Bengali-ASR

Bengali-ASR provides scripts and notebooks for building an automatic speech recognition (ASR) system for the Bengali language. It contains two approaches: a Whisper model trained with JAX on TPU, and a PyTorch implementation based on Wav2Vec2.

Project Structure

setup.sh – installs Python 3.11 and all required packages (JAX, Transformers, PyTorch, etc.).
download.sh – downloads training data from Kaggle and other public sources.
functions.py, functions_infer.py – dataset utilities and dataloaders.
run_train.py – main JAX training script for the Whisper model.
run_train_txt.py – optional text-only training of the decoder.
model_wav2vec_CTC.py – PyTorch approach using Wav2Vec2 with a CTC head.
*.ipynb – Jupyter notebooks with experiments and evaluations.

Setup

Install dependencies
```
bash setup.sh
```
Download data
```
bash download.sh
```
Place your kaggle.json credentials in the project root before running download.sh.
Activate the virtual environment
```
source ~/.venv311/bin/activate
```
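Since the download step depends on kaggle.json being in the project root, a quick check before running download.sh can save a failed run. The sketch below is not part of the repository; the helper name `check_kaggle_credentials` is hypothetical. It relies only on the fact that a Kaggle API token is a JSON object with `username` and `key` entries.

```python
import json
from pathlib import Path


def check_kaggle_credentials(project_root="."):
    """Return True if kaggle.json exists in project_root and has both expected fields."""
    path = Path(project_root) / "kaggle.json"
    if not path.is_file():
        return False
    try:
        creds = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    # Kaggle API tokens are JSON objects with "username" and "key" entries.
    return isinstance(creds, dict) and all(creds.get(k) for k in ("username", "key"))


if __name__ == "__main__":
    ok = check_kaggle_credentials()
    print("kaggle.json OK" if ok else "kaggle.json missing or incomplete")
```

Run it from the project root; it only inspects the file and never contacts Kaggle.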

Training

Whisper (JAX/TPU)

Edit hyperparameters in run_train.py as necessary, then execute:

```
python run_train.py
```

Text-only fine-tuning

For additional training using only text data, run:

```
python run_train_txt.py
```

Wav2Vec2 CTC (PyTorch)

The file model_wav2vec_CTC.py contains a PyTorch implementation with a CTC loss. Run it directly after adjusting paths:

```
python model_wav2vec_CTC.py
```
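For readers new to CTC, the following is a minimal sketch of how per-frame predictions from a CTC head are typically turned into text: repeated labels are collapsed, then blanks are dropped. It is not taken from model_wav2vec_CTC.py. The function name, the toy vocabulary, and `blank_id=0` are assumptions for illustration; the actual blank index depends on the tokenizer in use.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse runs of repeated frame labels, then drop blanks (CTC best-path decoding)."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank_id]


# Toy vocabulary (hypothetical): index 0 is the CTC blank.
vocab = ["<blank>", "ক", "থ", "া"]
# Per-frame argmax over the vocabulary, as a CTC model would produce.
frames = [0, 1, 1, 0, 2, 2, 2, 3, 0, 0]
decoded = "".join(vocab[i] for i in ctc_greedy_decode(frames))
print(decoded)
```

A blank between two identical labels keeps them as separate characters, which is why the blank is removed only after collapsing repeats.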

Inference

functions_infer.py shows how to create an inference dataset and collate function. See the notebooks for end-to-end examples.
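As a rough idea of what a collate function does for audio, the sketch below pads variable-length waveforms to a common length and builds a mask marking the real samples. It is an illustration in plain Python, not the code in functions_infer.py. The name `collate_pad` is hypothetical, and the keys `input_values` and `attention_mask` are assumed here because they match the Hugging Face Wav2Vec2 input naming; a real collate function would return tensors rather than lists.

```python
def collate_pad(batch, pad_value=0.0):
    """Pad variable-length waveforms to the longest one and build a matching mask."""
    max_len = max(len(wave) for wave in batch)
    padded, mask = [], []
    for wave in batch:
        n_pad = max_len - len(wave)
        padded.append(list(wave) + [pad_value] * n_pad)
        # 1 marks real samples, 0 marks padding.
        mask.append([1] * len(wave) + [0] * n_pad)
    return {"input_values": padded, "attention_mask": mask}


# Two toy "waveforms" of different lengths.
batch = collate_pad([[0.1, 0.2, 0.3], [0.5]])
print(batch["attention_mask"])
```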

Notebooks

Open the Jupyter notebooks for exploration and evaluation:

```
jupyter lab
```
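This README does not say which metric the evaluation notebooks report. Assuming word error rate (WER), the usual metric for ASR, a minimal reference implementation looks like the sketch below. The function name is hypothetical, and whitespace tokenization is a simplification.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One of three reference words is missing from the hypothesis.
print(word_error_rate("আমি ভাত খাই", "আমি খাই"))
```

WER can exceed 1.0 when the hypothesis contains many extra words, so it should not be read as a percentage of correct words.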

License

This repository is provided for research purposes. Please review the licenses of the datasets used before redistribution.
