[RecurrentNN × Regression × Regularized]-based Mouth Opening Estimation via SSL
- Install PyTorch from official instructions: https://pytorch.org/get-started/locally/
- Install dependencies:

```bash
pip install -r requirements.txt
```
- Collect data using LipsSync. Directory structure:

```
2025-02-04_22-01-52/
    audio.wav
    mouth_data.csv
2025-02-04_22-43-56/
    audio.wav
    mouth_data.csv
valid.txt
```

- Prepare a seen validation set (in-distribution speakers) and an unseen validation set (out-of-distribution speakers)
- Add audio paths to `valid.txt`
- For SSL: prepare unlabeled vocal-only audio with an intact spectrum below 16 kHz (a sketch for screening files follows the preprocessing commands below)
- Run preprocessing:

```bash
# Labeled data
python recipes/mouth_opening/preprocess.py <SOURCE_DIR> <TARGET_DIR>

# Unlabeled data (SSL)
python recipes/mouth_opening/preprocess_unlabel.py <SOURCE_DIR> <TARGET_DIR>
```
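The "intact spectrum below 16 kHz" requirement rules out low-sample-rate recordings and heavily low-passed audio (e.g. from lossy re-encoding). A minimal screening sketch, assuming `soundfile` and `numpy` are installed; the 16 kHz cutoff comes from the requirement above, but the 12–16 kHz band and the energy-ratio threshold are arbitrary illustrations, not part of this repo:

```python
# screen_unlabeled.py -- rough spectral screen for SSL candidate audio (sketch, not part of this repo)
import sys
import numpy as np
import soundfile as sf

def spectrum_looks_intact(path, cutoff_hz=16000, ratio_threshold=1e-4):
    """Return True if the file is sampled high enough and still has
    energy close to the cutoff (i.e. no obvious low-pass wall)."""
    audio, sr = sf.read(path, always_2d=True)
    mono = audio.mean(axis=1)
    if sr < 2 * cutoff_hz:                      # Nyquist must reach 16 kHz
        return False
    spec = np.abs(np.fft.rfft(mono)) ** 2       # power spectrum
    freqs = np.fft.rfftfreq(len(mono), d=1.0 / sr)
    band = spec[(freqs >= 0.75 * cutoff_hz) & (freqs < cutoff_hz)]
    # compare energy just below the cutoff against the total energy
    return band.sum() / (spec.sum() + 1e-12) > ratio_threshold

if __name__ == "__main__":
    for wav in sys.argv[1:]:
        print(wav, "ok" if spectrum_looks_intact(wav) else "suspect")
```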
Run training:

```bash
python train.py --exp_name <EXP_NAME> --dataset <DATA_PATH> --gpu <GPU_ID>
```

View all options with `python train.py --help`. Variants:

- `train_r_drop.py` (R-Drop regularization)
- `train_mse.py` (MSE loss)
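For context, R-Drop (cited under the core references below) runs the same batch through the network twice with independent dropout masks and penalizes the disagreement between the two predictions in addition to the task loss. A minimal regression-flavored sketch in PyTorch; the L1 task loss, MSE consistency term, and weighting are illustrative assumptions, not the actual code in `train_r_drop.py`:

```python
import torch
import torch.nn.functional as F

def r_drop_step(model, features, targets, alpha=1.0):
    """One R-Drop-style training step for a regression model (illustrative sketch).

    The batch is forwarded twice; dropout inside `model` produces two different
    predictions, and their disagreement is penalized alongside the task loss.
    """
    pred_a = model(features)                    # first pass, dropout mask A
    pred_b = model(features)                    # second pass, dropout mask B
    task_loss = 0.5 * (F.l1_loss(pred_a, targets) + F.l1_loss(pred_b, targets))
    consistency = F.mse_loss(pred_a, pred_b)    # regression analogue of R-Drop's KL term
    return task_loss + alpha * consistency
```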
SSL training command:

```bash
python train_ssl.py --exp_name <EXP_NAME> --dataset <DATA_PATH> --unlabel_dataset <UNLABEL_PATH> --gpu <GPU_ID>
```

Prerequisites:
- Create `valid2.txt` with unseen validation paths
- `--conv_dropout` must be non-zero
- Use 10+ hours of seen data
- Prepare 50+ hours of unlabeled data
- Tested datasets:
  - Labeled: mouth opening research project
  - MultiModal: Acappella, GRID, URSing
  - Unlabeled: PopBuTFy from NeuralSVB, PopCS from DiffSinger, M4Singer, Jingju a Cappella Recordings Collection, tiny-singing-voice-database, OpenSinger, GTSinger
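The core references below (Temporal Ensembling, Mean Teacher) train on unlabeled data with a consistency objective: the student is pushed to agree with a perturbed or weight-averaged copy of itself, which relies on stochastic perturbation such as dropout and is plausibly why `--conv_dropout` must be non-zero. A minimal Mean-Teacher-style sketch in PyTorch; the EMA decay, L1 task loss, consistency weighting, and all names are illustrative assumptions, not the actual code in `train_ssl.py`:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Move the teacher's weights toward the student's (Mean Teacher EMA)."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

def ssl_step(student, teacher, labeled, unlabeled, optimizer, w_cons=1.0):
    """One semi-supervised step: supervised loss on labeled data plus a
    consistency loss that pulls the student toward the EMA teacher's
    predictions on unlabeled data (illustrative sketch only)."""
    feats, targets = labeled

    sup_loss = F.l1_loss(student(feats), targets)        # supervised regression loss
    with torch.no_grad():
        teacher_pred = teacher(unlabeled)                # consistency target
    cons_loss = F.mse_loss(student(unlabeled), teacher_pred)

    loss = sup_loss + w_cons * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)                         # teacher tracks the student
    return loss.item()

# The teacher starts as a frozen copy of the student, e.g.:
#   teacher = copy.deepcopy(student)
#   for p in teacher.parameters():
#       p.requires_grad_(False)
```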
Run evaluation:

```bash
python eval.py --model <model_path> --wav <wav_path>
```
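If many recordings need to be scored, the single-file command above can simply be looped over a directory. A small sketch using only the Python standard library (the directory layout and `.wav` naming are assumptions; output handling is left to `eval.py` itself):

```python
# batch_eval.py -- run eval.py over every .wav in a directory (sketch)
import subprocess
import sys
from pathlib import Path

model_path, wav_dir = sys.argv[1], Path(sys.argv[2])

for wav in sorted(wav_dir.glob("*.wav")):
    # same interface as the single-file command above
    subprocess.run(
        ["python", "eval.py", "--model", model_path, "--wav", str(wav)],
        check=True,
    )
```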
Acknowledgments:

- Framework cloned from GeneralCurveEstimator
- Training code adapted from vocal-remover
- Early model reference: FCPE
- SSL inspiration: SOFA
- Core references:
  - R-Drop: Regularized Dropout for Neural Networks [CODE]
  - Temporal Ensembling for Semi-Supervised Learning [CODE]
  - Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results [CODE]
- Partial dataset references:
  - Cooke, M., Barker, J., Cunningham, S., & Shao, X. (2006). The Grid Audio-Visual Speech Corpus (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3625687
  - Li, B., Wang, Y., & Duan, Z. (2021). Audiovisual singing voice separation. Transactions of the International Society for Music Information Retrieval, 4(1), pp. 195–209. DOI: http://doi.org/10.5334/tismir.108
  - Gong, R., Caro, R., Yang, Y., & Serra, X. (2022). Jingju a Cappella Recordings Collection (2.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6536490
  - Zhang, L., Li, R., Wang, S., Deng, L., Liu, J., Ren, Y., He, J., Huang, R., Zhu, J., Chen, X., & Zhao, Z. (2022). M4Singer: A multi-style, multi-singer and musical score provided Mandarin singing corpus [Data set]. Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
- Data collection tool: LipsSync
- Visualization tool: lips-sync-visualizer
- .ass mask tools: mask_fix_tools
- Data expansion initiative: DiffSinger Discussion
