You can install the Python dependencies with
pip3 install -r requirements.txt
The raw data can be downloaded from here (password: vvhn).
There are two ways to obtain the features of the V2C dataset: 1) directly download the precomputed features from here, or 2) process the features yourself.
Please download all the feature archives (.zip) and JSON files from here and unzip them into the folder "./preprocessed_data/MovieAnimation".
First, run
python3 prepare_align.py config/MovieAnimation/preprocess.yaml
to prepare the text and audio files for alignment.
As described in the paper, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. Alignments of the supported datasets are provided here. Unzip the files into "preprocessed_data/MovieAnimation/TextGrid/".
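The TextGrid files store, for each phoneme, its start and end time in seconds. Below is a minimal pure-Python sketch of how such a file can be parsed and the interval boundaries converted into mel-frame durations. It is only an illustration: the repo's preprocessing typically relies on a TextGrid library, and the sampling rate (22050 Hz) and hop length (256) are assumed values from a typical preprocess.yaml, not confirmed from this repo.

```python
import re

def parse_intervals(textgrid_str, tier="phones"):
    """Extract (xmin, xmax, label) triples from one tier of a TextGrid."""
    # Cut out the requested tier: everything from its name to the next item or EOF.
    tier_match = re.search(
        r'name = "%s".*?(?=item \[|\Z)' % tier, textgrid_str, re.S)
    if tier_match is None:
        return []
    block = tier_match.group(0)
    # Each interval is three consecutive lines: xmin, xmax, text.
    pattern = re.compile(
        r'xmin = ([\d.]+)\s*\n\s*xmax = ([\d.]+)\s*\n\s*text = "([^"]*)"')
    return [(float(a), float(b), t) for a, b, t in pattern.findall(block)]

def to_frames(intervals, sampling_rate=22050, hop_length=256):
    """Convert interval boundaries (seconds) to per-phoneme mel-frame counts."""
    durations = []
    for xmin, xmax, _ in intervals:
        start = round(xmin * sampling_rate / hop_length)
        end = round(xmax * sampling_rate / hop_length)
        durations.append(end - start)
    return durations
```

The frame durations produced this way are what a FastSpeech2-style duration predictor is trained against.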
After that, run the preprocessing script by
python3 preprocess.py config/MovieAnimation/preprocess.yaml
Alternatively, you can align the corpus yourself. Download the official MFA package and run
./montreal-forced-aligner/bin/mfa_align raw_data/MovieAnimation/ lexicon/librispeech-lexicon.txt english preprocessed_data/MovieAnimation
or
./montreal-forced-aligner/bin/mfa_train_and_align raw_data/MovieAnimation/ lexicon/librispeech-lexicon.txt preprocessed_data/MovieAnimation
to align the corpus and then run the preprocessing script.
python3 preprocess.py config/MovieAnimation/preprocess.yaml
Then run the speaker encoder and emotion encoder to extract the corresponding features:
python ./speaker_encoder/speaker_encoder.py
python ./emotion_encoder/video_features/emotion_encoder.py
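Both encoders reduce a variable-length sequence of per-frame features to a single utterance-level vector. A minimal sketch of the common pooling convention (mean over frames followed by L2 normalization, as in GE2E-style speaker encoders) is shown below; the actual encoders in this repo are learned networks, so this is only illustrative:

```python
import math

def pool_embedding(frame_embs):
    """Mean-pool per-frame embeddings into one utterance-level vector,
    then L2-normalize it (a common convention for speaker embeddings)."""
    n = len(frame_embs)
    dim = len(frame_embs[0])
    mean = [sum(f[i] for f in frame_embs) / n for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]
```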
Download the checkpoint 900000.pth.tar from here and put it in "./output/ckpt/MovieAnimation/".
Train your model with
python3 train.py --restore_step 900000 -p config/MovieAnimation/preprocess.yaml -m config/MovieAnimation/model.yaml -t config/MovieAnimation/train.yaml -p2 config/MovieAnimation/preprocess.yaml
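The command above passes one config file per concern (preprocessing, model, training). A minimal argparse mirror of that interface is sketched below; the long-option names are inferred from the flags and may differ from what train.py actually defines:

```python
import argparse

# Hypothetical mirror of the training command's flags
# (names inferred from the command line above, not from train.py itself).
parser = argparse.ArgumentParser()
parser.add_argument("--restore_step", type=int, default=0)
parser.add_argument("-p", "--preprocess_config", type=str, required=True)
parser.add_argument("-m", "--model_config", type=str, required=True)
parser.add_argument("-t", "--train_config", type=str, required=True)
parser.add_argument("-p2", "--preprocess_config2", type=str)

args = parser.parse_args([
    "--restore_step", "900000",
    "-p", "config/MovieAnimation/preprocess.yaml",
    "-m", "config/MovieAnimation/model.yaml",
    "-t", "config/MovieAnimation/train.yaml",
    "-p2", "config/MovieAnimation/preprocess.yaml",
])
```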
Quick evaluation: set "quick_eval = True" in evaluate.py to evaluate only 32 samples.
Full evaluation: set "quick_eval = False" in evaluate.py to evaluate all samples.
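The quick_eval switch amounts to capping how many samples the evaluation loop sees. A minimal sketch of that behavior (illustrative only; evaluate.py may select its subset differently):

```python
def select_eval_samples(dataset, quick_eval=True, quick_size=32):
    """Evaluate only the first `quick_size` samples when quick_eval is True,
    otherwise evaluate the full dataset."""
    return dataset[:quick_size] if quick_eval else dataset
```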
# TensorBoard
Use
tensorboard --logdir output/log/MovieAnimation
to serve TensorBoard on your localhost. The loss curves, MCD curves, synthesized mel-spectrograms, and audio samples are shown.
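The MCD curves track mel-cepstral distortion, a standard dB-scale distance between reference and synthesized cepstral frames. A minimal sketch of the standard formula, MCD = (10 / ln 10) * sqrt(2 * Σ_d (c_d − c'_d)²), is below; the repo's evaluate.py may differ in details such as time alignment (e.g. DTW) and which coefficients are included:

```python
import math

def mcd_frame(c_ref, c_syn):
    """MCD (dB) between two cepstral frames, excluding the 0th (energy)
    coefficient: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_syn[1:]))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * sq)

def mcd(ref_frames, syn_frames):
    """Average frame-level MCD over a (time-aligned) utterance."""
    vals = [mcd_frame(r, s) for r, s in zip(ref_frames, syn_frames)]
    return sum(vals) / len(vals)
```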