Implementation of DurIAN: Duration Informed Attention Network For Multimodal Synthesis
Notes:
- I use the same encoder structure as Tacotron2, but with fewer parameters.
- I remove the attention module from the decoder. Since each decoder step predicts r = 3 frames at once, I average the frame-aligned encoder memory over each group of three timesteps, implemented in the code as average pooling (see the sketch after this list). In my experiments, compared with a decoder that keeps attention, this hurts audio quality only slightly while greatly speeding up training.
- I drop the positional encoding and the skip encoder; this has little effect on synthesis quality.
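Below is a minimal sketch of that attention-free context computation, assuming the encoder memory has already been expanded to frame level; the function name `frame_context` and the tensor shapes are illustrative, not this repo's actual API.

```python
import torch
import torch.nn.functional as F

def frame_context(expanded_memory: torch.Tensor, r: int = 3) -> torch.Tensor:
    """Average frame-aligned encoder memory over each group of r frames.

    expanded_memory: (batch, n_frames, dim), with n_frames divisible by r.
    Returns (batch, n_frames // r, dim): one context vector per decoder step,
    standing in for the attention context of a Tacotron2-style decoder.
    """
    # avg_pool1d pools over the last axis, so move time to the end first.
    return F.avg_pool1d(expanded_memory.transpose(1, 2), kernel_size=r).transpose(1, 2)

# Example: 90 frames with r = 3 yields 30 decoder steps.
ctx = frame_context(torch.randn(2, 90, 256))  # -> (2, 30, 256)
```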
Samples here. I use WaveGlow as the vocoder; pretrained model here. The batch size is 32 and the model is trained for 180k steps.
training:
- `pip install -r requirements.txt`
- download and extract the LJSpeech dataset
- put the LJSpeech dataset in `data`
- `unzip alignments.zip` (the alignments give per-phoneme durations; see the sketch after this list)
- `python3 preprocess.py`
- `CUDA_VISIBLE_DEVICES=0 python3 train.py`
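As a hedged sketch of why those alignments matter: DurIAN expands each phoneme's encoder state by its duration in frames, so the decoder needs no attention. The function below is illustrative and does not mirror `preprocess.py`'s actual code.

```python
import torch

def expand_by_durations(encoder_out: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
    """Repeat each phoneme encoding for its duration in frames.

    encoder_out: (n_phonemes, dim); durations: (n_phonemes,) integer frame counts.
    Returns (sum(durations), dim): frame-level memory aligned with the target mels.
    """
    return torch.repeat_interleave(encoder_out, durations, dim=0)

# Example: two phonemes lasting 2 and 3 frames expand to 5 frame-level states.
states = expand_by_durations(torch.randn(2, 256), torch.tensor([2, 3]))  # -> (5, 256)
```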
testing:
- put the Nvidia pretrained WaveGlow model in `waveglow/pretrained_model`
- `CUDA_VISIBLE_DEVICES=0 python3 test.py --step [step-of-checkpoint]` (see the inference sketch after this list)
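For orientation, here is a minimal sketch of the vocoding step, assuming Nvidia's published WaveGlow checkpoint layout (module stored under the `'model'` key) and its `infer(mel, sigma)` API; the checkpoint file name and mel shape are assumptions, and `test.py` may differ in detail.

```python
import torch

# Placeholder mel; in test.py this comes from the trained DurIAN model.
mel = torch.randn(1, 80, 400).cuda()

# Nvidia's published checkpoints store the WaveGlow module under 'model';
# the file name below is an assumption about what you downloaded.
waveglow = torch.load('waveglow/pretrained_model/waveglow_256channels.pt')['model']
waveglow = waveglow.remove_weightnorm(waveglow)  # as in Nvidia's inference.py
waveglow = waveglow.cuda().eval()

with torch.no_grad():
    audio = waveglow.infer(mel, sigma=0.666)  # (1, n_samples) float waveform
```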
testing using pretrained model:
- put the pretrained model in `model_new`
- `CUDA_VISIBLE_DEVICES=0 python3 test.py --step 180000`
Although DurIAN does not generate as fast as FastSpeech, its samples sound better and it requires less computation. In practical deployment, DurIAN's generation speed fully meets real-time-factor (RTF) requirements.
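To make the RTF claim concrete: RTF is wall-clock synthesis time divided by the duration of the generated audio, so values below 1 mean faster-than-real-time synthesis. The sketch below shows how to measure it; `synthesize` and the 22050 Hz LJSpeech sample rate are assumptions, not functions this repo actually exposes.

```python
import time

def real_time_factor(synthesize, text: str, sample_rate: int = 22050) -> float:
    """Measure RTF: seconds of compute per second of generated audio."""
    start = time.perf_counter()
    audio = synthesize(text)  # assumed to return a 1-D array of samples
    elapsed = time.perf_counter() - start
    return elapsed / (len(audio) / sample_rate)
```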