>>> Yilmaz_Ay
[April 13, 2020, 7:22am]
Hi All,
I trained Tacotron2 on my dataset, which consists of about 27 hours of audio clips of 10 seconds length at a 16000 Hz sample rate. Training took about four and a half days. At this stage the test audios still sound a little bit robotic, some words are missing in some of the test audios, and there are repetitions in others. When I look at the graphs on the TensorBoard pages, they look normal.

My config values are mostly the defaults. Could anyone have a look at my configs and let me know what could be wrong? Are there any parameters I can change to remove the robotic sound from the test outputs and improve the quality of the output waves?
My configs are as below:
```
{
    'model': 'Tacotron2',
    'run_name': 'stspeech-stft_params',
    'run_description': 'tacotron2 constant stf parameters',
    'audio': {
        'num_mels': 80,
        'num_freq': 1025,
        'sample_rate': 16000,
        'win_length': 1024,
        'hop_length': 256,
        'frame_length_ms': null,
        'frame_shift_ms': null,
        'preemphasis': 0.98,
        'min_level_db': -100,
        'ref_level_db': 20,
        'power': 1.5,
        'griffin_lim_iters': 30,
        'signal_norm': true,
        'symmetric_norm': true,
        'max_norm': 4.0,
        'clip_norm': true,
        'mel_fmin': 0.0,
        'mel_fmax': 8000.0,
        'do_trim_silence': true,
        'trim_db': 60
    },
    'characters': {
        'pad': '_',
        'eos': '~',
        'bos': '^',
        'characters': 'ABCDEFGHIJKLMNOPQRSTUVWXYZÇĞİÖŞÜabcdefghijklmnopqrstuvwxyzçğıöşü!\'(),-.:;? ',
        'punctuations': '!\'(),-.:;? ',
        'phonemes': 'iyɨʉɯuɪʏʊeøɘəɵɤoɛœɜɞʌɔæɐaɶɑɒᵻʘɓǀɗǃʄǂɠǁʛpbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟˈˌːˑʍwɥʜʢʡɕʑɺɧɚ˞ɫ'
    },
    'distributed': {
        'backend': 'nccl',
        'url': 'tcp://localhost:54321'
    },
    'reinit_layers': [],
    'batch_size': 32,
    'eval_batch_size': 16,
    'r': 7,
    'gradual_training': [[0, 7, 64], [1, 5, 64], [50000, 3, 32], [130000, 2, 32], [290000, 1, 32]],
    'loss_masking': true,
    'run_eval': true,
    'test_delay_epochs': 5,
    'test_sentences_file': 'tr_sentences.txt',
    'noam_schedule': false,
    'grad_clip': 1.0,
    'epochs': 1000,
    'lr': 0.00001,
    'wd': 0.000001,
    'warmup_steps': 4000,
    'seq_len_norm': false,
    'memory_size': -1,
    'prenet_type': 'original',
    'prenet_dropout': true,
    'attention_type': 'original',
    'attention_heads': 4,
    'attention_norm': 'sigmoid',
    'windowing': false,
    'use_forward_attn': false,
    'forward_attn_mask': false,
    'transition_agent': false,
    'location_attn': false,
    'bidirectional_decoder': false,
    'stopnet': true,
    'separate_stopnet': true,
    'print_step': 5,
    'save_step': 5000,
    'checkpoint': true,
    'tb_model_param_stats': false,
    'text_cleaner': 'phoneme_cleaners',
    'enable_eos_bos_chars': false,
    'num_loader_workers': 1,
    'num_val_loader_workers': 1,
    'batch_group_size': 0,
    'min_seq_len': 6,
    'max_seq_len': 150,
    'output_path': 'train_logs/',
    'phoneme_cache_path': 'mozilla_tr_phonemes_2_1',
    'use_phonemes': true,
    'phoneme_language': 'tr',
    'use_speaker_embedding': false,
    'style_wav_for_test': null,
    'use_gst': false,
    'datasets': [
        {
            'name': 'stspeech',
            'path': 'STS-22K/',
            'meta_file_train': 'metadata_train.csv',
            'meta_file_val': 'metadata_test.csv'
        }
    ]
}
```
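As a quick sanity check of the values above, here is a minimal sketch in plain Python. The frame-timing math follows directly from 'win_length', 'hop_length', and the sample rate; the schedule decoding assumes the Mozilla TTS convention that each 'gradual_training' entry is [start_step, r, batch_size], so verify that against your TTS version:

```python
# Minimal sanity-check sketch for the config above (plain Python, no TTS imports).
# Assumes each 'gradual_training' entry means [start_step, r, batch_size].

SAMPLE_RATE = 16000
WIN_LENGTH = 1024   # samples
HOP_LENGTH = 256    # samples

GRADUAL_TRAINING = [[0, 7, 64], [1, 5, 64], [50000, 3, 32],
                    [130000, 2, 32], [290000, 1, 32]]

def schedule_at(step, schedule):
    """Return the (r, batch_size) pair active at a given global step."""
    r, batch_size = schedule[0][1], schedule[0][2]
    for start_step, sched_r, sched_bs in schedule:
        if step >= start_step:
            r, batch_size = sched_r, sched_bs
    return r, batch_size

# STFT timing: 1024 samples at 16 kHz is a 64 ms window, 256 samples a 16 ms hop.
print(f"window: {1000 * WIN_LENGTH / SAMPLE_RATE:.1f} ms, "
      f"hop: {1000 * HOP_LENGTH / SAMPLE_RATE:.1f} ms")

for step in (0, 60000, 150000, 400000):
    r, bs = schedule_at(step, GRADUAL_TRAINING)
    print(f"step {step:>6}: r={r}, batch_size={bs}")
```

Note that this schedule has long since reached r=1 by 400k steps, so the reduction factor itself is not changing any more at that point.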
Four config parameters differ from the default values:
1: the sample rate.
2: 'griffin_lim_iters': I reduced it to 30; the default was 60. I did this to reduce the training time.
3: I reduced the number of loader workers to 1; the defaults were 4. I thought it had something to do with the number of GPUs, and since I have just one GPU, I thought I needed to set them to 1.
4: the min and max sequence length parameters. I actually forgot to set them according to my data's lengths (see the sketch after this list for one way to measure them). How much effect does this have on the quality?
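For point 4, here is a minimal sketch of one way to measure the transcript lengths. It assumes the metadata files are LJSpeech-style, one wav_name|transcript pair per line (the actual column layout of metadata_train.csv may differ), and that 'min_seq_len'/'max_seq_len' are counted in transcript characters, as in Mozilla TTS's dataset filter:

```python
# Sketch: measure transcript lengths to choose 'min_seq_len' / 'max_seq_len'.
# Assumes LJSpeech-style metadata, "wav_name|transcript" per line;
# adapt the split below if your CSV layout differs.

def text_lengths(meta_path):
    lengths = []
    with open(meta_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split("|")
            if len(parts) >= 2:
                lengths.append(len(parts[-1]))  # character count of transcript
    return sorted(lengths)

lengths = text_lengths("STS-22K/metadata_train.csv")
if lengths:
    n = len(lengths)
    print(f"samples: {n}")
    print(f"min / median / max chars: {lengths[0]} / {lengths[n // 2]} / {lengths[-1]}")
    # A common heuristic: set max_seq_len near the 95th-99th percentile
    # so rare outliers are dropped instead of dominating batch padding.
    print(f"95th percentile: {lengths[int(0.95 * (n - 1))]}")
```

Items outside the [min_seq_len, max_seq_len] range are silently skipped by the loader, so with 10-second clips some transcripts may exceed 150 characters and never be seen in training.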
I appreciate any insights, comments, or suggestions about what could be wrong with my training.
Many thanks in advance.
[This is an archived TTS discussion thread from discourse.mozilla.org/t/mozilla-tts-output-voice-still-sounds-robotic-after-almost-400k]