Hey, I am back again with another question :P
Can I interpret the two-stage training scheme as:
1. Training of the CTC-Attention phoneme recognizer, the speaker encoder, and the vocoder. These three can be trained separately on their own.
2. Training of the seq2seqMoL model, which needs the outputs of the CTC-Attention phoneme recognizer and the speaker encoder. Each training instance looks like (A's sentence_1, B's sentence_x, B's sentence_1), and the MSE loss is computed between the model's predicted B's sentence_1 and the ground-truth B's sentence_1.
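If that reading is right, one stage-2 training step would look roughly like the sketch below. This is a minimal NumPy illustration of the data flow I have in mind, not the repo's actual code; `phoneme_recognizer`, `speaker_encoder`, `seq2seq_mol`, and all shapes are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the frozen stage-1 modules (assumed names):
# the phoneme recognizer turns a mel sequence into a content representation,
# and the speaker encoder turns a mel sequence into a fixed speaker embedding.
def phoneme_recognizer(mel):            # content from the source utterance
    return mel @ rng.standard_normal((80, 144))

def speaker_encoder(mel):               # speaker identity from the reference
    return mel.mean(axis=0)             # (80,) utterance-level embedding

def seq2seq_mol(content, spk_emb):      # stage-2 conversion model (sketch)
    # project content back to mel dims and condition on the speaker embedding
    return content @ rng.standard_normal((144, 80)) + spk_emb

# One training instance: (A's sentence_1, B's sentence_x, B's sentence_1),
# fake mel spectrograms of shape (frames, 80)
a_sent1 = rng.standard_normal((120, 80))  # source content utterance
b_sentx = rng.standard_normal((95, 80))   # reference utterance for B's identity
b_sent1 = rng.standard_normal((120, 80))  # ground truth: B saying sentence_1

pred_b_sent1 = seq2seq_mol(phoneme_recognizer(a_sent1),
                           speaker_encoder(b_sentx))
mse = np.mean((pred_b_sent1 - b_sent1) ** 2)  # stage-2 reconstruction loss
```

So stage 2 only updates the seq2seqMoL weights against this MSE, while the stage-1 modules just provide inputs.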
Please correct me if I am wrong.
Thanks!