Analysis-synthesis produces speech audio that is longer than the input. Is something wrong here? The code prepends 0.5 s of silence before analysis, but the resulting audio is not simply 0.5 s longer than the source. For example, this file is 7.34 s long, but the synthesized one is 8.04 s: 0.70 s longer, which is 0.20 s more than the prepended silence accounts for.
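The arithmetic behind that comparison, as a small sketch. The helper names and the 16 kHz sample rate used in the demo are assumptions for illustration and are not part of the repository's code. `trim_leading` only shows how the prepended 0.5 s could be cut off if the output is supposed to match the source length; it does not explain the extra 0.20 s.

```python
# Hypothetical helpers (not from the repository) for checking output length.
PREPENDED_SILENCE_S = 0.5  # silence the code prepends before analysis


def duration_excess(source_s: float, synthesized_s: float,
                    prepended_s: float = PREPENDED_SILENCE_S) -> float:
    """Seconds by which the output exceeds source length + prepended silence."""
    expected_s = source_s + prepended_s
    return synthesized_s - expected_s


def trim_leading(samples, sample_rate: int,
                 seconds: float = PREPENDED_SILENCE_S):
    """Drop the first `seconds` of audio, e.g. the prepended silence."""
    n = int(round(seconds * sample_rate))
    return samples[n:]


if __name__ == "__main__":
    # Durations reported in this issue: 7.34 s in, 8.04 s out.
    print(f"unexplained extra length: {duration_excess(7.34, 8.04):.2f} s")
    # Assumed 16 kHz sample rate: 0.5 s of silence is 8000 samples.
    audio = [0.0] * 8000 + [0.1] * 100
    print(len(trim_leading(audio, 16000)))
```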
There is also an error message; I'm not sure whether it is related to the change in duration:
Error(s) in loading state_dict for StagedVQVAE:
Unexpected key(s) in state_dict: "mel_spectrogram.mel_stft.mel_scale.fb", "mel_spectrogram.mel_stft.spectrogram.window"
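The two unexpected keys appear to match buffers that torchaudio's `MelScale` (`fb`, the mel filterbank) and `Spectrogram` (`window`) transforms register. These are fixed, not learned, so they may simply come from a different torchaudio version than the one that saved the checkpoint. A minimal workaround sketch, assuming the checkpoint is a flat state dict; `drop_keys`, `model`, and `checkpoint_path` are hypothetical names:

```python
from typing import Dict, Iterable, TypeVar

V = TypeVar("V")

# Keys reported as unexpected in the error above.
UNEXPECTED_KEYS = (
    "mel_spectrogram.mel_stft.mel_scale.fb",
    "mel_spectrogram.mel_stft.spectrogram.window",
)


def drop_keys(state_dict: Dict[str, V],
              keys: Iterable[str] = UNEXPECTED_KEYS) -> Dict[str, V]:
    """Return a copy of `state_dict` without the given keys (hypothetical helper)."""
    skip = set(keys)
    return {k: v for k, v in state_dict.items() if k not in skip}

# Assumed PyTorch usage (placeholders, checkpoint layout not confirmed):
#   state = torch.load(checkpoint_path, map_location="cpu")
#   model.load_state_dict(drop_keys(state))
# Alternatively, strict=False skips unexpected (and missing) keys and
# reports them in the returned object:
#   result = model.load_state_dict(state, strict=False)
#   print(result.unexpected_keys)
```

Filtering by exact key name is stricter than `strict=False`, which would also silently accept missing keys.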