audio duration changed after analysis-synthesis

The result of analysis-synthesis is a longer speech audio. Is there something wrong here? The code prepends 0.5s of silence before the analysis, but the resulting audio is NOT 0.5s longer than the source audio. For example, [this file](https://hhguo.github.io/DemoSoCodec/audio/tts/system_comparison/wenetspeech4tts/reference/TEST_NET_Y0000000122_zb0dLCYAFug_S00037.wav) is 7.34s in duration, but the synthesized one is 8.04s, which is 0.70s longer.
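To make the mismatch concrete, here is a minimal sketch of the arithmetic, using only the Python standard library. The helper names `wav_duration` and `unexplained_extra` are hypothetical and not part of the repository, and the sketch assumes both files are plain PCM WAV files that the `wave` module can read.

```python
import wave

# Silence prepended before analysis, as described above.
PAD_SECONDS = 0.5


def wav_duration(path):
    """Return the duration in seconds of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def unexplained_extra(source_seconds, output_seconds, pad_seconds=PAD_SECONDS):
    """Extra output duration that the prepended silence does not account for."""
    return output_seconds - (source_seconds + pad_seconds)


# Durations reported above: 7.34 s for the source, 8.04 s for the synthesized audio.
extra = unexplained_extra(7.34, 8.04)
print(f"unexplained extra: {extra:.2f} s")
```

With the numbers above, about 0.20s of the added duration is not explained by the prepended silence. With local copies of the two files, `wav_duration` could supply the inputs instead of the hard-coded values.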

There is also an error message; I'm not sure whether it has anything to do with the change in duration:

Error(s) in loading state_dict for StagedVQVAE:
    Unexpected key(s) in state_dict: "mel_spectrogram.mel_stft.mel_scale.fb", "mel_spectrogram.mel_stft.spectrogram.window"


audio duration changed after analysis-synthesis #5
