Getting silent WAV files when generating clips with WAVEGLOW #2

* I'm running the following command:
  `python generate_clips.py --model WAVEGLOW --text "network" --N 5 --max_per_speaker 1 --output_dir generated_clips`

* I get 5 WAV files, but they contain no audio (just silence).

* When I switch the model from **WAVEGLOW** to **VITS**, it works (I get 5 WAV files containing speech).

* Output logs (with WAVEGLOW):

```
python generate_clips.py --model WAVEGLOW --text "network" --N 5 --max_per_speaker 1 --output_dir generated_clips
Loading WAVEGLOW model...
[2025-01-20 13:59:13,039] WARNING - Saved BentoService Python version mismatch: loading BentoService bundle created with Python version 3.7.6, but current environment version is 3.8.10.
[2025-01-20 13:59:13,468] WARNING - pip package requirement bentoml==0.12.1 already exist
[2025-01-20 13:59:13,469] WARNING - pip package requirement `torch==1.7.1` does not match the version installed in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement `numpy==1.19.2` does not match the version installed in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement `inflect==4.1.0` does not match the version installed in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement `scipy==1.5.2` does not match the version installed in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement `Unidecode==1.0.22` not found in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement `librosa==0.6.0` does not match the version installed in current python environment
/home/vh/Repo/wake_word/venv3810WakeWord/lib/python3.8/site-packages/torch/serialization.py:888: SourceChangeWarning: source code of class 'glow.WaveGlow' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/vh/Repo/wake_word/venv3810WakeWord/lib/python3.8/site-packages/torch/serialization.py:888: SourceChangeWarning: source code of class 'torch.nn.modules.conv.ConvTranspose1d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/vh/Repo/wake_word/venv3810WakeWord/lib/python3.8/site-packages/torch/serialization.py:888: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/vh/Repo/wake_word/venv3810WakeWord/lib/python3.8/site-packages/torch/serialization.py:888: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv1d' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
/home/vh/Repo/wake_word/synthetic_speech_dataset_generation/models/waveglow/TextToSpeechModel/audio_processing.py:197: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = pad_center(fft_window, filter_length)
/home/vh/Repo/wake_word/synthetic_speech_dataset_generation/models/waveglow/TextToSpeechModel/audio_processing.py:104: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel_basis = librosa_mel_fn(
Number of speakers : 123
Generating clips:   0%|          | 0/5 [00:00<?, ?it/s]
/home/vh/Repo/wake_word/synthetic_speech_dataset_generation/models/waveglow/TextToSpeechModel/text_to_speech.py:97: RuntimeWarning: invalid value encountered in cast
  speech = (speech*32767).astype(np.int16) # convert to 16-bit PCM data
Generating clips: 100%|██████████| 5/5 [00:01<00:00,  2.88it/s]
5 clips generated!
```
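The `RuntimeWarning: invalid value encountered in cast` at `text_to_speech.py:97` suggests the WaveGlow output already contains NaN or inf samples before the int16 conversion; casting those to `np.int16` does not produce meaningful audio, which would be consistent with the silent files. A minimal sketch for checking both ends of this, assuming the waveform is a float NumPy array scaled to [-1, 1] (as the `speech*32767` line implies). `to_int16_pcm` and `peak_amplitude` are hypothetical helpers, not part of this repo:

```python
import wave

import numpy as np


def to_int16_pcm(speech) -> np.ndarray:
    """Convert a float waveform in [-1, 1] to 16-bit PCM, refusing NaN/inf input.

    Hypothetical replacement for `(speech*32767).astype(np.int16)`.
    """
    speech = np.asarray(speech, dtype=np.float32)
    bad = ~np.isfinite(speech)
    if bad.any():
        # Non-finite samples are what NumPy reports as "invalid value encountered in cast".
        raise ValueError(f"{int(bad.sum())} of {speech.size} samples are NaN/inf")
    # Clip first so out-of-range floats saturate instead of wrapping around.
    return (np.clip(speech, -1.0, 1.0) * 32767).astype(np.int16)


def peak_amplitude(path: str) -> int:
    """Return the largest absolute sample value in a 16-bit PCM WAV file (0 means silence)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        frames = wf.readframes(wf.getnframes())
    samples = np.frombuffer(frames, dtype="<i2")
    if samples.size == 0:
        return 0
    # Widen to int32 so abs(-32768) does not overflow int16.
    return int(np.abs(samples.astype(np.int32)).max())
```

For example, `peak_amplitude("generated_clips/<clip>.wav")` returning 0 would confirm a clip is entirely silent, and calling `to_int16_pcm` in place of the cast on line 97 would raise on the first clip whose waveform contains NaN/inf, pointing the problem at the model output rather than the WAV writing.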