Getting silent wav files when generating clips with WAVEGLOW #2

@amitli1

  • I'm running the following command:
    python generate_clips.py --model WAVEGLOW --text "network" --N 5 --max_per_speaker 1 --output_dir generated_clips

  • I get 5 wav files, but they contain no audio (just silence).

  • When I change the model from WAVEGLOW to VITS, it works (I get 5 wav files with speech).

  • Output logs (when using WAVEGLOW):

python generate_clips.py --model WAVEGLOW --text "network" --N 5 --max_per_speaker 1 --output_dir generated_clips
Loading WAVEGLOW model...
[2025-01-20 13:59:13,039] WARNING - Saved BentoService Python version mismatch: loading BentoService bundle created with Python version 3.7.6, but current environment version is 3.8.10.
[2025-01-20 13:59:13,468] WARNING - pip package requirement bentoml==0.12.1 already exist
[2025-01-20 13:59:13,469] WARNING - pip package requirement torch==1.7.1 does not match the version installed in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement numpy==1.19.2 does not match the version installed in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement inflect==4.1.0 does not match the version installed in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement scipy==1.5.2 does not match the version installed in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement Unidecode==1.0.22 not found in current python environment
[2025-01-20 13:59:13,469] WARNING - pip package requirement librosa==0.6.0 does not match the version installed in current python environment
/home/vh/Repo/wake_word/venv3810WakeWord/lib/python3.8/site-packages/torch/serialization.py:888: SourceChangeWarning: source code of class 'glow.WaveGlow' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/vh/Repo/wake_word/venv3810WakeWord/lib/python3.8/site-packages/torch/serialization.py:888: SourceChangeWarning: source code of class 'torch.nn.modules.conv.ConvTranspose1d' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/vh/Repo/wake_word/venv3810WakeWord/lib/python3.8/site-packages/torch/serialization.py:888: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/vh/Repo/wake_word/venv3810WakeWord/lib/python3.8/site-packages/torch/serialization.py:888: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv1d' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/home/vh/Repo/wake_word/synthetic_speech_dataset_generation/models/waveglow/TextToSpeechModel/audio_processing.py:197: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = pad_center(fft_window, filter_length)
/home/vh/Repo/wake_word/synthetic_speech_dataset_generation/models/waveglow/TextToSpeechModel/audio_processing.py:104: FutureWarning: Pass sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0 as keyword args. From version 0.10 passing these as positional arguments will result in an error
mel_basis = librosa_mel_fn(
Number of speakers : 123
Generating clips: 0%| | 0/5 [00:00<?, ?it/s]/home/vh/Repo/wake_word/synthetic_speech_dataset_generation/models/waveglow/TextToSpeechModel/text_to_speech.py:97: RuntimeWarning: invalid value encountered in cast
speech = (speech*32767).astype(np.int16) # convert to 16-bit PCM data
Generating clips: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:01<00:00, 2.88it/s]
5 clips generated!
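
The `RuntimeWarning: invalid value encountered in cast` at the `(speech*32767).astype(np.int16)` line is what NumPy reports when NaN or infinite values are cast to an integer type, so the WAVEGLOW output may contain non-finite samples. That is only a guess from the log, possibly related to the package version mismatches listed above. A minimal sketch for checking the array just before the cast is below; `describe_waveform` and `to_int16_pcm` are hypothetical helper names, not functions from this repository.

```python
import numpy as np


def describe_waveform(speech):
    """Summarise a float waveform before it is cast to 16-bit PCM."""
    speech = np.asarray(speech, dtype=np.float32)
    finite = np.isfinite(speech)
    return {
        "n_samples": int(speech.size),
        "n_nan": int(np.isnan(speech).sum()),
        "n_inf": int(np.isinf(speech).sum()),
        # Peak amplitude over the finite samples only (0.0 if there are none).
        "peak": float(np.abs(speech[finite]).max()) if finite.any() else 0.0,
    }


def to_int16_pcm(speech):
    """Zero out NaN, saturate infinities, clip to [-1, 1], then cast to int16."""
    speech = np.nan_to_num(
        np.asarray(speech, dtype=np.float32), nan=0.0, posinf=1.0, neginf=-1.0
    )
    return (np.clip(speech, -1.0, 1.0) * 32767).astype(np.int16)
```

Calling `describe_waveform(speech)` right before the cast in `text_to_speech.py` would show whether the samples are NaN. Note that `to_int16_pcm` only suppresses the warning: if every sample is NaN, the file will still be silent, and the cause would have to be found earlier in the model.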
