Thank you for your excellent work!
I'm trying to train a model on the full LibriSpeech dataset (1000+ hours) using my own trainer. After several epochs of training, the generated audio is still rough and noisy. I don't know what the problem is or how to solve it.
Model I used:
    from audio_diffusion_pytorch import DiffusionModel, UNetV0, VDiffusion, VSampler

    model = DiffusionModel(
        net_t=UNetV0,  # The model type used for diffusion (U-Net V0 in this case)
        in_channels=1,  # U-Net: number of input/output (audio) channels
        channels=[8, 32, 64, 128, 256, 512, 512, 1024, 1024],  # U-Net: channels at each layer
        factors=[1, 4, 2, 2, 2, 2, 2, 2, 2],  # U-Net: downsampling and upsampling factors at each layer
        items=[1, 2, 2, 2, 2, 2, 2, 4, 4],  # U-Net: number of repeating items at each layer
        attentions=[0, 0, 0, 0, 0, 1, 1, 1, 1],  # U-Net: attention enabled/disabled at each layer
        attention_heads=8,  # U-Net: number of attention heads per attention item
        attention_features=64,  # U-Net: number of attention features per attention item
        diffusion_t=VDiffusion,  # The diffusion method used
        sampler_t=VSampler,  # The diffusion sampler used
        use_text_conditioning=True,  # U-Net: enables text conditioning (default T5-base)
        use_embedding_cfg=True,  # U-Net: enables classifier free guidance
        embedding_max_length=64,  # U-Net: text embedding maximum length (default for T5-base)
        embedding_features=768,  # U-Net: text embedding features (default for T5-base)
        cross_attentions=[0, 0, 0, 1, 1, 1, 1, 1, 1],  # U-Net: cross-attention enabled/disabled at each layer
    ).cuda()
Input speech length: 2**16 samples.
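For reference, here is a quick sanity check of what these settings imply. It is only a rough sketch, not something taken from the library: it assumes the audio stays at LibriSpeech's native 16 kHz (no resampling) and that each entry in `factors` downsamples the sequence once, in order. The variable names `total_downsampling`, `bottleneck_length`, and `clip_seconds` are made up for illustration.

```python
from math import prod

# Per-layer settings copied from the model config above.
channels = [8, 32, 64, 128, 256, 512, 512, 1024, 1024]
factors = [1, 4, 2, 2, 2, 2, 2, 2, 2]
items = [1, 2, 2, 2, 2, 2, 2, 4, 4]
attentions = [0, 0, 0, 0, 0, 1, 1, 1, 1]
cross_attentions = [0, 0, 0, 1, 1, 1, 1, 1, 1]

# All per-layer lists should describe the same number of layers.
num_layers = len(channels)
assert all(
    len(x) == num_layers
    for x in (factors, items, attentions, cross_attentions)
)

input_length = 2**16   # samples per training clip, as stated in the post
sample_rate = 16_000   # assumption: LibriSpeech's native rate, no resampling

# Assumption: every factor is applied exactly once on the way down.
total_downsampling = prod(factors)
bottleneck_length = input_length // total_downsampling
clip_seconds = input_length / sample_rate

print(total_downsampling, bottleneck_length, clip_seconds)
```

Under those assumptions each training clip is about 4.1 seconds long and the deepest layer sees a sequence of 128 steps, so the input length divides evenly by the combined downsampling factor.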
The training loss appears to behave correctly and decreases gradually.
However, the audio sampled from the model sounds mostly like noise, though a little better than pure noise: I can hear someone speaking in a very faint, noisy voice.
How many epochs did you train for, and on what dataset?