Description
Hello @relativeflux, thanks for reviving SampleRNN in TensorFlow.
I have a question about audio generation. As a quick sanity check, I trained a model on a single 8-second audio file, using that same file for validation. Training command:
--data_dir ./chunks --num_epochs 100 --batch_size 1 --max_checkpoints 1 --checkpoint_every 10 --output_file_dur 10 --sample_rate 11025
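For context, `--data_dir ./chunks` points at the 8-second file sliced into short clips. A minimal sketch of that chunking step, assuming plain NumPy arrays and a fixed chunk length (the repo's own chunking script is not shown here):

```python
import numpy as np

SAMPLE_RATE = 11025  # matches --sample_rate above

def chunk_audio(samples, chunk_dur=1.0, sample_rate=SAMPLE_RATE):
    """Split a 1-D sample array into fixed-length chunks, dropping any remainder."""
    chunk_len = int(chunk_dur * sample_rate)
    n_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

# Stand-in for the 8-second training file: a 440 Hz sine wave.
t = np.arange(8 * SAMPLE_RATE) / SAMPLE_RATE
audio = np.sin(2 * np.pi * 440 * t)

chunks = chunk_audio(audio)
print(len(chunks), len(chunks[0]))  # 8 chunks of 11025 samples each
```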
Audio sample rate: 11025 Hz.
I trained the model for around 40 epochs. Training accuracy comes out at 100%, while validation accuracy is 4.132, as expected.
For reference:
Epoch: 40/100, Step: 82/86, Loss: 0.000, Accuracy: 100.000, (0.440 sec/step)
Epoch: 40/100, Step: 83/86, Loss: 0.000, Accuracy: 100.000, (0.449 sec/step)
Epoch: 40/100, Step: 84/86, Loss: 0.000, Accuracy: 100.000, (0.438 sec/step)
Epoch: 40/100, Step: 85/86, Loss: 0.000, Accuracy: 100.000, (0.434 sec/step)
Epoch: 40/100, Step: 86/86, Loss: 0.000, Accuracy: 100.000, (0.437 sec/step)
Epoch: 40/100, Total Steps: 86, Loss: 0.000, Accuracy: 100.000, Val Loss: 13.038, Val Accuracy: 4.132 (1 min 0.427 sec)
But when I listen to audio generated from this checkpoint (sampled for 10 seconds), I can only hear a short sequence of data, mostly corrupted by noise, and nothing else. If the model is simply overfitting, the generated audio should reproduce the training data exactly, or at least something very similar to it.
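One way to check the memorisation hypothesis objectively, rather than by ear, is to compare the generated waveform against the training waveform with normalised cross-correlation: near 1.0 means the output is (a shifted copy of) the training audio, near 0 means unrelated noise. A sketch assuming both signals are already loaded as 1-D NumPy arrays (replaced here by synthetic stand-ins):

```python
import numpy as np

def peak_normalized_xcorr(a, b):
    """Peak magnitude of the normalized cross-correlation of two 1-D signals.

    Returns ~1.0 when b is a (time-shifted) copy of a, near 0 when unrelated.
    """
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return np.abs(np.correlate(a, b, mode="full")).max()

rng = np.random.default_rng(0)
train = np.sin(2 * np.pi * 440 * np.arange(11025) / 11025)  # stand-in training audio
memorized = np.roll(train, 100)            # a shifted copy, i.e. memorised output
noise = rng.standard_normal(11025)         # pure-noise output

print(peak_normalized_xcorr(train, memorized))  # close to 1.0
print(peak_normalized_xcorr(train, noise))      # close to 0.0
```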
I just wanted to ask: am I doing something wrong, or is this an expected result?