Valid step generates a RuntimeError #8
Comments
We've made other attempts, but the results are the same. Any idea, @TParcollet? Fabien
@Craya Hello, thank you for your interest in this paper. The problem is that, for very short input sentences, the CNN branch of the Branchformer ends up with padding sizes larger than the sequence length. This is therefore an issue with the Branchformer architecture itself. My suggestion is to filter very short sequences out of your dataset if you want to use Branchformer. Conformers have no such issue. Indeed, we have a Conformer SummaryMixing W2V2 and will release the paper and the code soon. So, if your project is not urgent, you could come back and check our Conformer SummaryMixing code when it is released. Alternatively, if you would prefer to implement it yourself so you can run experiments immediately, please feel free to ask me questions. I hope this is helpful. Shucong
Thanks @shucongzhang for your clear answer. Our dataset is composed of audio clips between 0.5 s and 10 s long. When you say to filter out "very short sentences", do you have a value in mind for this minimum duration? We will check your Conformer SummaryMixing W2V2 release for sure. Will it be in this git repo as well? Thanks.
@Craya No worries. I would suggest trying a minimum length of 1 s or 2 s. But again, I would suggest using Conformers if short utterances make up a large portion of your dataset. The W2V2 code should also be in this repo. Please let me know if there is anything else I can help with.
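For readers who want to try the filtering suggested above, here is a minimal sketch in plain Python. It assumes each record is a dictionary with a `duration` field in seconds; the field name, the `filter_short_utterances` helper, and the sample records are all hypothetical and not taken from this repository. The 1 s threshold is the lower of the two values suggested in the thread and may need tuning.

```python
from typing import Dict, List

# Assumption: 1.0 s, the lower of the two minimum lengths suggested in the thread.
MIN_DURATION_S = 1.0


def filter_short_utterances(
    records: List[Dict], min_duration_s: float = MIN_DURATION_S
) -> List[Dict]:
    """Keep only records whose 'duration' (in seconds) is at least min_duration_s."""
    return [r for r in records if r["duration"] >= min_duration_s]


# Hypothetical records; a real manifest would be loaded from CSV or JSON.
records = [
    {"id": "utt1", "duration": 0.5},
    {"id": "utt2", "duration": 1.0},
    {"id": "utt3", "duration": 7.3},
]

kept = filter_short_utterances(records)
print(f"kept {len(kept)} of {len(records)} utterances")
```

The same filter would need to be applied to the validation and test manifests, since the error reported here occurs during the valid step rather than during training.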
Dear Team,
I want to compare the ASR results we have reached based on wav2vec2 & whisper architectures, with your SummaryMixing one.
We are performing a custom ASR training, our dataset is composed of 95 000 records for Train, 16 000 records for Val, 17 000 records for Test.
Train was successfully performed with the following parameters (A100 40G GPU):
However, at epoch 1 valid step, we got the following error:
What's wrong?
Thanks for your support.