train_size referenced before assignment #93
After checking the code: for a given language, if there are no validated clips, train_size is never initialized, because len(validated) is 0.
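The failure mode can be reproduced in isolation. The function below is a hypothetical stand-in, not the corpora creator's actual code. It assumes, for illustration only, that train_size is bound solely inside a branch that is skipped when there are no validated clips. Python then treats train_size as a local name that was never assigned. Older Python versions report this as "local variable 'train_size' referenced before assignment". Python 3.11 and later word the message differently, but the exception type is still UnboundLocalError.

```python
def pick_train_size(validated_clips):
    """Hypothetical stand-in for the partitioning step (not the real code)."""
    if len(validated_clips) > 0:
        # Placeholder computation; the real split logic is not shown here.
        train_size = len(validated_clips) // 2
    # With zero validated clips the branch above is skipped, the local name
    # train_size is never bound, and this return raises UnboundLocalError.
    return train_size


print(pick_train_size(["a.mp3", "b.mp3", "c.mp3", "d.mp3"]))  # 2
try:
    pick_train_size([])
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError
```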
This was created to handle the release clips.tsv, not a clips.tsv generated after the release. The release clips.tsv did not contain any languages without validated clips. That said, crashing (as it does now) or issuing a warning and exiting seem like the only reasonable behaviors in this situation. What do you think?
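If crashing is the chosen behavior, it could at least crash with a message that names the problem. The guard below is a minimal sketch; require_validated and its arguments are hypothetical names, not part of the corpora creator. It checks the validated clips before any partitioning happens and raises a descriptive ValueError instead of an UnboundLocalError.

```python
def require_validated(locale, validated_clips):
    """Hypothetical guard: fail loudly, with a clear message, before partitioning."""
    if len(validated_clips) == 0:
        raise ValueError(
            f"locale {locale!r} has no validated clips; "
            "cannot build train/dev/test splits"
        )
    return validated_clips


print(require_validated("en", ["a.mp3"]))  # ['a.mp3']
```

For the "warn and exit" variant, the command-line entry point could catch this error and call sys.exit(str(err)). That prints the message to stderr and exits with status 1.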
Ah, OK, I see. I was thinking of ignoring the validation set for languages without validated recordings.
And printing a warning, because this may be a problem for people deploying Common Voice locally.
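The skip-and-warn alternative proposed here could look like the following sketch. The mapping of locale code to validated clip paths and the logger name are assumptions made for illustration. Locales with no validated clips are left out of partitioning and reported through the standard logging module.

```python
import logging

logger = logging.getLogger("corpora_sketch")  # hypothetical logger name


def locales_to_partition(validated_by_locale):
    """Return locales that have validated clips; warn about the rest.

    validated_by_locale: hypothetical mapping of locale code -> list of
    validated clip paths.
    """
    kept = []
    for locale, clips in sorted(validated_by_locale.items()):
        if clips:
            kept.append(locale)
        else:
            logger.warning("skipping locale %s: no validated clips", locale)
    return kept


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(locales_to_partition({"en": ["a.mp3"], "xx": []}))  # ['en']
```

The drawback raised later in the thread applies here: a logged warning is easy to miss in a large release run.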
@Gregoor What would be your desired behavior for the scripts that create the Common Voice releases? My gut says a catastrophic failure might be more useful in that situation, as it will definitely be noticed, while a "silent" error that is only printed might not be, and data sets would go out without data.
My take would be that we should notice at some other point that we don't have validated data for a language. The bundler script also gathers stats, partly based on the corpora creator's output, which would show whether a validated set is empty for a language.
@Gregoor I guess my question is this: if the stats are gathered, will they be reliably looked at? There are 96 languages in Pontoon, so if 96 or more languages are released, it's not unlikely that one stat will be overlooked. However, if the bundler script doesn't run to completion, no stats are produced, and that will be noticed.
Yeah, that's a fair point; probably not. But I think the bundler would be a better place to act on this. And I guess we have to decide how we'd want to act on it (maybe exclude the language from the release?).
I think excluding it from the release is a good choice.
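A bundler-side version of this decision is sketched below. The function name, the shape of the stats (a mapping of locale to validated clip count) and the strict flag are all hypothetical; the thread does not describe the bundler's actual data structures. By default, empty locales are excluded and returned so they can be reported. With strict=True, the run aborts instead, which addresses the concern that an overlooked stat could let an empty data set ship.

```python
def select_release_locales(validated_counts, strict=False):
    """Decide which locales go into a release (hypothetical bundler step).

    validated_counts: mapping locale -> number of validated clips, e.g. taken
    from the stats the bundler already gathers (format assumed here).
    strict: if True, abort the whole run instead of excluding empty locales.
    """
    empty = sorted(loc for loc, n in validated_counts.items() if n == 0)
    if empty and strict:
        raise RuntimeError(f"no validated clips for: {', '.join(empty)}")
    released = sorted(loc for loc, n in validated_counts.items() if n > 0)
    return released, empty


released, excluded = select_release_locales({"en": 120, "fr": 40, "xx": 0})
print(released)  # ['en', 'fr']
print(excluded)  # ['xx']
```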
Hello, when I run the command create-corpora -d corpora -f out/clips.tsv, I get the following error, raised at line 115: UnboundLocalError: local variable 'train_size' referenced before assignment
Any help would be appreciated.
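Given the cause identified at the top of the thread, a quick diagnostic is to list which locales in the input file have no validated clips at all. The sketch below makes two loud assumptions. First, it assumes the file is tab-separated with a header row containing locale, up_votes and down_votes columns. Second, it assumes a clip counts as validated when it has at least two more up votes than down votes. The thread does not state the rule the corpora creator actually applies, so the rule is passed in as a replaceable function.

```python
import csv
import io
from collections import Counter


def assumed_is_validated(row):
    # ASSUMPTION: validated means at least two more up votes than down votes.
    # Check the corpora creator source for the rule it actually applies.
    return int(row["up_votes"]) - int(row["down_votes"]) >= 2


def locales_without_validated(tsv_file, is_validated=assumed_is_validated):
    """Return locales in a clips.tsv-like file that have zero validated clips.

    ASSUMPTION: tab-separated, with a header row containing 'locale',
    'up_votes' and 'down_votes' columns.
    """
    seen = set()
    validated = Counter()
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        seen.add(row["locale"])
        if is_validated(row):
            validated[row["locale"]] += 1
    return sorted(loc for loc in seen if validated[loc] == 0)


# In-memory sample; for a real file use:
#   with open("out/clips.tsv", newline="", encoding="utf-8") as f:
#       print(locales_without_validated(f))
sample = io.StringIO(
    "path\tlocale\tup_votes\tdown_votes\n"
    "a.mp3\ten\t3\t0\n"
    "b.mp3\txx\t1\t0\n"
)
print(locales_without_validated(sample))  # ['xx']
```

Any locale this prints is one for which the current code would fail as described above.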