
German 12.0 Segment missing train dev test TSV files #20

Open
LozramA opened this issue Jan 27, 2023 · 3 comments

Comments

@LozramA

LozramA commented Jan 27, 2023

The newest German segment, "cv-corpus-12.0-delta-2022-12-07-de.tar", does not include train.tsv, dev.tsv, or test.tsv.

@HarikalarKutusu

Hey @LozramA, I don't know your workflow, but if you have the validated.tsv file, you could merge it with the v11.0 validated.tsv and use CorporaCreator to generate a new train/dev/test split.
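The merge step could look roughly like the sketch below. It is a minimal illustration, not part of Common Voice or CorporaCreator, and it rests on several assumptions: both files are plain tab-separated text with a header row, no field contains a tab or newline, and the `path` column uniquely identifies a clip. The function names `read_tsv` and `merge_validated` are hypothetical. When the same clip appears in both files, the delta row is kept.

```python
from pathlib import Path


def read_tsv(path):
    """Return (header, rows) for a simple TSV; each row is a dict keyed by column name.

    Assumes no field contains a tab or newline (hypothetical helper).
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    header = lines[0].split("\t")
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:] if line]
    return header, rows


def merge_validated(old_tsv, delta_tsv, out_tsv, key="path"):
    """Write the rows of old_tsv plus delta_tsv to out_tsv, de-duplicated on `key`.

    Delta rows overwrite older rows with the same key. Returns the row count.
    """
    old_header, old_rows = read_tsv(old_tsv)
    new_header, new_rows = read_tsv(delta_tsv)

    # Union of columns in order, since a newer release may add columns.
    header = old_header + [c for c in new_header if c not in old_header]

    merged = {}
    for row in old_rows + new_rows:
        merged[row[key]] = row  # later (delta) rows replace earlier ones

    with open(out_tsv, "w", encoding="utf-8", newline="") as f:
        f.write("\t".join(header) + "\n")
        for row in merged.values():
            # A column missing from one release is written as an empty field.
            f.write("\t".join(row.get(c, "") for c in header) + "\n")
    return len(merged)
```

Usage, with hypothetical paths: `merge_validated("cv11/de/validated.tsv", "cv12-delta/de/validated.tsv", "merged/validated.tsv")`. The merged file is then the input for CorporaCreator. Check its README for the exact command line, because this sketch does not call it.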

@LozramA
Author

LozramA commented Feb 8, 2023

Thanks @HarikalarKutusu, but the German v11 segment is missing all of its TSV files, so it is completely unusable.
I used the full CV12 release, but it is now getting too big for privately available computer/GPU power. Training ran for many days and still did not get through many epochs (RTX 2060, Intel i7).

@HarikalarKutusu

HarikalarKutusu commented Feb 8, 2023

Yes, if you don't already have the full v11 release, you will need to download it, unfortunately... And I know it is a painful process.

Btw, if you already have the mp3 files, I can share the full .tsv files for any version with you... I had to extract them for the cv-dataset-analyzer project I implemented.

Secondly, the correct repo for this issue is https://github.com/common-voice/common-voice-bundler
