Fail downloading Seamless align data #41

Open
lzl-mt opened this issue Aug 31, 2023 · 1 comment

lzl-mt commented Aug 31, 2023

I followed https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/seamless_align_README.md and tried to download the dataset with:
zcat seamless.dataset.metadata.public.arb-enA.tsv.gz | egrep ^crawl-data | tr '\t' ' ' | build/bin/wet_lines
It raises an error:
[image: error output]

and no wav file is saved.
By the way, the script spends a long time processing, but I can't find anything downloaded in my workspace. Is there a way to save each wav or text file as it is produced during the whole processing stage? Thanks a lot.

lzl-mt commented Sep 5, 2023

I tried again but still got the same error and nothing was saved. The run took almost 2 days:
what(): /home/ubuntu/preprocess/preprocess/wet_lines_main.cc:71 in void Retrieve::Add(util::StringPiece, const Extract&) threw util::Exception because `!extracts.empty() && extracts.back().paragraph_number > extract.paragraph_number'. Metadata should be sorted by paragraph number in each document
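The exception says that, within one document, a metadata line arrived with a smaller paragraph number than the line before it. A minimal Python sketch of a pre-check for that condition follows, so the offending lines can be located before starting a multi-day run. The column positions are assumptions: the thread does not describe the layout of the metadata after the `tr` step, so `DOC_FIELDS` and `PARA_FIELD` are hypothetical and need to be adjusted to the real file. How `wet_lines` itself groups lines into documents is also not confirmed here; the sketch simply compares consecutive lines that share the same document key.

```python
from typing import Iterable, Iterator, Sequence, Tuple

# ASSUMPTION: zero-based positions of the fields in each whitespace-separated
# metadata line. These are hypothetical; check them against the real file.
DOC_FIELDS: Sequence[int] = (0, 1, 2)  # fields that together identify a document
PARA_FIELD: int = 3                    # field holding the paragraph number


def find_unsorted(
    lines: Iterable[str],
    doc_fields: Sequence[int] = DOC_FIELDS,
    para_field: int = PARA_FIELD,
) -> Iterator[Tuple[int, Tuple[str, ...], int, int]]:
    """Yield (line_no, doc_key, previous_paragraph, current_paragraph) for each
    line whose paragraph number is smaller than that of the preceding line
    with the same document key."""
    needed = max(max(doc_fields), para_field)
    prev_key = None
    prev_para = None
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) <= needed:
            continue  # too few fields to evaluate; skip the line
        try:
            para = int(fields[para_field])
        except ValueError:
            continue  # paragraph field is not an integer; skip the line
        key = tuple(fields[i] for i in doc_fields)
        # Same comparison as the failing condition in the error message:
        # the previous paragraph number is greater than the current one.
        if key == prev_key and prev_para > para:
            yield (line_no, key, prev_para, para)
        prev_key, prev_para = key, para
```

One possible use is to call `find_unsorted(sys.stdin)` in a small script and feed it the same `zcat ... | egrep ^crawl-data | tr '\t' ' '` output that would otherwise go to `build/bin/wet_lines`. If it reports any lines, the metadata order for those documents is the likely trigger of the exception. If it reports nothing, the assumed column positions are probably wrong for this file.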
