metadata.json missing for DALI train/dev split #4

scaperothian · 2024-03-11T13:02:45Z

Hello, thank you for publishing your work on this research topic. I am interested in using your repo to fine-tune wav2vec with DALI and other data. When I run the dali_prepare.py script in DALI/LM/dali_prepare.py:

python dali_prepare.py --data_folder=/path/to/DALI_v2.0/

it returns the following:

Traceback (most recent call last):
  File "dali_prepare.py", line 84, in <module>
    prepare_text_dali(root=args.data_folder, save_folder=args.save_folder)
  File "dali_prepare.py", line 38, in prepare_text_dali
    with open(anno_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '../../../DALI_v2.0/metadata.json'

metadata.json is not included in the DALI dataset on Zenodo, nor is it on the DALI GitHub page.

I could recreate it based on the relative hours of data reported in your paper, but I would prefer to use your exact JSON and modify it as needed (e.g., based on connectivity).

Thanks again for your contributions to this field.
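If the exact split is unavailable, one way to approximate it is to assign whole songs (not individual utterances) to train or dev until the dev set reaches a target share of total audio time. Splitting by song keeps lines from the same song from leaking across splits. The sketch below assumes each utterance is a dict with hypothetical `song_id` and `duration` (seconds) keys; it is not the authors' split procedure, and the dev fraction is a placeholder, not the ratio from the paper.

```python
import random
from collections import defaultdict


def split_songs_by_hours(utterances, dev_fraction=0.1, seed=0):
    """Assign whole songs to train or dev so dev holds roughly dev_fraction of total time.

    utterances: list of dicts with hypothetical keys 'song_id' and 'duration' (seconds).
    Returns (train_song_ids, dev_song_ids) as two disjoint sets.
    """
    # Total audio time per song.
    per_song = defaultdict(float)
    for utt in utterances:
        per_song[utt["song_id"]] += utt["duration"]

    # Sort first so the seeded shuffle is reproducible regardless of input order.
    songs = sorted(per_song)
    random.Random(seed).shuffle(songs)

    target = dev_fraction * sum(per_song.values())
    dev, dev_time = set(), 0.0
    for song in songs:
        if dev_time >= target:
            break
        dev.add(song)
        dev_time += per_song[song]

    train = set(songs) - dev
    return train, dev
```

Because whole songs are moved, the dev share only approximates `dev_fraction`; with many short songs the overshoot is small.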


brenzjam · 2024-05-01T13:29:41Z

Hi guxm2021,

This is very interesting work. I was delighted to read your paper and am eager to experiment with this repository. I have the same question as scaperothian. Could you tell us where to find the metadata.json, or post an example JSON so we can recreate the format?

Many thanks,
Brens

Sonata165 · 2024-05-02T03:39:30Z

Thank you so much for your interest in this project!

metadata.json is a new file we generated during data processing. It contains the text annotation and the audio path for each utterance-level sample in the dataset. We have processed all datasets into a similar format (metadata plus a folder of utterance-level samples). I'm sorry for the delay in uploading this part of the code and the corresponding instructions in the README. I'll try to clean up this code and post it to GitHub before next week.
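Based only on the description above (text annotation plus audio path per utterance-level sample), a metadata file of this kind could be written and read back as follows. The key names `wav` and `lyrics`, the layout keyed by utterance ID, and the helper `write_metadata` are assumptions for illustration; the repo's actual schema may differ, so check the released processing code before relying on it.

```python
import json
from pathlib import Path


def write_metadata(samples, out_path):
    """Write a metadata.json mapping utterance IDs to audio path and lyric text.

    samples: iterable of (utt_id, wav_path, lyrics) tuples.
    The 'wav' / 'lyrics' key names are hypothetical, not the repo's confirmed schema.
    """
    metadata = {
        utt_id: {"wav": str(wav_path), "lyrics": lyrics}
        for utt_id, wav_path, lyrics in samples
    }
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-English lyrics readable in the file.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, ensure_ascii=False, indent=2)
    return metadata
```

A preparation script would then load it with `json.load`, which is the step that fails in the traceback above when the file is absent.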

brenzjam · 2024-05-03T07:50:34Z

Lovely to hear from you Longshen,

No problem at all. In fact, I found your response rather fast! Just to make sure I've interpreted this correctly: you segment each audio track into separate clips, one per utterance, before creating metadata.json? And by utterance, do you mean a phoneme, a word, or a phrase/line?

Thanks again! I don't know why your repo hasn't gotten more attention. It looks pretty cool.
Brendan

Sonata165 · 2024-05-07T11:41:42Z

We've updated the data processing code here. Please follow the README.md inside that directory to prepare the data.

Hi Brendan, yes: for each dataset, the audio was segmented into utterances, and a metadata.json was created for the utterance-level version of the dataset. By utterance I mean one line of lyrics in the song.
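Cutting a song into line-level utterances can be sketched with Python's standard `wave` module, given a list of `(start_sec, end_sec, text)` tuples per song (for DALI, these would come from the line-level annotations). This assumes the audio has already been converted to PCM WAV; `cut_lines` and the `<prefix>_<index>.wav` naming are hypothetical and not taken from the repo.

```python
import wave


def cut_lines(wav_path, lines, out_prefix):
    """Write one WAV clip per lyric line (hypothetical helper, PCM WAV input assumed).

    lines: list of (start_sec, end_sec, text) tuples.
    Returns a list of (clip_path, text) for the clips actually written.
    """
    outputs = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        n_frames = src.getnframes()
        for i, (start, end, text) in enumerate(lines):
            # Convert seconds to frame indices, clamped to the file length.
            first = max(0, int(round(start * rate)))
            last = min(n_frames, int(round(end * rate)))
            if last <= first:
                continue  # skip empty or out-of-range segments
            src.setpos(first)
            frames = src.readframes(last - first)
            clip_path = f"{out_prefix}_{i}.wav"
            with wave.open(clip_path, "wb") as dst:
                # Same channels/sample width/rate as the source; frame count is fixed on close.
                dst.setparams(params)
                dst.writeframes(frames)
            outputs.append((clip_path, text))
    return outputs
```

The returned `(clip_path, text)` pairs are exactly what a per-utterance metadata file needs, so the two steps can be chained.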

Sonata165 · 2024-05-07T11:44:56Z

By the way, if you need the full audio of DALI v2 rather than downloading it from YouTube (a portion of the URLs have become invalid over the years), please email me at [email protected] from any Outlook email address, and I can share the audio (currently stored in my OneDrive) with you. Thanks for your patience.

brenzjam · 2024-05-31T10:36:59Z

Hi @Sonata165, thanks for the response. Unfortunately, I have been unable to send emails to you; I have been trying from my university's Outlook account. Is this address definitely correct, or is there another email I can contact you through?

guxm2021 · 2024-06-03T02:07:14Z

I think his email address is "[email protected]".
