Description
I don't know what is going on with the label file in https://www.openslr.org/resources/18/test-noise.tgz
Probably this program doesn't support this file :(
The label file begins like:
So obviously the first line cannot be loaded by the code at utils.py line 41:
# utils.py line 41
labels_dict = {}
with open(label_file, 'rb') as f:
    for label in f:
        label = label.strip(b'\n')
        label_id = label.split(b' ', 1)[0]
        label_text = label.split(b' ', 1)[1]
        labels_dict[label_id.decode('ascii')] = label_text.decode('utf-8')
And the format is a total mess:
1. Splitting the string on a space is wrong, because the separator is actually a TAB.
2. Even if you split on TAB, the first item of the resulting list is a hash (128 digits), not the label_id.
3. Meanwhile, the train_data files are actually in .mp3 format.
So the code would need to be changed to:
labels_dict = {}
with open(label_file, 'rb') as f:
    for label in f:
        label = label.strip(b'\n')
        # take the part of the filename between 'common' and '.wav'
        tmp = label.split(b'common', 1)[1]
        tmp = tmp.split(b'.wav', 1)[0]
        label_id = b'common' + tmp
        # everything after '.wav' is treated as the label text
        label_text = label.split(b'.wav', 1)[1]
        labels_dict[label_id.decode('ascii')] = label_text.decode('utf-8')
You also have to find-and-replace .mp3 with .wav in the label file (Ctrl+F) XD
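A possible alternative that avoids editing the label file by hand is to split each line on TAB and take the ID from the filename without its extension. This is only a rough sketch: the column order (hash, then audio filename, then transcript), the header line, the function name load_tsv_labels, and the sample row are all assumptions for illustration and have not been checked against the actual file.

```python
from pathlib import Path


def load_tsv_labels(lines, id_col=1, text_col=2, skip_header=True):
    """Build {utterance_id: transcript} from tab-separated label lines.

    Assumed (unverified) layout: hash <TAB> audio filename <TAB> transcript.
    """
    labels = {}
    for line_number, line in enumerate(lines, start=1):
        if skip_header and line_number == 1:
            continue  # first line is assumed to be a column header
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= max(id_col, text_col):
            continue  # too few columns; skip instead of crashing
        # Path.stem drops the extension, so .mp3 and .wav names give the same ID
        labels[Path(fields[id_col]).stem] = fields[text_col]
    return labels


# Hypothetical sample data, not copied from the real label file
sample = [
    "client_id\tpath\tsentence\n",
    "0a1b2c\tcommon_voice_0001.mp3\thello world\n",
]
print(load_tsv_labels(sample))
```

With a real file this could be called as load_tsv_labels(open(label_file, encoding='utf-8')), adjusting id_col and text_col if the columns turn out to be in a different order.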
So I hope you can add support for https://www.openslr.org/resources/18/test-noise.tgz, because it is highlighted in the README file.
At the very least, you could change the code to:
with open(label_file, 'rb') as f:
    # enumerate keeps the reported line number correct even after a failed line
    for i, label in enumerate(f, start=1):
        try:
            label = label.strip(b'\n')
            label_id = label.split(b' ', 1)[0]
            label_text = label.split(b' ', 1)[1]
            labels_dict[label_id.decode('ascii')] = label_text.decode('utf-8')
        except Exception as error:
            print(f'WARNING: Error occurred in label file (line {i}) while loading')
            print(error)
            input('press [ENTER] to continue')
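If pausing on every bad line is too disruptive (for example in a non-interactive run), a variant could collect the failures and let the caller decide what to do with them. This is just a sketch of that idea: load_labels_lenient is a hypothetical helper, not something that exists in utils.py, and it keeps the original space-separated parsing.

```python
def load_labels_lenient(lines):
    """Parse b'<id> <text>' lines; return (labels, problems).

    problems is a list of (line_number, error_message) for lines that
    could not be parsed, so nothing blocks waiting for user input.
    """
    labels, problems = {}, []
    for line_number, raw in enumerate(lines, start=1):
        try:
            # unpacking raises ValueError when the line has no space separator
            label_id, label_text = raw.strip(b"\n").split(b" ", 1)
            labels[label_id.decode("ascii")] = label_text.decode("utf-8")
        except (ValueError, UnicodeDecodeError) as error:
            problems.append((line_number, str(error)))
    return labels, problems


# Hypothetical input lines for illustration
labels, problems = load_labels_lenient([b"A1 hello\n", b"no_space_here\n", b"B2 world\n"])
for line_number, message in problems:
    print(f"WARNING: label file line {line_number}: {message}")
```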