Label file is not available and there is something wrong in do_get_wavs_lables() func when the train data is obtained from https://www.openslr.org test-noise.tgz #46

Open
@LamentXU123

I don't know what is going on with the label file in https://www.openslr.org/resources/18/test-noise.tgz

Probably because this program doesn't support this file :(

The label file begins like this:

[screenshot of the first lines of the label file]

so obviously the first line can't be loaded by the code at utils.py line 41:

# utils.py line 41
    labels_dict = {}
    with open(label_file, 'rb') as f:
        for label in f:
            label = label.strip(b'\n')
            label_id = label.split(b' ', 1)[0]
            label_text = label.split(b' ', 1)[1]
            labels_dict[label_id.decode('ascii')] = label_text.decode('utf-8')

and the format is a total mess...

1. It's wrong to split the string on a space, because the separator is actually a TAB XD

2. Even if you split the string on TAB, the first item of the split list is a hash (128 digits), not the label_id

3. Meanwhile, the format of the train data is actually .mp3
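To illustrate points 1 and 2, here is a minimal sketch on a made-up label line. The column layout (hash, file name, transcript) and the sample values are assumptions based on the description above, not copied from the real file.

```python
# Hypothetical label line: the layout (hash, file name, transcript) is an
# assumption, and the hash is shortened for readability.
line = b"0a1b2c\tcommon_voice_0001.mp3\thello world\n"

# Splitting on the first space cuts inside the transcript, because the
# only space in this line sits between "hello" and "world".
by_space = line.strip(b"\n").split(b" ", 1)

# Splitting on TAB recovers the three columns, but the first one is the
# hash, not the label id.
by_tab = line.strip(b"\n").split(b"\t")

print(by_space)
print(by_tab)
```

With the space split, the "label id" would end up being everything before the first space, which is why the lookup by file name fails.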

so the code should be changed to:

    labels_dict = {}
    with open(label_file, 'rb') as f:
        for label in f:
            label = label.strip(b'\n')
            tmp = label.split(b'common', 1)[1]
            tmp = tmp.split(b'.wav', 1)[0]
            label_id = b'common'+tmp
            label_text = label.split(b'.wav', 1)[1]
            labels_dict[label_id.decode('ascii')] = label_text.decode('utf-8')

and find-and-replace .mp3 with .wav in the label file XD
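As an alternative that avoids editing the label file by hand, a rough sketch could split on TAB and drop the file extension, so `.mp3` and `.wav` names map to the same id. The function name `load_tab_labels`, the three-column layout, and the header names in the sample are all hypothetical; this is not the project's code.

```python
import os


def load_tab_labels(lines):
    """Build {file stem: transcript} from tab-separated label lines.

    Assumes (hypothetically) that column 2 is the audio file name and
    column 3 is the transcript.
    """
    labels = {}
    for raw in lines:
        fields = raw.rstrip(b"\r\n").split(b"\t")
        # Skip the header row and any line that lacks the expected columns.
        if len(fields) < 3 or not fields[1].startswith(b"common"):
            continue
        # Drop the extension so the id is the same for .mp3 and .wav.
        stem, _ext = os.path.splitext(fields[1].decode("ascii"))
        labels[stem] = fields[2].decode("utf-8")
    return labels


# Made-up sample input, including a header line like the one that
# breaks the original loader.
sample = [
    b"client_id\tpath\tsentence\n",
    b"0a1b2c\tcommon_voice_0001.mp3\thello world\n",
]
labels = load_tab_labels(sample)
print(labels)
```

The real file may have more columns or a different order, so the indices would need to be checked against it.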

So I hope you could add support for https://www.openslr.org/resources/18/test-noise.tgz, because it is highlighted in the README file.

At least you could change the code to:

    labels_dict = {}
    with open(label_file, 'rb') as f:
        # enumerate keeps the reported line number correct even after a failed line
        for i, label in enumerate(f, start=1):
            try:
                label = label.strip(b'\n')
                label_id = label.split(b' ', 1)[0]
                label_text = label.split(b' ', 1)[1]
                labels_dict[label_id.decode('ascii')] = label_text.decode('utf-8')
            except Exception as error:
                print(f'WARNING: Error occurred in label file (line {i}) while loading')
                print(error)
                input('press [ENTER] to continue')
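If pausing for input is not wanted (for example in an unattended training run), a non-interactive variant might look like the sketch below. It uses the standard `warnings` module and returns the numbers of the skipped lines; the function name `load_labels_lenient` is hypothetical and only meant as a suggestion.

```python
import warnings


def load_labels_lenient(lines):
    """Parse 'id text' label lines, skipping malformed ones with a warning."""
    labels, bad_lines = {}, []
    for lineno, raw in enumerate(lines, start=1):
        try:
            # Unpacking raises ValueError when the line has no space.
            label_id, label_text = raw.strip(b"\n").split(b" ", 1)
            labels[label_id.decode("ascii")] = label_text.decode("utf-8")
        except (ValueError, UnicodeDecodeError) as error:
            warnings.warn(f"label file line {lineno} skipped: {error}")
            bad_lines.append(lineno)
    return labels, bad_lines


# Made-up input: the second line has no space and is skipped.
labels, bad_lines = load_labels_lenient(
    [b"A1 hello\n", b"no_space_here\n", b"A2 world\n"]
)
print(labels, bad_lines)
```

Returning the skipped line numbers lets the caller decide whether to abort or carry on.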
