
Latest commit f7fa3c0 · Jan 6, 2023


AISHELL-4 for Pyannote

These scripts automatically download the AISHELL-4 dataset and set it up to be used with pyannote-database.

Because the original dataset only has training and test data, the scripts split the original train set into two subsets: custom_train and custom_dev. By default, custom_dev gets 12h and custom_train gets what is left (~92h).
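The split described above could look roughly like the following. This is a minimal sketch, not the repository's actual code. The function name `split_train`, the seeded shuffle, and the `durations` input (recording id mapped to seconds) are all hypothetical choices made for illustration.

```python
import random


def split_train(durations, dev_hours=12.0, seed=0):
    """Hypothetical sketch: split train recordings into custom_dev / custom_train.

    durations: dict mapping recording id -> duration in seconds.
    Recordings are shuffled with a fixed seed (so the split is reproducible),
    then added to custom_dev until it holds at least dev_hours of audio.
    Every remaining recording goes to custom_train.
    """
    uris = sorted(durations)  # sort first so the shuffle is deterministic
    random.Random(seed).shuffle(uris)
    target = dev_hours * 3600.0
    dev, train, dev_total = [], [], 0.0
    for uri in uris:
        if dev_total < target:
            dev.append(uri)
            dev_total += durations[uri]
        else:
            train.append(uri)
    return sorted(dev), sorted(train)
```

Whole recordings are assigned to one side or the other, so in this sketch custom_dev can slightly overshoot the requested duration rather than cut a meeting in half.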

The out-of-the-box protocol for pyannote.audio training is AISHELL.SpeakerDiarization.Custom.

Instructions

Run setup.sh to download and extract the files.

Original subsets

subset   # files   total length
train    191       104h46m
test     20        12h34m
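The ~92h figure for custom_train follows from the train length in the table minus the default 12h custom_dev. This small sketch checks the arithmetic. `to_minutes` and `fmt` are hypothetical helpers written for this example.

```python
import re


def to_minutes(length):
    """Parse a duration written like '104h46m' into whole minutes."""
    m = re.fullmatch(r"(\d+)h(\d+)m", length)
    if m is None:
        raise ValueError(f"unexpected duration: {length!r}")
    return int(m.group(1)) * 60 + int(m.group(2))


def fmt(minutes):
    """Format whole minutes back into the 'XhYYm' style used in the table."""
    return f"{minutes // 60}h{minutes % 60:02d}m"


train = to_minutes("104h46m")  # original train set
test = to_minutes("12h34m")    # original test set
dev = 12 * 60                  # default custom_dev size

print(fmt(train - dev))   # 92h46m left for custom_train
print(fmt(train + test))  # 117h20m for the whole corpus
```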

Credits