aishell4

f7fa3c0 · Jan 6, 2023

Name	Last commit message	Last commit date
.gitignore	Merge dev into master	Jan 5, 2023
README.md	Merge dev into master	Jan 5, 2023
database.yml	Merge dev into master	Jan 5, 2023
generate_uems.py	Merge dev into master	Jan 5, 2023
generate_uris.py	Merge dev into master	Jan 5, 2023
setup.sh	fix sh scripts	Jan 6, 2023

README.md

AISHELL-4 for Pyannote

These scripts automatically download the AISHELL-4 dataset and set it up to be used with pyannote-database.

The scripts generate two subsets from the original train set, custom_train and custom_dev, because the original dataset only provides training and test data. By default, custom_dev holds 12h of audio and custom_train holds the remainder (~92h).
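As an illustration of how a duration-based split of the train set could work, here is a minimal sketch. It is not the repository's actual implementation: the function name `split_by_duration`, the `durations` mapping, and the shuffle-then-fill strategy are all assumptions made for this example.

```python
import random


def split_by_duration(durations, dev_hours=12.0, seed=0):
    """Split {uri: duration_in_seconds} into (train_uris, dev_uris).

    Hypothetical helper: files are shuffled, then assigned to dev until
    its total duration reaches dev_hours; all remaining files go to train.
    """
    target = dev_hours * 3600.0
    uris = sorted(durations)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(uris)

    train, dev = [], []
    dev_total = 0.0
    for uri in uris:
        if dev_total < target:
            dev.append(uri)
            dev_total += durations[uri]
        else:
            train.append(uri)
    return train, dev
```

Because whole files are assigned, the dev subset can overshoot the target by up to one file's duration, so the resulting sizes are approximate.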

The out-of-the-box protocol for pyannote.audio training is AISHELL.SpeakerDiarization.Custom.
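The protocol name follows the pyannote.database convention of Database.Task.Protocol. The sketch below shows one way the protocol might be loaded once setup has finished. It assumes that pyannote.database is installed and can find the generated database.yml (for example through the PYANNOTE_DATABASE_CONFIG environment variable); the exact loading call can differ between pyannote.database versions. The helper names `split_protocol_name` and `load_protocol` are hypothetical.

```python
PROTOCOL_NAME = "AISHELL.SpeakerDiarization.Custom"


def split_protocol_name(name):
    """Split 'Database.Task.Protocol' into its three parts."""
    database, task, protocol = name.split(".")
    return database, task, protocol


def load_protocol(name=PROTOCOL_NAME):
    """Return the pyannote.database protocol object for `name`.

    Assumption: pyannote.database is installed and already knows where
    database.yml lives. The import is done lazily so that this module
    can be imported without pyannote.database being present.
    """
    from pyannote.database import get_protocol

    return get_protocol(name)


# Example usage (requires the downloaded dataset):
# protocol = load_protocol()
# for file in protocol.train():
#     print(file["uri"])
```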

Instructions

Run setup.sh to download and extract the files.

Original sets info

subset	# files	total length
train	191	104h46m
test	20	12h34m
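The figures in this table can be checked against the default split sizes with a little arithmetic. The sketch below parses the durations as written in the table and subtracts the 12h custom_dev default from the train total; the helper names are made up for this example, and the real custom_dev size may differ slightly from exactly 12h.

```python
import re


def parse_duration(text):
    """Convert a string such as '104h46m' into a number of minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)m", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


def format_duration(minutes):
    """Convert a number of minutes back into the 'XhYYm' form."""
    return f"{minutes // 60}h{minutes % 60:02d}m"


train = parse_duration("104h46m")
test = parse_duration("12h34m")
dev_default = 12 * 60  # default custom_dev size, in minutes

print(format_duration(train - dev_default))  # 92h46m left for custom_train
print(format_duration(train + test))         # 117h20m in the original sets
```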

Credits

AISHELL-4 (CC BY-SA 4.0):
- Dataset: https://www.openslr.org/111/
- Original website: http://www.aishelltech.com/aishell_4
