Commit b9890a9

[src,script,egs] the gop_speechocean762 recipe (kaldi-asr#4441)
1 parent 6359c90 commit b9890a9

28 files changed: 1223 additions, 234 deletions

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -48,11 +48,15 @@ GSYMS
 # Python compiled bytecode files.
 *.pyc
 
+# Python virtual environment
+venv/
+
 # Make dependencies.
 .depend.mk
 
 # Some weird thing that macOS creates.
 *.dSYM
+.DS_Store
 
 # Windows executable, symbol and some weird files.
 *.exe
@@ -61,6 +65,7 @@ GSYMS
 *.manifest
 /kaldiwin_vs*
 .vscode
+.idea
 
 # /src/
 /src/.short_version

egs/gop/s5/local/make_testcase.sh

Lines changed: 0 additions & 12 deletions
This file was deleted.

egs/gop/s5/run.sh

Lines changed: 0 additions & 102 deletions
This file was deleted.
Lines changed: 16 additions & 1 deletion
@@ -94,5 +94,20 @@ We guess the HMM topo of chain model may not fit for GOP.
 
 The nnet3's TDNN (no chain) model performs well in GOP computing, so this recipe uses it.
 
+## The `speechocean762` corpus
+
+This corpus aims to provide a free public dataset for the pronunciation scoring task.
+
+This corpus consists of 5000 English sentences.
+All the speakers are non-native and their mother tongue is Mandarin.
+Half of the speakers are children and the others are adults.
+Age and gender information is provided.
+
+The scores were assigned by five experts. To avoid subjective bias, each expert scored independently under the same metric.
+The experts scored at three levels: phoneme-level, word-level and sentence-level.
+
+In this recipe, the automatic phoneme-level scoring is illustrated.
+
 ## Acknowledgement
-The author of this recipe would like to thank Xingyu Na for his works of model tuning and his helpful suggestions.
+The author of this recipe would like to thank Speechocean for providing the corpus,
+and Xingyu Na for his work on model tuning and his helpful suggestions.

egs/gop_speechocean762/s5/RESULT

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+In the `speechocean762` corpus, the phoneme-level scores are in three levels:
+2: pronunciation is correct
+1: pronunciation is right but has a heavy accent
+0: pronunciation is incorrect or missed
+
+First, we can treat the scoring as a regression task.
+So, MSE (Mean Square Error) and Corr (Cross-correlation) are computed:
+
+MSE: 0.15
+Corr: 0.42
+
+Then we round the continuous predicted scores to the values [0, 1, 2] to treat the scoring
+as a classification task.
+So, classification metrics such as precision, recall, and f1-score are computed
+and printed by `sklearn.metrics.classification_report`:
+
+
+              precision    recall  f1-score   support
+
+           0       0.46      0.17      0.25      1339
+           1       0.16      0.37      0.22      1828
+           2       0.96      0.93      0.95     44079
+
+    accuracy                           0.89     47246
+   macro avg       0.53      0.49      0.47     47246
+weighted avg       0.92      0.89      0.90     47246
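
The two evaluations described in RESULT (regression metrics on the raw predictions, then classification metrics after rounding) can be sketched as follows. This is a minimal standard-library illustration on made-up scores, not corpus data. The function names are hypothetical, "Corr" is assumed here to mean the Pearson correlation coefficient, and the half-up rounding with clipping is an assumption; the recipe's own scoring script may differ on any of these points.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between human and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_corr(y_true, y_pred):
    """Pearson correlation (assumed meaning of 'Corr' in RESULT)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def to_class(score, lowest=0, highest=2):
    """Round half up, then clip into the label range (an assumption)."""
    return min(highest, max(lowest, math.floor(score + 0.5)))

def precision_recall(y_true, y_pred, label):
    """Per-class precision and recall; 0.0 when the denominator is zero."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    precision = hits / predicted if predicted else 0.0
    recall = hits / actual if actual else 0.0
    return precision, recall

# Toy example: six phones with hypothetical human and predicted scores.
human = [2, 2, 1, 0, 2, 1]
predicted = [1.8, 2.3, 1.4, 0.6, 1.2, 0.9]
classes = [to_class(p) for p in predicted]

print("MSE:", round(mse(human, predicted), 2))
print("Corr:", round(pearson_corr(human, predicted), 2))
for label in (0, 1, 2):
    print(label, precision_recall(human, classes, label))
```

With heavily imbalanced labels, as in the support column above, per-class precision and recall say more than overall accuracy, which is dominated by class 2.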
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# config for high-resolution MFCC features, intended for neural network training
+# Note: we keep all cepstra, so it has the same info as filterbank features,
+# but MFCC is more easily compressible (because less correlated) which is why
+# we prefer this method.
+--use-energy=false   # use average of log energy, not energy.
+--num-mel-bins=40    # similar to Google's setup.
+--num-ceps=40        # there is no dimensionality reduction.
+--low-freq=20        # low cutoff frequency for mel bins... this is high-bandwidth data, so
+                     # there might be some information at the low end.
+--high-freq=-400     # high cutoff frequency, relative to Nyquist of 8000 (=7600)
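
The negative `--high-freq` value is an offset from the Nyquist frequency rather than an absolute cutoff, which is where the 7600 in the comment comes from. A small sketch of that arithmetic, assuming 16 kHz audio (implied by the "Nyquist of 8000" comment); the function name is hypothetical and this is not Kaldi code:

```python
def effective_high_freq(sample_rate_hz, high_freq):
    """Resolve a mel high-cutoff option to an absolute frequency in Hz.

    A value <= 0 is treated as an offset from Nyquist, a positive
    value as an absolute cutoff.
    """
    nyquist = sample_rate_hz / 2.0
    return nyquist + high_freq if high_freq <= 0 else high_freq

# 16 kHz audio: Nyquist is 8000 Hz, so -400 resolves to 7600 Hz.
print(effective_high_freq(16000, -400))
```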
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+#!/usr/bin/env bash
+
+# Copyright 2015  Johns Hopkins University (Author: Jan Trmal <[email protected]>)
+#           2021  Xiaomi Corporation (Author: Junbo Zhang)
+# Apache 2.0
+
+[ -f ./path.sh ] && . ./path.sh
+
+command -v python3 >&/dev/null \
+  || { echo >&2 "python3 not found on PATH. You will have to install Python3, preferably >= 3.6"; exit 1; }
+
+for package in kaldi_io sklearn imblearn; do
+  python3 -c "import ${package}" 2> /dev/null
+  if [ $? -ne 0 ] ; then
+    echo >&2 "This recipe needs the package ${package} installed. Exit."
+    exit 1
+  fi
+done
+
+exit 0
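
The same dependency check could also be done from inside Python. The sketch below is an alternative illustration, not part of the commit: `missing_packages` is a hypothetical helper, and `importlib.util.find_spec` only locates a top-level package without executing it, so unlike `python3 -c "import pkg"` it would not catch a package that is installed but fails at import time.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# The three packages the shell script checks for.
required = ["kaldi_io", "sklearn", "imblearn"]
for name in missing_packages(required):
    print(f"This recipe needs the package {name} installed.")
```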
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+
+# Copyright 2020-2021  Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang)
+# Apache 2.0
+
+if [ "$#" -ne 2 ]; then
+  echo "Usage: $0 <src-dir> <dst-dir>"
+  echo "e.g.: $0 /home/storage07/zhangjunbo/data/speechocean762/test data/test"
+  exit 1
+fi
+
+src=$1
+dst=$2
+
+[ ! -d $src ] && echo "$0: no such directory $src" && exit 1;
+[ ! -d $src/../WAVE ] && echo "$0: no wav directory" && exit 1;
+
+wavedir=`realpath $src/../WAVE`
+
+[ -d $dst ] || mkdir -p $dst || exit 1;
+
+cp -Rf $src/* $dst/ || exit 1;
+
+sed -i.ori "s#WAVE#${wavedir}#" $dst/wav.scp || exit 1
+
+utils/validate_data_dir.sh --no-feats $dst || exit 1;
+
+echo "$0: successfully prepared data in $dst"
+
+exit 0
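
The `sed` call in this script rewrites the first `WAVE` on each line of `wav.scp` to the absolute path of the corpus audio directory. A minimal sketch of that substitution, with a hypothetical function name and made-up `wav.scp` entries (the real utterance IDs and file layout are not shown in this diff):

```python
def absolutize_wav_scp(lines, wavedir):
    """Replace the first literal 'WAVE' on each line with wavedir.

    Mirrors sed's s#WAVE#...# (no 'g' flag): only the first match per
    line changes, so an utterance ID containing 'WAVE' would be
    rewritten instead of the path.
    """
    return [line.replace("WAVE", wavedir, 1) for line in lines]

# Hypothetical entries in "<utt-id> <relative-path>" form.
entries = ["utt1 WAVE/spk1/utt1.wav", "utt2 WAVE/spk2/utt2.wav"]
for line in absolutize_wav_scp(entries, "/data/speechocean762/WAVE"):
    print(line)
```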
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+
+# Copyright 2014  Johns Hopkins University (author: Daniel Povey)
+#           2020-2021  Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang)
+# Apache 2.0
+
+set -e
+
+remove_archive=false
+if [ "$1" == --remove-archive ]; then
+  remove_archive=true
+  shift
+fi
+
+if [ $# -ne 2 ]; then
+  echo "Usage: $0 [--remove-archive] <url-base> <data-base>"
+  echo "e.g.: $0 www.openslr.org/resources/101 /home/storage07/zhangjunbo/data"
+  echo "With --remove-archive it will remove the archive after successfully un-tarring it."
+  exit 1
+fi
+
+url=$1
+data=$2
+[ -d $data ] || mkdir -p $data
+
+corpus_name=speechocean762
+
+if [ -z "$url" ]; then
+  echo "$0: empty URL base."
+  exit 1;
+fi
+
+if [ -f $data/$corpus_name/.complete ]; then
+  echo "$0: data part $corpus_name was already successfully extracted, nothing to do."
+  exit 0;
+fi
+
+# Check the size of the archive file in bytes
+ref_size=520810923
+if [ -f $data/$corpus_name.tar.gz ]; then
+  size=$(/bin/ls -l $data/$corpus_name.tar.gz | awk '{print $5}')
+  if [ $ref_size != $size ]; then
+    echo "$0: removing existing file $data/$corpus_name.tar.gz because its size in bytes $size"
+    echo "does not equal the size of one of the archives."
+    rm $data/$corpus_name.tar.gz
+  else
+    echo "$data/$corpus_name.tar.gz exists and appears to be complete."
+  fi
+fi
+
+# If you have permission to access Xiaomi's server, you would not need to
+# download it from OpenSLR
+path_on_mi_server=/home/storage06/wangyongqing/share/data/$corpus_name.tar.gz
+if [ -f $path_on_mi_server ]; then
+  cp $path_on_mi_server $data/$corpus_name.tar.gz
+fi
+
+if [ ! -f $data/$corpus_name.tar.gz ]; then
+  if ! which wget >/dev/null; then
+    echo "$0: wget is not installed."
+    exit 1;
+  fi
+  full_url=$url/$corpus_name.tar.gz
+
+  echo "$0: downloading data from $full_url. This may take some time, please be patient."
+  if ! wget -c --no-check-certificate $full_url -O $data/$corpus_name.tar.gz; then
+    echo "$0: error executing wget $full_url"
+    exit 1;
+  fi
+fi
+
+cd $data
+if ! tar -xvzf $corpus_name.tar.gz; then
+  echo "$0: error un-tarring archive $data/$corpus_name.tar.gz"
+  exit 1;
+fi
+
+touch $corpus_name/.complete
+cd -
+
+echo "$0: Successfully downloaded and un-tarred $data/$corpus_name.tar.gz"
+
+if $remove_archive; then
+  echo "$0: removing $data/$corpus_name.tar.gz file since --remove-archive option was supplied."
+  rm $data/$corpus_name.tar.gz
+fi
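
The script decides whether an existing archive can be reused by comparing its size in bytes with the hard-coded `ref_size`. A minimal sketch of that check follows, using a small temporary file in place of the real archive; the function name is hypothetical. A size comparison only detects truncated or partial downloads, and a checksum would be needed to detect corruption that preserves the size.

```python
import os
import tempfile

REF_SIZE = 520810923  # expected archive size in bytes, taken from the script

def archive_looks_complete(path, ref_size=REF_SIZE):
    """True if the file exists and its size matches the reference size."""
    return os.path.isfile(path) and os.path.getsize(path) == ref_size

# Demonstration with a 10-byte stand-in file rather than the real archive.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "speechocean762.tar.gz")
    with open(demo_path, "wb") as f:
        f.write(b"0123456789")
    matches_demo_size = archive_looks_complete(demo_path, ref_size=10)
    matches_real_size = archive_looks_complete(demo_path)
    missing_file = archive_looks_complete(os.path.join(tmp, "absent.tar.gz"))

print(matches_demo_size, matches_real_size, missing_file)
```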
