diff --git a/.gitignore b/.gitignore
Lines changed: 5 additions & 0 deletions
diff --git a/egs/gop/s5/local/make_testcase.sh b/egs/gop/s5/local/make_testcase.sh
Lines changed: 0 additions & 12 deletions
diff --git a/egs/gop/s5/run.sh b/egs/gop/s5/run.sh
Lines changed: 0 additions & 102 deletions
diff --git a/egs/gop/README.md b/egs/gop_speechocean762/README.md
egs/gop/README.md renamed to egs/gop_speechocean762/README.md
Lines changed: 16 additions & 1 deletion
diff --git a/egs/gop_speechocean762/s5/RESULT b/egs/gop_speechocean762/s5/RESULT
Lines changed: 26 additions & 0 deletions
diff --git a/egs/gop/s5/cmd.sh b/egs/gop_speechocean762/s5/cmd.sh
egs/gop/s5/cmd.sh renamed to egs/gop_speechocean762/s5/cmd.sh
diff --git a/egs/gop_speechocean762/s5/conf/mfcc_hires.conf b/egs/gop_speechocean762/s5/conf/mfcc_hires.conf
Lines changed: 10 additions & 0 deletions
diff --git a/egs/gop_speechocean762/s5/local/check_dependencies.sh b/egs/gop_speechocean762/s5/local/check_dependencies.sh
Lines changed: 20 additions & 0 deletions
diff --git a/egs/gop_speechocean762/s5/local/data_prep.sh b/egs/gop_speechocean762/s5/local/data_prep.sh
Lines changed: 30 additions & 0 deletions
diff --git a/egs/gop_speechocean762/s5/local/download_and_untar.sh b/egs/gop_speechocean762/s5/local/download_and_untar.sh
Lines changed: 86 additions & 0 deletions
@@ -48,11 +48,15 @@ GSYMS
 # Python compiled bytecode files.
 *.pyc
 
+# Python virtual environment
+venv/
+
 # Make dependencies.
 .depend.mk
 
 # Some weird thing that macOS creates.
 *.dSYM
+.DS_Store
 
 # Windows executable, symbol and some weird files.
 *.exe
@@ -61,6 +65,7 @@ GSYMS
 *.manifest
 /kaldiwin_vs*
 .vscode
+.idea
 
 # /src/
 /src/.short_version
 
@@ -94,5 +94,20 @@ We guess the HMM topo of chain model may not fit for GOP.
 
 The nnet3's TDNN (no chain) model performs well in GOP computing, so this recipe uses it.
 
+## The `speechocean762` corpus
+
+This corpus aims to provide a free public dataset for the pronunciation scoring task.
+
+This corpus consists of 5000 English sentences.
+All the speakers are non-native, and their mother tongue is Mandarin.
+Half of the speakers are children and the others are adults.
+Age and gender information is provided.
+
+The scores were assigned by five experts. To avoid subjective bias, each expert scored independently under the same metric.
+The experts score at three levels: phoneme-level, word-level and sentence-level.
+
+This recipe illustrates automatic phoneme-level scoring.
+
 ## Acknowledgement
-The author of this recipe would like to thank Xingyu Na for his works of model tuning and his helpful suggestions.
+The author of this recipe would like to thank Speechocean for providing the corpus,
+and Xingyu Na for his work on model tuning and his helpful suggestions.
@@ -0,0 +1,26 @@
+In the `speechocean762` corpus, phoneme-level scores take one of three values:
+2: pronunciation is correct
+1: pronunciation is right but has a heavy accent
+0: pronunciation is incorrect or missed
+
+First, we treat the scoring as a regression task,
+so MSE (Mean Squared Error) and Corr (Cross-correlation) are computed:
+
+MSE: 0.15
+Corr: 0.42
+
+Then we round the continuous predicted scores to the nearest value in [0, 1, 2]
+to treat the scoring as a classification task.
+Classification metrics such as precision, recall, and f1-score are then computed
+and printed by `sklearn.metrics.classification_report`:
+
+
+              precision    recall  f1-score   support
+
+           0       0.46      0.17      0.25      1339
+           1       0.16      0.37      0.22      1828
+           2       0.96      0.93      0.95     44079
+
+    accuracy                           0.89     47246
+   macro avg       0.53      0.49      0.47     47246
+weighted avg       0.92      0.89      0.90     47246
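
The two evaluation views described in the RESULT file above (regression, then classification after rounding) can be sketched in plain Python. This is a minimal illustration and not the recipe's scoring code: it assumes `Corr` denotes the Pearson correlation coefficient, which the file does not confirm, and it rounds with Python's built-in `round`, whose ties go to the nearest even integer. The function names and the sample scores are hypothetical.

```python
import math


def mse(refs, preds):
    """Mean squared error between reference and predicted scores."""
    return sum((r - p) ** 2 for r, p in zip(refs, preds)) / len(refs)


def pearson(refs, preds):
    """Pearson correlation coefficient (assumed meaning of 'Corr')."""
    n = len(refs)
    mean_r = sum(refs) / n
    mean_p = sum(preds) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(refs, preds))
    var_r = sum((r - mean_r) ** 2 for r in refs)
    var_p = sum((p - mean_p) ** 2 for p in preds)
    return cov / math.sqrt(var_r * var_p)


def to_class(score, lo=0, hi=2):
    """Round a continuous score to the nearest label and clip it to [lo, hi]."""
    return min(hi, max(lo, int(round(score))))


def precision_recall(refs, labels, cls):
    """Precision and recall for one class; 0.0 when the denominator is zero."""
    tp = sum(1 for r, l in zip(refs, labels) if r == cls and l == cls)
    predicted = sum(1 for l in labels if l == cls)
    actual = sum(1 for r in refs if r == cls)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Hypothetical scores, for illustration only (not taken from the corpus).
refs = [2, 2, 1, 0, 2]
preds = [1.8, 2.3, 1.4, 0.6, 1.2]
labels = [to_class(p) for p in preds]

print("MSE:", mse(refs, preds))
print("Corr:", pearson(refs, preds))
for cls in (0, 1, 2):
    print(cls, precision_recall(refs, labels, cls))
```

With a class distribution as skewed as the one in the table (44079 of 47246 phones have score 2), accuracy and the weighted average are dominated by class 2, so the per-class rows for 0 and 1 are the more informative ones.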
@@ -0,0 +1,10 @@
+# config for high-resolution MFCC features, intended for neural network training
+# Note: we keep all cepstra, so it has the same info as filterbank features,
+# but MFCC is more easily compressible (because less correlated) which is why 
+# we prefer this method.
+--use-energy=false   # use average of log energy, not energy.
+--num-mel-bins=40     # similar to Google's setup.
+--num-ceps=40     # there is no dimensionality reduction.
+--low-freq=20     # low cutoff frequency for mel bins... this is high-bandwidth data, so
+                  # there might be some information at the low end.
+--high-freq=-400 # high cutoff frequency, relative to Nyquist of 8000 (=7600)
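
The arithmetic behind the `--high-freq=-400` comment can be checked with a short sketch. It assumes the Kaldi convention that a non-positive high cutoff is an offset from the Nyquist frequency, and a 16 kHz sampling rate, which is what a Nyquist of 8000 Hz implies. The helper name is hypothetical.

```python
def effective_high_freq(sample_rate_hz, high_freq):
    """Resolve a high cutoff that may be given as an offset from Nyquist."""
    nyquist = sample_rate_hz / 2.0
    # Assumed convention: values <= 0 are added to the Nyquist frequency.
    return high_freq if high_freq > 0 else nyquist + high_freq


cutoff = effective_high_freq(16000, -400)
print(cutoff)
```

For 16 kHz audio this resolves to 7600 Hz, matching the value quoted in the config comment.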
@@ -0,0 +1,20 @@
+#!/usr/bin/env bash
+
+# Copyright 2015  Johns Hopkins University (Author: Jan Trmal <[email protected]>)
+#           2021  Xiaomi Corporation (Author: Junbo Zhang)
+# Apache 2.0
+
+[ -f ./path.sh ] && . ./path.sh
+
+command -v python3 >&/dev/null \
+  || { echo  >&2 "python3 not found on PATH. You will have to install Python3, preferably >= 3.6"; exit 1; }
+
+for package in kaldi_io sklearn imblearn; do
+  python3 -c "import ${package}" 2> /dev/null
+  if [ $? -ne 0 ] ; then
+    echo >&2 "This recipe needs the package ${package} installed. Exit."
+    exit 1
+  fi
+done
+
+exit  0
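
The package check in `check_dependencies.sh` above can also be expressed from within Python. The sketch below is not part of the recipe. It uses the standard-library `importlib.util.find_spec`, which only locates a top-level module without importing it, so unlike the shell loop's `python3 -c "import ..."` it would not catch a package that is installed but fails at import time. The function name is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The import names checked by the script.
required = ["kaldi_io", "sklearn", "imblearn"]
for name in missing_packages(required):
    print(f"This recipe needs the package {name} installed.")
```

Note that `sklearn` and `imblearn` are import names; the corresponding distributions on PyPI are called `scikit-learn` and `imbalanced-learn`.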
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+
+# Copyright 2020-2021  Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang)
+# Apache 2.0
+
+if [ "$#" -ne 2 ]; then
+  echo "Usage: $0 <src-dir> <dst-dir>"
+  echo "e.g.: $0 /home/storage07/zhangjunbo/data/speechocean762/test data/test"
+  exit 1
+fi
+
+src=$1
+dst=$2
+
+[ ! -d "$src" ] && echo "$0: no such directory $src" && exit 1;
+[ ! -d "$src/../WAVE" ] && echo "$0: no WAVE directory found at $src/../WAVE" && exit 1;
+
+wavedir=`realpath $src/../WAVE`
+
+[ -d $dst ] || mkdir -p $dst || exit 1;
+
+cp -Rf $src/* $dst/ || exit 1;
+
+sed -i.ori "s#WAVE#${wavedir}#" $dst/wav.scp || exit 1
+
+utils/validate_data_dir.sh --no-feats $dst || exit 1;
+
+echo "$0: successfully prepared data in $dst"
+
+exit 0
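
The `sed` call in `data_prep.sh` above replaces the first occurrence of `WAVE` on each line of `wav.scp` with the absolute path of the wave directory. A minimal Python sketch of the same substitution follows. The layout of the `wav.scp` line and the directory path are hypothetical, since the diff does not show the corpus files.

```python
def absolutize(line, wavedir):
    """Replace only the first 'WAVE' on the line, like sed's s#WAVE#...# without the g flag."""
    return line.replace("WAVE", wavedir, 1)


# Hypothetical wav.scp entry and wave directory, for illustration only.
line = "000010011 WAVE/SPEAKER0001/000010011.WAV"
wavedir = "/data/speechocean762/WAVE"
rewritten = absolutize(line, wavedir)
print(rewritten)
```

Because only the first match is replaced, the `WAVE` inside the substituted path is not expanded again.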
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+
+# Copyright      2014  Johns Hopkins University (author: Daniel Povey)
+#           2020-2021  Xiaomi Corporation (Author: Junbo Zhang, Yongqing Wang)
+# Apache 2.0
+
+set -e
+
+remove_archive=false
+if [ "$1" == --remove-archive ]; then
+  remove_archive=true
+  shift
+fi
+
+if [ $# -ne 2 ]; then
+  echo "Usage: $0 [--remove-archive] <url-base> <data-base>"
+  echo "e.g.: $0 www.openslr.org/resources/101 /home/storage07/zhangjunbo/data"
+  echo "With --remove-archive it will remove the archive after successfully un-tarring it."
+  exit 1
+fi
+
+url=$1
+data=$2
+[ -d $data ] || mkdir -p $data
+
+corpus_name=speechocean762
+
+if [ -z "$url" ]; then
+  echo "$0: empty URL base."
+  exit 1;
+fi
+
+if [ -f $data/$corpus_name/.complete ]; then
+  echo "$0: data part $corpus_name was already successfully extracted, nothing to do."
+  exit 0;
+fi
+
+# Check the size of the archive file in bytes
+ref_size=520810923
+if [ -f $data/$corpus_name.tar.gz ]; then
+  size=$(/bin/ls -l $data/$corpus_name.tar.gz | awk '{print $5}')
+  if [ $ref_size != $size ]; then
+    echo "$0: removing existing file $data/$corpus_name.tar.gz because its size in bytes ($size)"
+    echo "does not equal the expected size ($ref_size)."
+    rm $data/$corpus_name.tar.gz
+  else
+    echo "$data/$corpus_name.tar.gz exists and appears to be complete."
+  fi
+fi
+
+# If you have permission to access Xiaomi's server, you do not need to
+# download it from OpenSLR
+path_on_mi_server=/home/storage06/wangyongqing/share/data/$corpus_name.tar.gz
+if [ -f $path_on_mi_server ]; then
+  cp $path_on_mi_server $data/$corpus_name.tar.gz
+fi
+
+if [ ! -f $data/$corpus_name.tar.gz ]; then
+  if ! which wget >/dev/null; then
+    echo "$0: wget is not installed."
+    exit 1;
+  fi
+  full_url=$url/$corpus_name.tar.gz
+
+  echo "$0: downloading data from $full_url.  This may take some time, please be patient."
+  if ! wget -c --no-check-certificate $full_url -O $data/$corpus_name.tar.gz; then
+    echo "$0: error executing wget $full_url"
+    exit 1;
+  fi
+fi
+
+cd $data
+if ! tar -xvzf $corpus_name.tar.gz; then
+  echo "$0: error un-tarring archive $data/$corpus_name.tar.gz"
+  exit 1;
+fi
+
+touch $corpus_name/.complete
+cd -
+
+echo "$0: Successfully downloaded and un-tarred $data/$corpus_name.tar.gz"
+
+if $remove_archive; then
+  echo "$0: removing $data/$corpus_name.tar.gz file since --remove-archive option was supplied."
+  rm $data/$corpus_name.tar.gz
+fi
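
The completeness test in `download_and_untar.sh` above compares the archive's size in bytes with a fixed reference value. The same idea can be sketched in Python. The reference size is copied from the script, while the function name and the temporary demo file are hypothetical. A size match only indicates that the download was not truncated; it does not verify the contents the way a checksum would.

```python
import os
import tempfile

REF_SIZE = 520810923  # expected archive size in bytes, taken from the script


def archive_looks_complete(path, ref_size=REF_SIZE):
    """Return True only if the file exists and has exactly ref_size bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == ref_size


# Demonstration on a small temporary file instead of the real archive.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.tar.gz")
    with open(demo, "wb") as f:
        f.write(b"\0" * 10)
    ok_match = archive_looks_complete(demo, ref_size=10)
    ok_mismatch = archive_looks_complete(demo, ref_size=11)
    ok_missing = archive_looks_complete(os.path.join(tmp, "absent.tar.gz"))

print(ok_match, ok_mismatch, ok_missing)
```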