axelkrnwn/indo-speech-classification

Indonesia Words Audio Recognition

Predicts an Indonesian word from a single speech audio clip. Built with Python audio libraries (librosa, noisereduce, and scipy), MFCC (Mel-Frequency Cepstral Coefficients) features, and a random forest model with hyperparameter tuning.

Dataset Information

The dataset used in this repository is an audio dataset retrieved from the Kaggle link below.

Dataset kaggle link

Exploratory Data Analysis

This dataset contains seven classes with approximately 210-213 WAV audio files per class. Each class corresponds to the single word spoken in the audio: BEGAL, KEBAKARAN, KECELAKAAN, MALING, PENCURI, RAMPOK, and TABRAKAN. Below is a visualization of how the audio files are distributed across the classes.
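The per-class counts can be checked with a few lines of Python. This is a minimal sketch that assumes a hypothetical layout of one folder per class (for example `dataset/BEGAL/*.wav`); the folder layout and the function name `count_wav_per_class` are illustrative and not taken from the repository.

```python
from collections import Counter
from pathlib import Path


def count_wav_per_class(dataset_root):
    """Count .wav files in each immediate subfolder of dataset_root.

    Assumes (hypothetically) one folder per class, e.g.
    dataset_root/BEGAL/*.wav, dataset_root/MALING/*.wav, ...
    """
    counts = Counter()
    for class_dir in sorted(Path(dataset_root).iterdir()):
        if class_dir.is_dir():
            # Only files with a .wav extension are counted.
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() == ".wav"
            )
    return counts
```

Calling `count_wav_per_class("dataset")` returns a `Counter` keyed by class name, which can be passed to a bar chart to reproduce a distribution plot like the one below.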

image

Each class contains audio files in WAV format. Below are example audio clips from the dataset.

image

image

The dataset also contains two types of ambient noise: rain and road ambience. In addition, each recording has silence at the start and end, so the audio is not clean enough to process directly. Preprocessing is therefore needed to reduce the noise and trim the silent parts.

Preprocessing

Two preprocessing techniques are applied to this dataset: noise reduction using the noisereduce library and silence trimming using librosa. Below is an example of the audio after preprocessing.
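To illustrate what silence trimming does, here is a NumPy-only sketch of an energy-based trim. It is a stand-in written for illustration, not the repository's code: the function name `trim_silence` and the threshold and frame-size defaults are assumptions. In practice the libraries named above provide this directly (`librosa.effects.trim` for silence and `noisereduce.reduce_noise` for noise).

```python
import numpy as np


def trim_silence(y, top_db=30.0, frame_length=2048, hop_length=512):
    """Drop leading/trailing frames whose RMS is more than top_db below the loudest frame.

    Illustrative stand-in; with librosa this would be roughly:
        y_trimmed, _ = librosa.effects.trim(y, top_db=top_db)
    """
    y = np.asarray(y, dtype=float)
    if len(y) < frame_length:
        return y  # too short to frame; return unchanged
    n_frames = 1 + (len(y) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length: i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # Frame level in decibels relative to the loudest frame (which sits at 0 dB).
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    keep = np.flatnonzero(db > -top_db)  # never empty: the loudest frame always passes
    start = keep[0] * hop_length
    end = min(len(y), keep[-1] * hop_length + frame_length)
    return y[start:end]
```

Because the threshold is relative to the loudest frame, the same `top_db` value behaves consistently across recordings with different overall volume.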

image

Feature Extraction (MFCC)

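MFCC is a feature extraction technique that represents the frequency content of audio on a scale that approximates human hearing. The main reason for using MFCC is that humans do not perceive frequencies linearly: the ear resolves differences between low frequencies more finely than differences between high frequencies. MFCC extraction consists of six steps: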

  1. Pre-emphasis
  2. Audio framing
  3. Audio windowing
  4. Mel filterbank
  5. Logarithm of the filterbank energies
  6. Discrete Cosine Transform
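The six steps above can be sketched from scratch with NumPy. This is an illustrative implementation, not the repository's code (which presumably relies on librosa). The parameter defaults (25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients, pre-emphasis 0.97) are common choices assumed here, and the code assumes `n_fft` is at least the frame length in samples. A power-spectrum (FFT) stage sits between windowing and the mel filterbank; the list leaves it implicit.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):    # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sr, n_mfcc=13, n_mels=26, n_fft=512,
         frame_ms=25, hop_ms=10, pre_emph=0.97):
    """Return an (n_frames, n_mfcc) array of MFCCs. Assumes n_fft >= frame length."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: boost high frequencies with a first-order filter.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Framing: slice into overlapping frames (zero-pad very short signals).
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # 3. Windowing: taper each frame with a Hamming window.
    frames = frames * np.hamming(frame_len)
    # (implied) Power spectrum of each windowed frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 4. Mel filterbank: pool spectral power into mel-spaced bands.
    energies = power @ mel_filterbank(n_mels, n_fft, sr).T
    # 5. Log: compress dynamic range (small constant avoids log(0)).
    log_energies = np.log(energies + 1e-10)
    # 6. DCT-II (unnormalized): decorrelate and keep the first n_mfcc coefficients.
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n[None, :] + 1) / (2 * n_mels))
    return log_energies @ basis.T
```

A classifier such as a random forest needs one fixed-length vector per clip. One common option, assumed here rather than taken from the repository, is to summarize the frames with statistics such as `mfcc(y, sr).mean(axis=0)`.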

Data Splitting and PCA

The dataset is split into a training set and a testing set with a ratio of 8:2. Because the extracted features are high-dimensional, PCA (principal component analysis) is then applied to reduce the dimensionality and make the data easier for the model to learn from.
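A minimal sketch of the split and PCA step with scikit-learn follows. The feature matrix is synthetic random data standing in for the extracted features. The feature count (40), the stratified split, the standardization step, and the 95% explained-variance target are all assumptions for illustration, not settings taken from the repository.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 7 classes x 210 clips, 40 features per clip (assumed sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(7 * 210, 40))
y = np.repeat(np.arange(7), 210)

# 8:2 split; stratify keeps the class balance equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler and PCA on the training set only, then apply both to the test set,
# so no information from the test set leaks into the transformation.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=0.95).fit(scaler.transform(X_train))  # keep 95% of the variance
X_train_pca = pca.transform(scaler.transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))
```

Fitting PCA before splitting would let test-set statistics influence the projection, which tends to make evaluation scores look better than they are.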

Modelling and Evaluation

The model is a random forest classifier, evaluated with standard metrics: accuracy, F1-score, recall, and precision. Overall, the model achieves an accuracy of at least 80%.
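The training and evaluation step might look like the sketch below. It runs on synthetic data from `make_classification`, so its scores say nothing about the real dataset. The use of `GridSearchCV`, the parameter grid, and the macro-F1 scoring are assumptions, since the README does not say how the hyperparameters were tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic 7-class problem standing in for the PCA-reduced MFCC features.
X, y = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid, cv=3, scoring="f1_macro",
)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test set.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)  # per-class precision, recall, F1
print(search.best_params_)
print(f"accuracy: {accuracy:.3f}")
print(report)
```

`classification_report` prints precision, recall, and F1-score per class, which helps show whether similar-sounding words are being confused with one another even when overall accuracy is high.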

About

Indonesian word speech recognition using MFCC, PCA, and random forest
