Indonesia Words Audio Recognition

Predicts an Indonesian word from a single speech audio clip. Built with Python audio libraries (librosa, noisereduce, and scipy), MFCC (Mel Frequency Cepstral Coefficients) features, and a random forest model with hyperparameter tuning.

Dataset Information

The dataset used in this repository is an audio dataset retrieved from the Kaggle link below.

Dataset kaggle link

Exploratory Data Analysis

The dataset contains seven classes with approximately 210 to 213 WAV audio files per class. Each class is a single spoken word: BEGAL, KEBAKARAN, KECELAKAAN, MALING, PENCURI, RAMPOK, and TABRAKAN. Below is a visualization of how the audio files are distributed across the classes.
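The per-class counts behind such a distribution plot can be gathered with a few lines of standard-library Python. This is a minimal sketch that assumes a one-folder-per-class layout (for example `dataset/BEGAL/*.wav`); the layout and the function name `count_wavs_per_class` are assumptions, not taken from the repository.

```python
from collections import Counter
from pathlib import Path


def count_wavs_per_class(dataset_root):
    """Return {class_name: number of .wav files} for a folder-per-class layout."""
    counts = Counter()
    for wav_path in Path(dataset_root).glob("*/*.wav"):
        # The parent folder name is treated as the class label (assumption).
        counts[wav_path.parent.name] += 1
    return dict(counts)
```

The resulting dictionary can be passed to any plotting library to draw a bar chart of files per class.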

Each class contains audio files in WAV format. Below are example audio clips from the dataset.

The dataset also contains two types of ambient noise: rain and road ambience. In addition, the recordings have silence at the start and end, so they are not clean enough to process directly. Preprocessing is therefore needed to remove the noise and trim the silent parts.

Preprocessing

The preprocessing techniques used on this dataset are noise reduction with the noisereduce library and silence trimming with librosa. Below are examples of the audio after preprocessing.
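To illustrate what silence trimming does, here is a minimal NumPy sketch of energy-based trimming: frames whose RMS level falls more than `top_db` decibels below the loudest frame are dropped from both ends. It is not the librosa or noisereduce implementation, and the threshold and frame sizes are assumed values chosen only for illustration.

```python
import numpy as np


def trim_silence(y, top_db=30.0, frame_length=2048, hop_length=512):
    """Drop leading/trailing frames whose RMS is more than top_db below the loudest frame."""
    y = np.asarray(y, dtype=float)
    if len(y) < frame_length:
        return y  # too short to analyse frame by frame; returned unchanged
    n_frames = 1 + (len(y) - frame_length) // hop_length
    # Root-mean-square energy of each frame.
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length: i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    if rms.max() == 0:
        return y[:0]  # the whole clip is silent
    # Level of each frame in dB relative to the loudest frame.
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / rms.max())
    keep = np.flatnonzero(db > -top_db)
    start = keep[0] * hop_length
    end = min(len(y), keep[-1] * hop_length + frame_length)
    return y[start:end]
```

In practice the same step is available as `librosa.effects.trim`, which also takes a `top_db` threshold, and noise reduction would be applied to the waveform before trimming.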

Feature Extraction (MFCC)

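MFCC is a feature extraction technique that represents audio frequencies in a way that approximates human hearing. The main reason for using MFCC is that humans do not perceive all frequencies equally: the ear resolves differences between low frequencies better than differences between high frequencies, and the mel scale models this. MFCC consists of 6 steps.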

Pre-emphasis
Audio framing
Audio windowing
Mel filterbank
Logarithm of the filterbank energies
Discrete Cosine Transform
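The six steps above can be sketched in plain NumPy as follows. This is an illustrative sketch, not the repository's code: the frame length (25 ms), frame step (10 ms), FFT size, number of filters, and pre-emphasis coefficient are assumed defaults, and the sketch assumes the frame length in samples does not exceed `n_fft`. It also computes a power spectrum (FFT) between windowing and the mel filterbank, which the list does not name as a separate step, and it uses an unnormalized DCT-II.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sr, n_mfcc=13, n_filters=26, frame_len=0.025,
         frame_step=0.010, n_fft=512, pre_emph=0.97):
    """Return an array of shape (n_frames, n_mfcc)."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: boost high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Framing: split into short overlapping frames.
    flen, fstep = int(round(frame_len * sr)), int(round(frame_step * sr))
    if len(emphasized) < flen:
        emphasized = np.pad(emphasized, (0, flen - len(emphasized)))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # 3. Windowing: taper each frame with a Hamming window.
    frames = frames * np.hamming(flen)
    # Power spectrum of each frame (input to the mel filterbank).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4. Mel filterbank: sum spectral power inside each triangular filter.
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    # 5. Log: compress the dynamic range (floored to avoid log(0)).
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 6. Discrete cosine transform (DCT-II), keeping the first n_mfcc coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n[None, :] + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T
```

Calling `mfcc(y, sr)` on a waveform returns one row of coefficients per frame; a fixed-length feature vector per clip is usually obtained by summarizing over frames, for example by taking the mean of each coefficient.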

Data Splitting and PCA

Data splitting divides the dataset into two parts, a training set and a testing set, with a ratio of 8:2. Then, because the data has high dimensionality, PCA (principal component analysis) is used to reduce the number of features and make the data easier for the model to learn from.
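The split and the projection can be sketched with NumPy alone. The helper names below are hypothetical, the number of components is an assumed parameter, and the sketch makes no claim about which library the repository uses. The PCA is fitted on the training set only and then applied to both sets, so the test set does not influence the projection.

```python
import numpy as np


def train_test_split_80_20(X, y, seed=0):
    """Shuffle the samples and split them 8:2 into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(round(0.8 * len(X)))
    train, test = order[:n_train], order[n_train:]
    return X[train], X[test], y[train], y[test]


def fit_pca(X_train, n_components):
    """Return the training mean and the top principal directions."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(X, mean, components):
    """Center X with the training mean and project it onto the components."""
    return (X - mean) @ components.T
```

A typical call sequence is `X_train, X_test, y_train, y_test = train_test_split_80_20(X, y)`, then `mean, comps = fit_pca(X_train, 5)`, then `apply_pca` on each set.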

Modelling and Evaluation

The model used is a random forest classifier, evaluated with standard classification metrics: accuracy, F1-score, recall, and precision. Overall, the model achieves an accuracy of at least 80%.
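A tuned random forest with these metrics might look like the sketch below, assuming scikit-learn is the modelling library (the README does not name it). The data is synthetic, standing in for the PCA-reduced MFCC features, and the parameter grid is hypothetical, so the printed scores say nothing about the repository's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real features (7 classes, as in the dataset).
X, y = make_classification(n_samples=700, n_features=20, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hypothetical search grid; the repository's actual grid is not documented here.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the best model found by the search on the held-out test set.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0)
print(search.best_params_)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Macro averaging weights the seven classes equally, which suits a dataset whose classes are roughly balanced.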

axelkrnwn/indo-speech-classification

Folders and files

Latest commit

History

Repository files navigation

Indonesia Words Audio Recognition

Dataset Information

Explanatory Data Analysis

Preprocessing

Feature Extraction (MFCC)

Data Splitting and PCA

Modelling and Evaluation

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages