shkim816/acnn_speaker_recog

ACNN for Text-Independent Speaker Recognition

Official implementation of

  • Adaptive Convolutional Neural Network for Text-Independent Speaker Recognition
    by Seong-Hu Kim, Yong-Hwa Park @ Human Lab, Mechanical Engineering Department, KAIST
Accepted at Interspeech 2021.

This code was written mainly with reference to the baseline code.

Adaptive Convolutional Neural Network Module

The ACNN module builds its adaptive kernel from two scaling maps, one along the frequency axis and one along the time axis. The adaptive kernel is created by element-wise multiplication of each output channel of the content-invariant kernel with the scaling matrix. The structure of the proposed ACNN module for speaker recognition is shown as follows.

This module is applied to VGG-M and ResNet for text-independent speaker recognition.
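
The kernel scaling described above can be illustrated with a minimal NumPy sketch. This is not the code from this repository. It assumes that the scaling matrix for each output channel is the outer product of a frequency scaling vector and a time scaling vector, and it takes those vectors as given inputs rather than computing them from the input features. The function name `adaptive_kernel` and all array shapes are hypothetical.

```python
import numpy as np


def adaptive_kernel(base_kernel, freq_scale, time_scale):
    """Scale a content-invariant kernel per output channel (illustrative only).

    base_kernel: (C_out, C_in, K_f, K_t) content-invariant kernel
    freq_scale:  (C_out, K_f) scaling along the frequency axis (assumed shape)
    time_scale:  (C_out, K_t) scaling along the time axis (assumed shape)
    """
    # Assumption: the scaling matrix is the outer product of the two maps,
    # giving one (K_f, K_t) matrix per output channel.
    scaling = freq_scale[:, :, None] * time_scale[:, None, :]
    # Element-wise multiplication, broadcast over the input channels.
    return base_kernel * scaling[:, None, :, :]


# Usage with random placeholder values (hypothetical sizes).
rng = np.random.default_rng(0)
kernel = rng.standard_normal((8, 4, 3, 3))
f_map = rng.uniform(0.5, 1.5, size=(8, 3))
t_map = rng.uniform(0.5, 1.5, size=(8, 3))
print(adaptive_kernel(kernel, f_map, t_map).shape)  # (8, 4, 3, 3)
```

The adaptive kernel keeps the shape of the content-invariant kernel, so under these assumptions it can be passed to a standard convolution in its place.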

Requirements and versions used

  • pytorch >= 1.4.0
  • torchaudio >= 0.4.0
  • numpy >= 1.18

Dataset

We used the VoxCeleb1 dataset in this paper. You can download the dataset by referring to VoxCeleb. Gather all data in one folder and set the dataset directories in 'train_model.yaml'.

Training

You can train the model and save it in the exps folder by running:

python train_model.py

You need to adjust the training parameters in the yaml file before training.

Results:

Network                     Top-1 (%)   Top-5 (%)   EER (%)   C_det (%)
Adaptive VGG-M (N=18)       86.51       95.31       5.68      0.510
Adaptive ResNet18 (N=18)    85.84       95.29       6.18      0.589
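
For readers unfamiliar with the EER column, the sketch below shows one common way to estimate an equal error rate from verification scores. It is not the evaluation code used in this repository. The function name `equal_error_rate` and the example scores are made up for illustration, and higher scores are assumed to mean "same speaker".

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from same-speaker and different-speaker scores."""
    best_gap, eer = float("inf"), None
    # Try every observed score as the decision threshold.
    for thr in sorted(set(genuine) | set(impostor)):
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= thr for s in impostor) / len(impostor)
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < thr for s in genuine) / len(genuine)
        # The EER is where the two error rates are (closest to) equal.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Made-up scores: one genuine trial falls below one impostor trial.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))  # 0.25
```

Multiply the returned fraction by 100 to express it as a percentage, as in the table.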

Pretrained models

Pretrained models are available in the 'pretrained_model' folder. Example code for verification using the pretrained models is not provided separately.

Citation

@inproceedings{kim21_interspeech,
  author={Seong-Hu Kim and Yong-Hwa Park},
  title={{Adaptive Convolutional Neural Network for Text-Independent Speaker Recognition}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={66--70},
  doi={10.21437/Interspeech.2021-65}
}

Please contact Seong-Hu Kim at [email protected] for any query.
