Speaker Recognition Engine

The Speaker Recognition Engine is a command-line tool for managing speaker audio data. It supports the following operations:

  • Enrolling new speakers
  • Recognizing speakers from audio samples
  • Listing all enrolled speakers
  • Deleting speaker records

The engine leverages machine learning techniques, specifically Gaussian Mixture Models (GMM), to perform accurate and robust speaker identification.
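
In brief, enrollment fits one GMM to each speaker's MFCC features, and recognition scores a test utterance against every enrolled model, picking the speaker with the highest log-likelihood. A minimal sketch of that scoring scheme, using scikit-learn purely for illustration (the engine's own GMM code may differ in its details):

from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker, n_mixtures=8):
    # Fit one GMM per speaker on that speaker's MFCC matrix
    # (rows = frames, columns = cepstral coefficients).
    return {
        name: GaussianMixture(n_components=n_mixtures).fit(feats)
        for name, feats in features_by_speaker.items()
    }

def identify(models, test_features):
    # score() returns the average per-frame log-likelihood; the
    # best-scoring model is the predicted speaker.
    scores = {name: gmm.score(test_features) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores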

Installation

  1. Clone the Repository
    git clone [email protected]:GenaNiv/voice-recognition-engine.git
    cd voice-recognition-engine
    
  2. Install Dependencies
    pip install -r requirements.txt
    
  3. Verify the Installation
    python cli.py --help
    

Usage

The Speaker Recognition Engine provides the following commands for managing speaker audio data:

  1. Enroll a Speaker: Enroll a new speaker using an audio file.
  2. Recognize a Speaker: Identify a speaker from a given audio file.
  3. Recognize a Stream: Feed audio chunks in near real-time and observe interim matches.
  4. List Enrolled Speakers: Display a list of all enrolled speakers.
  5. Delete a Speaker: Remove a speaker's data from the system.

Each command can be executed from the command line with the appropriate arguments. The general syntax for using the tool is:

python cli.py <command> [arguments]

Enroll a Speaker

To enroll a new speaker, use the enroll command followed by the speaker's name and the path to the audio file. Optionally, you can specify parameters such as the sample rate, number of Mel filters, and number of MFCC coefficients; the sketch after the parameter list shows how the time-based values map to samples.

Syntax:

python cli.py enroll <speaker_name> <audio_file_path> [optional parameters]

Optional Parameters:

  • --sample_rate: Sampling rate of the audio file (default: 16000)
  • --num_filters: Number of Mel filters (default: 26)
  • --num_ceps: Number of MFCC coefficients (default: 13)
  • --n_fft: FFT size for audio processing (default: 512)
  • --frame_size: Frame size in seconds (default: 0.025)
  • --frame_step: Frame step (overlap) in seconds (default: 0.01)
  • --n_mixtures: Number of Gaussian mixtures in GMM (default: 8)
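
For orientation, the time-based defaults translate into sample counts as follows (illustrative arithmetic; the engine's exact rounding may differ):

sample_rate = 16000                      # Hz
frame_length = int(sample_rate * 0.025)  # 400 samples per frame
hop_length = int(sample_rate * 0.01)     # 160 samples between frame starts
# Each 400-sample frame is zero-padded to n_fft=512 points before the FFT;
# the 26 Mel filter energies are then reduced (via DCT) to 13 MFCCs.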

Example:

python cli.py enroll gena /home/gena/audio_files/gena.wav --sample_rate 16000 --num_filters 40 --num_ceps 13 --n_fft 512 --frame_size 0.025 --frame_step 0.01 --n_mixtures 8

Recognize a Speaker

Run the recognize command with a WAV file. The CLI prints the best match and the log-likelihood scores computed by the shared VoiceRecognitionService.

python cli.py recognize /home/gena/audio_files/gena.wav --sample_rate 16000

Recognize a Stream (Real-Time Simulation)

The recognize_stream command reuses the same service façade but feeds the audio file in chunks (default 0.5 s). This mimics real-time capture and prints interim matches as soon as the likelihoods are high enough.

python cli.py recognize_stream /home/gena/audio_files/gena.wav --chunk_duration 0.25
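
Conceptually, the command is a chunked read loop over the WAV file. A rough sketch of that loop (feed_chunk is a placeholder; the CLI routes chunks through the service façade described below):

import wave

def feed_chunk(pcm_bytes):
    # Placeholder: hand a chunk of raw PCM to the recognizer. The real
    # CLI goes through the shared VoiceRecognitionService streaming API.
    ...

CHUNK_DURATION = 0.25  # seconds, matching --chunk_duration above

with wave.open("/home/gena/audio_files/gena.wav", "rb") as wav:
    frames_per_chunk = int(wav.getframerate() * CHUNK_DURATION)
    while True:
        pcm = wav.readframes(frames_per_chunk)
        if not pcm:
            break
        feed_chunk(pcm)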

Live Microphone Demo

Use src/live_recognition.py to capture audio from the default input device and route it directly through the streaming API. Ensure sounddevice sees your microphone, then run:

python src/live_recognition.py

Speak into the microphone; interim matches will appear as the engine accumulates enough audio. Press Ctrl+C to stop.
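
If you want to adapt the demo, the capture side is ordinary sounddevice usage. A condensed sketch (feed_to_engine is a placeholder; mirror whatever src/live_recognition.py actually calls):

import sounddevice as sd

SAMPLE_RATE = 16000
CHUNK_SECONDS = 0.5
frames_per_chunk = int(SAMPLE_RATE * CHUNK_SECONDS)

def feed_to_engine(pcm_bytes):
    # Placeholder: forward raw PCM to the engine's streaming session
    # (see the service API section below for the actual entry points).
    ...

# Blocking capture loop: read half-second int16 mono chunks from the
# default input device until interrupted.
with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
    try:
        while True:
            chunk, _overflowed = stream.read(frames_per_chunk)
            feed_to_engine(chunk.tobytes())
    except KeyboardInterrupt:
        pass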

Embedding the Service API

For tighter integration with other applications (e.g., the upcoming voice engine), import VoiceRecognitionService and the request/response models:

from file_management.bst import BinarySearchTree
from service.api import VoiceRecognitionService, EnrollmentRequest, EnrollmentConfig
from service.audio_sources import BufferAudioSource

bst = BinarySearchTree()
service = VoiceRecognitionService(bst=bst, base_directory="test_environment")

# Enroll using in-memory buffers (pcm_chunk_1 and pcm_chunk_2 stand in for
# raw PCM byte buffers captured elsewhere, e.g. 16 kHz mono 16-bit samples)
req = EnrollmentRequest(
    speaker_id="alice",
    audio_source=BufferAudioSource(buffers=[pcm_chunk_1, pcm_chunk_2]),
    config=EnrollmentConfig(sample_rate=16000),
)
service.enroll(req)

The same façade exposes recognize, start_session, list_speakers, and delete_speaker, allowing other repositories to depend on this module without invoking the CLI.
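
For instance (a sketch only; the method names come from the list above, but the exact signatures and return shapes are assumptions):

# Continue with the service instance created above.
print(service.list_speakers())   # enumerate enrolled speakers
service.delete_speaker("alice")  # assumed to take the speaker_id string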

Recording a Test WAV on Raspberry Pi with Jabra Speak 410

Use this workflow to capture a 16 kHz mono WAV file on the Raspberry Pi 5 connected to the Jabra speaker/mic. All commands assume the repository lives under /home/gena/PROJECTS.

  1. Set the Jabra device as the default PipeWire sink/source:
    ./roomba_stack/audio_jabra_default.sh
  2. Confirm the capture device name (needed in the next step):
    pactl list short sources | grep -i jabra
    You should see something like alsa_input.usb-0b0e_Jabra_SPEAK_410_USB_...-mono-fallback running at 16 kHz.
  3. Make sure there is a place to store recordings:
    mkdir -p voice-recognition-engine/audio_files
  4. Record a short sample (5–10 seconds) using the PipeWire/ALSA device discovered in step 2:
    parecord \
      --device=alsa_input.usb-0b0e_Jabra_SPEAK_410_USB_50C2ED166881x011200-00.mono-fallback \
      --rate=16000 --channels=1 --format=s16le \
      voice-recognition-engine/audio_files/gmm_test.wav
    Speak while the command runs and press Ctrl+C when finished.
  5. Validate the recording before using it with the GMM engine:
    aplay voice-recognition-engine/audio_files/gmm_test.wav

The resulting gmm_test.wav resides in voice-recognition-engine/audio_files/ and can be supplied to the CLI commands (e.g., python src/cli.py recognize voice-recognition-engine/audio_files/gmm_test.wav --sample_rate 16000).
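
Before enrolling or recognizing with it, it can be worth double-checking that the file really is 16 kHz mono 16-bit; a quick check with Python's standard library:

import wave

with wave.open("voice-recognition-engine/audio_files/gmm_test.wav", "rb") as w:
    assert w.getframerate() == 16000, "expected a 16 kHz recording"
    assert w.getnchannels() == 1, "expected mono audio"
    assert w.getsampwidth() == 2, "expected 16-bit (s16le) samples"
    print(f"{w.getnframes() / w.getframerate():.1f} seconds of audio")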
