Codecs_reconstruction_eval

Codecs_reconstruction_eval is an evaluation toolkit that wraps a broad collection of codec-reconstruction APIs under a single interface, so you can decode audio with one call and compute an extensive set of objective metrics, including:

PESQ (NB/WB)
STOI
Speaker-embedding similarity (SIM)
Mel-spectrogram loss
Word-error rate (WER) on LibriSpeech-test-clean
UMOS
Usage and entropy

Ready-to-run scripts are provided, and you can define additional metrics in the metrics folder.

Supported models include

You can define your own model in wrapper.py; it needs to inherit from the AudioTokenizer class and implement the load_model, get_code, and recon_wav methods.
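As a rough illustration of that contract, here is a minimal sketch of a custom wrapper. The AudioTokenizer base class below is a hypothetical stand-in written only so the example runs on its own; the real class in wrapper.py may use different constructor arguments, method signatures, and tensor types. The toy "codec" (ToyQuantizer, also hypothetical) simply quantizes samples to 8 bits.

```python
from abc import ABC, abstractmethod


class AudioTokenizer(ABC):
    """Hypothetical stand-in for the base class in wrapper.py (signatures assumed)."""

    def __init__(self):
        self.model = None
        self.load_model()

    @abstractmethod
    def load_model(self): ...

    @abstractmethod
    def get_code(self, wav): ...

    @abstractmethod
    def recon_wav(self, wav): ...


class ToyQuantizer(AudioTokenizer):
    """Toy codec: maps samples in [-1, 1] to 256 uniform levels."""

    levels = 256

    def load_model(self):
        # A real wrapper would load checkpoint weights here.
        self.model = "toy-8bit"

    def get_code(self, wav):
        # Clip each sample to [-1, 1], then map it to an integer code 0..255.
        codes = []
        for x in wav:
            x = max(-1.0, min(1.0, x))
            codes.append(round((x + 1.0) / 2.0 * (self.levels - 1)))
        return codes

    def recon_wav(self, wav):
        # Encode, then decode the codes back to float samples.
        return [c / (self.levels - 1) * 2.0 - 1.0 for c in self.get_code(wav)]


codec = ToyQuantizer()
samples = [0.0, 0.5, -1.0, 1.0]
codes = codec.get_code(samples)
recon = codec.recon_wav(samples)
```

A real wrapper would load model weights in load_model and most likely operate on tensors rather than Python lists; only the three method names come from this README.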

environment

pip install -r requirements.txt

visqol

ViSQOL (google/visqol), a perceptual quality estimator for speech and audio: https://github.com/google/visqol

# visqol
# Run the Bazel 5.3.2 installer
bash bazel-5.3.2-installer-linux-x86_64.sh

git clone https://github.com/google/visqol.git
cd visqol

bazel build :visqol -c opt

You may encounter the following error:

ImportError: ~/miniconda3/envs/py310/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by ~/miniconda3/envs/py310/lib/python3.10/site-packages/visqol/visqol_lib_py.s)

Refer to the CSDN blog post:

Solving the libstdc++.so.6: version 'GLIBCXX_3.4.30' not found problem

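Check whether the conda environment's libstdc++.so.6 provides GLIBCXX_3.4.30 and compare it with the system copy. If only the system copy provides it, delete the conda environment's libstdc++.so.6.

cd ~/miniconda3/envs/py310/lib
strings libstdc++.so.6 | grep GLIBCXX_3.4.30
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX_3.4.30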

export PATH=$PATH:~/bin

How to use

test inference

First, run python wrapper.py to get the bitrate or latent dimension of the codec.
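For intuition about the bitrate figure, the sketch below applies the usual formula for codecs that emit a fixed number of discrete tokens per frame: frame rate times number of codebooks times bits per token. This is not necessarily how wrapper.py computes it, and the function name and the example values are assumptions chosen for illustration, not read from any model config.

```python
import math


def codec_bitrate(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second for a codec emitting num_codebooks tokens per frame."""
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_token


# Illustrative values only (assumed): 75 frames/s, 9 codebooks,
# 1024 entries per codebook.
bitrate = codec_bitrate(75, 9, 1024)
print(f"{bitrate / 1000:.2f} kbps")  # prints "6.75 kbps"
```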

prepare audio

Many works evaluate speech tokenizers on LibriSpeech test-clean, so we use this dataset as an example. First, download test-clean.tar.gz from https://www.openslr.org/12, extract it, and move the extracted test-clean folder to exp_recon/test-clean_flac/.

mkdir exp_recon
mv path/to/LibriSpeech/test-clean exp_recon/test-clean_flac

Then convert the audio to WAV format.

python trans_folder_to_wav.py

We explicitly store the resampled 16 kHz audio for evaluation.

python resample_folder.py

reconstruct audio

During evaluation, each audio clip is resampled to the codec's sampling rate for reconstruction and afterward resampled back to 16 kHz for storage and evaluation. Run recon_folder.sh (or recon_folder_multi.sh if you have multiple GPUs) to reconstruct the audio clips in exp_recon/test-clean using your codec.

You will get a folder structure as shown below:

exp_recon/
├── DAC_24k_9         # Reconstructed audio using DAC (24kHz) model with 9 RVQ codebooks
├── test-clean        # Original audio clips (original sampling rate)
├── test-clean_16000  # Resampled original audio clips at 16 kHz
└── test-clean_flac   # Original LibriSpeech FLAC files

run eval

Pairwise metrics such as PESQ, STOI, and mel distance compare two audio folders, so both folders must have the same sampling rate.

bash run_pesq_stoi.sh
bash run_mel_stft.sh
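Before running the pairwise scripts, it can help to confirm that the two folders really line up. The sketch below is not part of the toolkit; the helper names wav_sample_rates and check_pair are hypothetical. It pairs WAV files by relative path and reports files missing from the reconstruction folder or stored at a different sampling rate. It uses only the standard-library wave module, which reads PCM WAV files and may reject other encodings.

```python
import wave
from pathlib import Path


def wav_sample_rates(folder):
    """Map each WAV file's path (relative to folder) to its sampling rate."""
    folder = Path(folder)
    rates = {}
    for path in sorted(folder.rglob("*.wav")):
        with wave.open(str(path), "rb") as f:
            rates[path.relative_to(folder).as_posix()] = f.getframerate()
    return rates


def check_pair(ref_dir, recon_dir):
    """Return (missing, mismatched) relative paths for two WAV folders."""
    ref = wav_sample_rates(ref_dir)
    recon = wav_sample_rates(recon_dir)
    # Files present in the reference folder but absent from the reconstruction.
    missing = sorted(set(ref) - set(recon))
    # Files present in both folders whose sampling rates differ.
    mismatched = sorted(k for k in ref.keys() & recon.keys() if ref[k] != recon[k])
    return missing, mismatched
```

For example, check_pair("exp_recon/test-clean_16000", "exp_recon/DAC_24k_9") should return two empty lists when every reference clip has a 16 kHz reconstruction.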

For the other metrics, run

bash run_usage.sh
bash run_entropy.sh
bash run_wer.sh
bash run_umos.sh 
bash run_spk.sh
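As background for the usage and entropy scripts, the sketch below shows one common way to define the two quantities for a stream of codebook indices: usage as the fraction of codebook entries that occur at least once, and entropy as the Shannon entropy of the empirical token distribution. These definitions and the function names are assumptions; run_usage.sh and run_entropy.sh may compute them differently, for example per codebook layer.

```python
import math
from collections import Counter


def codebook_usage(tokens, codebook_size):
    """Fraction of codebook entries that appear at least once."""
    return len(set(tokens)) / codebook_size


def codebook_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


tokens = [0, 1, 2, 3, 0, 1, 2, 3]  # toy token stream for a codebook of size 8
print(codebook_usage(tokens, 8))   # 0.5
print(codebook_entropy(tokens))    # 2.0
```

A codebook whose entries are all used equally often reaches the maximum entropy of log2(codebook_size) bits, so low usage or low entropy suggests that part of the codebook is wasted.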

Acknowledgements

This toolkit reuses code cloned directly from the following projects to simplify setup:

