🎧 Welcome to AudioCodecBench 🎵

AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation

AudioCodecBench provides a comprehensive assessment of codec capabilities across four dimensions: audio reconstruction metrics, codebook index (ID) stability, decoder-only Transformer perplexity, and performance on downstream probe tasks. Our results support the suitability of the proposed definitions and reveal correlations among reconstruction metrics, codebook ID stability, downstream probe tasks, and perplexity.
arXiv Paper: AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation

⛰️ Purpose

  1. Evaluate codebook quality (for language-model modeling)
  2. Collect all existing reconstruction metrics
  3. Collect all existing linear-probing metrics (music and speech)

🧭 Env Build

The following explains how to quickly create the required environment and install codec_evaluation for use.

Setup environment and dependencies

We strongly recommend using conda to manage your Python environment.

  • Create a virtual environment using conda.

     # create a virtual environment using conda
     conda create -n codec_eval python=3.10 -y	# Python 3.10 is recommended.
     conda activate codec_eval
    
  • Install codec_evaluation from source

     git clone https://github.com/wuzhiyue111/Codec-Evaluation.git
     cd Codec-Evaluation
     bash env_build.sh
    

📏 Usage

The following introduces how to run evaluations with codecs and on downstream tasks. For details, please refer to the instruction documents. [EN][ZH]

🗺️ Dataset Download

Dataset download address: AudioCodecBench-Dataset

🧰 Probe

🖊️ Probe task results

🔖 Reconstruction Metric

Speech

| Codec | PESQ | Speaker_Sim | WER_GT | WER_REC | CER_GT | CER_REC | STOI | VISQOL | Mel distance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DAC | 3.69 | 0.965 | 0.155 | 0.202 | 0.09 | 0.125 | 0.94 | 4.51 | 0.21 |
| Encodec | 3.21 | 0.919 | 0.155 | 0.198 | 0.09 | 0.114 | 0.93 | 4.37 | 0.31 |
| Mimi | 2.77 | 0.928 | 0.155 | 0.287 | 0.09 | 0.173 | 0.88 | 3.84 | 0.38 |
| SemantiCodec | 2.64 | 0.907 | 0.155 | 0.318 | 0.09 | 0.195 | 0.86 | 4.04 | 0.32 |
| WavTokenizer | 2.17 | 0.743 | 0.155 | 0.494 | 0.09 | 0.325 | 0.83 | 3.43 | 0.68 |
| SpeechTokenizer | 2.97 | 0.924 | 0.155 | 0.216 | 0.09 | 0.120 | 0.89 | 4.22 | 0.25 |
| XCodec | 3.23 | 0.942 | 0.155 | 0.185 | 0.09 | 0.106 | 0.91 | 4.34 | 0.24 |
| YuE | 3.17 | 0.938 | 0.155 | 0.195 | 0.09 | 0.113 | 0.90 | 4.33 | 0.25 |
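The WER/CER columns compare ASR transcripts of the ground-truth audio (`_GT`) and of the codec reconstruction (`_REC`) against the reference text. For readers who want to reproduce the metric itself, here is a minimal pure-Python sketch of WER/CER via Levenshtein edit distance (the ASR front end used by the benchmark is not shown here):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (1-D DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = d[0]          # diagonal cell d[i-1][j-1]
        d[0] = i
        for j, h in enumerate(hyp, 1):
            cur = d[j]       # cell d[i-1][j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (0 if tokens match)
            prev = cur
    return d[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / max(len(ref), 1)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)
```

For example, `wer("a b c", "a x c")` is 1/3 (one substitution over three reference words).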

Music

| Codec | PESQ | STOI | VISQOL | Mel distance |
| --- | --- | --- | --- | --- |
| DAC | 2.66 | 0.86 | 4.40 | 0.73 |
| Encodec | 2.27 | 0.85 | 4.25 | 0.78 |
| SemantiCodec | 1.32 | 0.60 | 4.19 | 0.98 |
| WavTokenizer | 1.14 | 0.49 | 3.84 | 1.15 |
| XCodec | 1.85 | 0.76 | 4.35 | 0.91 |
| YuE | 1.84 | 0.75 | 4.35 | 0.90 |
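The Mel distance above is a spectral reconstruction measure (lower is better). A common formulation — and only an assumption about the exact definition used in the benchmark — is the mean L1 distance between log-mel spectrograms of the original and reconstructed waveforms. A minimal NumPy-only sketch:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel(x, sr=16000, n_fft=512, hop=128, n_mels=40):
    """Log-mel spectrogram from a Hann-windowed power STFT."""
    window = np.hanning(n_fft)
    frames = np.stack([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-8)

def mel_distance(ref, rec, **kw):
    """Mean L1 distance between log-mel spectrograms (one common definition)."""
    a, b = log_mel(ref, **kw), log_mel(rec, **kw)
    n = min(len(a), len(b))
    return float(np.mean(np.abs(a[:n] - b[:n])))
```

Identical waveforms give a distance of exactly 0; larger values indicate stronger spectral degradation.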

🔖 Probe Experiment

Music Probe

| Codec | Mode | emomusic A | emomusic V | GTZAN Acc | MTT AP | MTT AUCROC | NSynthI Acc | NSynthP Acc | VocalSetSinger Acc | VocalSetTech Acc | GS Acc | MTGGenre AP | MTGGenre AUCROC | MTGInstrument AP | MTGInstrument AUCROC | MTGMoodtheme AP | MTGMoodtheme AUCROC | MTGTop50 AP | MTGTop50 AUCROC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DAC | quantized_emb | 0.470 | 0.064 | 0.575 | 0.203 | 0.785 | 0.602 | 0.468 | 0.419 | 0.376 | 0.088 | 0.0295 | 0.530 | 0.108 | 0.638 | 0.076 | 0.651 | 0.141 | 0.687 |
| Encodec | quantized_emb | 0.467 | 0.066 | 0.570 | 0.184 | 0.759 | 0.537 | 0.547 | 0.299 | 0.301 | 0.102 | 0.035 | 0.528 | 0.104 | 0.620 | 0.057 | 0.642 | 0.137 | 0.701 |
| SemantiCodec | quantized_emb | 0.507 | 0.316 | 0.703 | 0.318 | 0.877 | 0.658 | 0.764 | 0.344 | 0.451 | 0.343 | 0.035 | 0.526 | 0.149 | 0.720 | 0.099 | 0.723 | 0.230 | 0.795 |
| WavTokenizer | quantized_emb | 0.455 | 0.066 | 0.423 | 0.168 | 0.739 | 0.537 | 0.444 | 0.130 | 0.287 | 0.093 | 0.034 | 0.530 | 0.107 | 0.635 | 0.056 | 0.627 | 0.137 | 0.698 |
| Xcodec | quantized_emb | 0.553 | 0.143 | 0.664 | 0.323 | 0.873 | 0.640 | 0.905 | 0.537 | 0.570 | 0.455 | 0.034 | 0.519 | 0.164 | 0.707 | 0.101 | 0.710 | 0.216 | 0.777 |
| YuE | quantized_emb | 0.573 | 0.156 | 0.669 | 0.315 | 0.870 | 0.622 | 0.896 | 0.523 | 0.594 | 0.454 | 0.034 | 0.517 | 0.133 | 0.700 | 0.102 | 0.711 | 0.191 | 0.758 |

Music Probe codebook0

| Codec | emomusic A | emomusic V | GTZAN Acc | MTT AP | MTT AUCROC | NSynthI Acc | VocalSetSinger Acc | VocalSetTech Acc | GS Acc | MTGInstrument AP | MTGInstrument AUCROC | MTGTop50 AP | MTGTop50 AUCROC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DAC | 0.354 | 0.000 | 0.600 | 0.175 | 0.741 | 0.563 | 0.226 | 0.315 | 0.088 | 0.117 | 0.638 | 0.135 | 0.690 |
| Encodec | 0.465 | 0.092 | 0.543 | 0.119 | 0.681 | 0.563 | 0.086 | 0.268 | 0.088 | 0.110 | 0.630 | 0.136 | 0.701 |
| SemantiCodec | 0.456 | 0.267 | 0.629 | 0.227 | 0.825 | 0.625 | 0.134 | 0.477 | 0.229 | 0.150 | 0.724 | 0.224 | 0.793 |
| Xcodec | 0.375 | 0.461 | 0.628 | 0.261 | 0.838 | 0.611 | 0.320 | 0.488 | 0.389 | 0.140 | 0.669 | 0.191 | 0.755 |
| YuE | 0.439 | 0.085 | 0.616 | 0.249 | 0.831 | 0.623 | 0.335 | 0.475 | 0.346 | 0.133 | 0.670 | 0.191 | 0.758 |

Speech and Sound Probe

| Codec | Mode | Common_Voice WER | Common_Voice CER | Vocalsound Acc | MELD Acc | ESC50 Acc |
| --- | --- | --- | --- | --- | --- | --- |
| DAC | quantized_emb | 0.526 | 0.229 | 0.535 | 0.483 | 0.325 |
| Encodec | quantized_emb | 0.503 | 0.209 | 0.574 | 0.481 | 0.275 |
| SemantiCodec | quantized_emb | 0.490 | 0.200 | 0.723 | 0.482 | 0.620 |
| WavTokenizer | quantized_emb | 0.582 | 0.288 | 0.524 | 0.484 | 0.135 |
| Mimi | quantized_emb | 0.442 | 0.168 | 0.833 | 0.481 | 0.335 |
| SpeechTokenizer | quantized_emb | 0.469 | 0.190 | 0.776 | 0.498 | 0.670 |
| Xcodec | quantized_emb | 0.474 | 0.188 | 0.731 | 0.491 | 0.640 |
| YuE | quantized_emb | 0.472 | 0.187 | 0.782 | 0.515 | 0.640 |
| hubert | unquantized_emb | - | - | 0.877 | 0.495 | 0.525 |
| qwen2audioencoder | unquantized_emb | - | - | 0.953 | 0.590 | 0.975 |

Speech and Sound Probe codebook0

| Codec | Vocalsound Acc | MELD Acc | ESC50 Acc |
| --- | --- | --- | --- |
| DAC | 0.511 | 0.481 | 0.285 |
| Encodec | 0.479 | 0.481 | 0.230 |
| SemantiCodec | 0.646 | 0.482 | 0.465 |
| Mimi | 0.794 | 0.481 | 0.265 |
| SpeechTokenizer | 0.698 | 0.489 | 0.420 |
| Xcodec | 0.656 | 0.487 | 0.525 |
| YuE | 0.684 | 0.481 | 0.515 |
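The probe tables above follow the usual linear-probing recipe: codec embeddings are frozen and only a lightweight classifier is trained on top. As an illustration of the general technique (not the benchmark's exact probe head or training schedule), here is a minimal multinomial logistic-regression probe in NumPy:

```python
import numpy as np

def train_linear_probe(X, y, n_classes, lr=0.1, steps=500, seed=0):
    """Fit a softmax linear classifier on frozen embeddings X (N, D)."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                       # one-hot labels (N, C)
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - Y) / len(X)                    # softmax cross-entropy gradient
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, X, y):
    """Classification accuracy of the trained probe."""
    return float(np.mean((X @ W + b).argmax(axis=1) == y))
```

Because the encoder stays frozen, probe accuracy reflects how linearly separable the task labels are in the codec's embedding space.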

PPL Experiment

LibriTTS

| Codec | ppl↓ | cb1_ppl | cb2_ppl | cb3_ppl | cb4_ppl | cb5_ppl | cb6_ppl | cb7_ppl | cb8_ppl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DAC | 420.6 | 48.9 | 284.1 | 428.6 | 560.2 | 609.7 | 728.1 | 814.0 | 835.5 |
| Encodec | 111.4 | 28.0 | 59.1 | 93.7 | 130.5 | 153.7 | 183.3 | 202.0 | 213.5 |
| WavTokenizer | 317.1 | 317.1 | - | - | - | - | - | - | - |
| X-Codec | 56.2 | 20.6 | 24.9 | 37.8 | 57.3 | 77.3 | 92.0 | 103.0 | 126.3 |
| YuE | 52.7 | 18.3 | 29.6 | 37.7 | 52.6 | 74.3 | 89.8 | 95.3 | 90.3 |
| SpeechTokenizer | 24.2 | 2.4 | 12.4 | 24.8 | 33.6 | 40.9 | 46.0 | 50.3 | 52.8 |
| Mimi | 269.6 | 32.9 | 189.3 | 334.7 | 383.3 | 424.2 | 431.7 | 456.9 | 459.7 |
| SemantiCodec | 14.8 | 1.2 | 191.0 | - | - | - | - | - | - |

Emilia_EN(100ksteps)

| Codec | ppl↓ | cb1_ppl | cb2_ppl | cb3_ppl | cb4_ppl | cb5_ppl | cb6_ppl | cb7_ppl | cb8_ppl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DAC | 247 | 20.6 | 146.7 | 218 | 315.1 | 395.9 | 482.9 | 569.6 | 628.2 |
| Encodec | 75.7 | 14.8 | 33.4 | 59 | 88.7 | 111.3 | 138.4 | 158.5 | 172.6 |
| WavTokenizer | 104.7 | 104.7 | - | - | - | - | - | - | - |
| X-Codec | 30.3 | 10.0 | 12.7 | 20.2 | 30.6 | 41.9 | 50.7 | 61.6 | 71.4 |
| YuE | 29.0 | 9.3 | 16.0 | 19.9 | 29.3 | 38.7 | 51.0 | 55.2 | 54.1 |
| SpeechTokenizer | 13.5 | 1.9 | 5.5 | 12.1 | 18.3 | 22.3 | 25.1 | 28.6 | 30.8 |
| Mimi | 126.9 | 9.1 | 58.2 | 148.0 | 185.0 | 228.7 | 256.6 | 278.9 | 298.5 |
| SemantiCodec | 7.9 | 1.0 | 82.1 | - | - | - | - | - | - |

MTG-Jamendo(100ksteps)

| Codec | ppl↓ | cb1_ppl | cb2_ppl | cb3_ppl | cb4_ppl | cb5_ppl | cb6_ppl | cb7_ppl | cb8_ppl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DAC | 194 | 28.6 | 122.8 | 152.4 | 212.8 | 270.7 | 352.9 | 413.4 | 473.5 |
| Encodec | 141.3 | 17.6 | 62.5 | 110.7 | 170 | 225.9 | 287 | 336.8 | 375.6 |
| WavTokenizer | 38.2 | 38.2 | - | - | - | - | - | - | - |
| X-Codec | 47.5 | 20.4 | 19.6 | 32.4 | 51.1 | 64.5 | 74.5 | 86.8 | 100.2 |
| YuE | 46.2 | 18.3 | 28.7 | 30.4 | 48.2 | 60.0 | 74.9 | 83.0 | 76.3 |
| SemantiCodec | 15.5 | 1.0 | 272.4 | - | - | - | - | - | - |
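Each `cbN_ppl` column above reports the perplexity of a decoder-only Transformer over that codebook's token stream, i.e. PPL = exp(mean negative log-likelihood of the target token IDs). A minimal NumPy sketch of this computation from model logits (the Transformer itself is omitted):

```python
import numpy as np

def perplexity(logits, targets):
    """PPL = exp(mean NLL); logits (T, V), targets (T,) of token IDs."""
    # Log-softmax with the usual max-subtraction for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each target token, then exponentiate the mean.
    nll = -logp[np.arange(len(targets)), targets]
    return float(np.exp(nll.mean()))
```

A model that is uniform over a vocabulary of size V has PPL exactly V, which is why small per-codebook vocabularies and predictable token streams (e.g. SemantiCodec's first codebook) drive PPL toward 1.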

Acknowledgement

We would like to extend special thanks to the authors of https://github.com/lucadellalib/audiocodecs and Marble. Their work has been a great source of inspiration for us.

Citation

@misc{wang2025audiocodecbenchcomprehensivebenchmarkaudio,
      title={AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation}, 
      author={Lu Wang and Hao Chen and Siyu Wu and Zhiyue Wu and Hao Zhou and Chengfeng Zhang and Ting Wang and Haodi Zhang},
      year={2025},
      eprint={2509.02349},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2509.02349}, 
}
