This benchmark does not aim to achieve the best performance, but to gain insights into the behavior of different quantization methods.
Therefore, we use the same basic setup for all the experiments:
- The network architecture is SimpleCNN.
- Hyperparameters:
  - Batch size: 256
  - Learning rate: 4e-4
  - Optimizer: Adam
  - Training steps: 500k
- Results are evaluated on the CelebA test split, which contains 19,962 images.
Effect of codebook dimension:

| Codebook dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| 4 | 512 | 100.00% | 32.2119 | 0.9517 | 0.0239 | 16.3249 |
| 8 | 512 | 100.00% | 32.2406 | 0.9520 | 0.0228 | 16.6592 |
| 16 | 512 | 68.75% | 31.6909 | 0.9473 | 0.0263 | 16.4272 |
| 32 | 512 | 66.41% | 31.7674 | 0.9480 | 0.0261 | 16.3970 |
| 64 | 512 | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
- A smaller codebook dimension leads to higher codebook usage.
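
For orientation, the quantity varied above is the dimensionality of the code vectors in a standard VQ layer. The following is a minimal PyTorch sketch of such a vanilla quantizer (nearest-neighbor lookup plus straight-through gradients), showing where `codebook_dim` and `codebook_size` enter; the class name, initialization, and defaults are illustrative, not this repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Minimal vanilla VQ layer: nearest-neighbor lookup + straight-through estimator."""

    def __init__(self, codebook_size: int = 512, codebook_dim: int = 4, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, codebook_dim)
        self.codebook.weight.data.uniform_(-1.0 / codebook_size, 1.0 / codebook_size)
        self.beta = beta

    def forward(self, z):
        # z: (B, D, H, W) -> flatten to (B*H*W, D)
        z_flat = z.permute(0, 2, 3, 1).reshape(-1, z.shape[1])
        # Squared Euclidean distance from each encoder output to every code
        dist = (
            z_flat.pow(2).sum(1, keepdim=True)
            - 2 * z_flat @ self.codebook.weight.t()
            + self.codebook.weight.pow(2).sum(1)
        )
        indices = dist.argmin(dim=1)
        z_q = self.codebook(indices).view(z.shape[0], z.shape[2], z.shape[3], -1)
        z_q = z_q.permute(0, 3, 1, 2)
        # VQ loss: codebook term + commitment term
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z_q.detach(), z)
        # Straight-through estimator so the encoder receives gradients
        z_q = z + (z_q - z).detach()
        return z_q, loss, indices
```

One common reading of the usage drop: the nearest-neighbor search happens in the `codebook_dim`-dimensional space, and with a larger dimension it is easier for codes to drift away from the encoder's output distribution and never be selected again.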
Effect of codebook size:

| Codebook dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| 64 | 512 | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
| 64 | 1024 | 30.18% | 31.3836 | 0.9459 | 0.0272 | 16.4965 |
| 64 | 2048 | 16.06% | 31.6631 | 0.9470 | 0.0264 | 16.5808 |
- When codebook usage is low, increasing the codebook size does not improve reconstruction quality.
Effect of l2-norm codes:

| Codebook dim. | Codebook size | l2-norm codes | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|---|
| 4 | 512 | No | 100.00% | 32.2119 | 0.9517 | 0.0239 | 16.3249 |
| 4 | 512 | Yes | 100.00% | 32.2439 | 0.9473 |  | 16.4495 |
| 64 | 512 | No | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
| 64 | 512 | Yes | 98.24% | 31.3334 | 0.9492 | 0.0209 | 12.9127 |
- Using l2-normalized codes improves codebook usage even when the codebook dimension is large.
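
In the l2-norm variant, both the encoder outputs and the codebook entries are projected onto the unit sphere before the lookup, so the nearest-neighbor search reduces to a cosine-similarity search. A minimal sketch of just the lookup step, assuming the same flattened `z_flat` and codebook weight matrix as in the vanilla sketch above (the function name is illustrative):

```python
import torch.nn.functional as F

def l2_norm_lookup(z_flat, codebook_weight):
    """Nearest-neighbor lookup on the unit sphere (cosine similarity)."""
    z_n = F.normalize(z_flat, dim=1)
    c_n = F.normalize(codebook_weight, dim=1)
    # With unit-norm vectors, minimizing L2 distance == maximizing cosine similarity.
    indices = (z_n @ c_n.t()).argmax(dim=1)
    z_q = c_n[indices]  # quantized vectors are also unit-normalized
    return z_q, indices
```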
Effect of EMA update:

| Codebook dim. | Codebook size | Codebook update | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|---|
| 4 | 512 | VQ loss | 100.00% | 32.2119 | 0.9517 | 0.0239 | 16.3249 |
| 4 | 512 | EMA | 100.00% | 32.3070 | 0.9528 | 0.0224 | 16.3338 |
| 64 | 512 | VQ loss | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
| 64 | 512 | EMA | 100.00% | 32.0709 | 0.9516 | 0.0228 | 15.5629 |
- Updating the codebook with EMA improves codebook usage even when the codebook dimension is large.
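
With EMA updates, the codebook is no longer trained through the VQ loss gradient; instead, each code is moved toward the running mean of the encoder outputs assigned to it. A minimal sketch of one update step, assuming persistent buffers `ema_cluster_size` of shape `(K,)` and `ema_embed_sum` of shape `(K, D)` (the function name and decay value are illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_codebook_update(codebook, ema_cluster_size, ema_embed_sum,
                        z_flat, indices, decay=0.99, eps=1e-5):
    """One EMA update step for the codebook (no gradient-based codebook loss needed)."""
    codebook_size = codebook.shape[0]
    one_hot = F.one_hot(indices, codebook_size).type(z_flat.dtype)
    # Exponential moving averages of assignment counts and summed encoder outputs
    ema_cluster_size.mul_(decay).add_(one_hot.sum(0), alpha=1 - decay)
    ema_embed_sum.mul_(decay).add_(one_hot.t() @ z_flat, alpha=1 - decay)
    # Laplace smoothing avoids division by zero for rarely used codes
    n = ema_cluster_size.sum()
    cluster_size = (ema_cluster_size + eps) / (n + codebook_size * eps) * n
    codebook.copy_(ema_embed_sum / cluster_size.unsqueeze(1))
```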
Effect of entropy regularization:

| Codebook dim. | Codebook size | Entropy reg. weight | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|---|
| 64 | 512 | 0.0 | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
| 64 | 512 | 0.1 | 100.00% | 29.5755 | 0.9277 | 0.0422 | 14.1500 |
- Entropy regularization improves codebook usage, but it can hurt reconstruction quality (PSNR, SSIM, and LPIPS all degrade here, although rFID improves).
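
The exact regularizer used here is not spelled out above; a commonly used form penalizes the entropy of each sample's assignment distribution while rewarding the entropy of the batch-averaged distribution, so individual assignments become confident while overall usage stays close to uniform. A hedged sketch of that common formulation (not necessarily this benchmark's exact loss):

```python
import torch
import torch.nn.functional as F

def entropy_reg_loss(logits, temperature=1.0):
    """Common entropy regularizer: low per-sample entropy, high batch-average entropy.

    logits: (N, K) affinities between encoder outputs and codes
    (e.g. negative distances or cosine similarities).
    """
    probs = F.softmax(logits / temperature, dim=1)
    log_probs = F.log_softmax(logits / temperature, dim=1)
    # Per-sample entropy: push each assignment to be confident (low entropy).
    sample_entropy = -(probs * log_probs).sum(dim=1).mean()
    # Batch-average entropy: push overall usage toward uniform (high entropy).
    avg_probs = probs.mean(dim=0)
    avg_entropy = -(avg_probs * (avg_probs + 1e-8).log()).sum()
    return sample_entropy - avg_entropy
```

This loss would be added to the reconstruction and VQ losses with the weight listed in the table (e.g. 0.1).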
FSQ-VAE:

| Levels | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| [8,8,8] | 512 | 100.00% | 30.8543 | 0.9397 | 0.0315 | 15.7079 |
| [8,5,5,5] | 1000 | 100.00% | 30.9025 | 0.9433 | 0.0266 | 15.8230 |
- FSQ-VAE does not suffer from the codebook collapse problem.
- FSQ-VAE can achieve performance comparable to a VQVAE with the same codebook size.
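
FSQ replaces the learned codebook with a fixed per-channel grid: each latent channel is bounded and rounded to one of `L_i` levels, so the implicit codebook size is the product of the levels (8×8×8 = 512, 8×5×5×5 = 1000). A simplified sketch following the FSQ formulation (not this repository's exact code):

```python
import torch

def fsq_quantize(z, levels=(8, 8, 8)):
    """Finite Scalar Quantization sketch: bound each channel, round to a fixed grid.

    z: (..., len(levels)) latent, one channel per entry of `levels`.
    """
    L = torch.tensor(levels, dtype=z.dtype, device=z.device)
    half_l = (L - 1) / 2
    # Offset so channels with an even number of levels still land on integer grid points.
    offset = (L % 2 == 0).to(z.dtype) * 0.5
    shift = torch.atanh(offset / half_l)
    z_bounded = torch.tanh(z + shift) * half_l - offset
    z_q = torch.round(z_bounded)
    # Straight-through estimator so gradients flow through the rounding.
    z_q = z_bounded + (z_q - z_bounded).detach()
    return z_q / (L // 2)  # rescale each channel to roughly [-1, 1]
```

Since every grid point is reachable by construction, there is no learned codebook to collapse, which is consistent with the 100% usage above.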

| Dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| 9 | 512 | 100.00% | 26.1391 | 0.8685 | 0.0700 | 18.5518 |
⚠️ The result is not as good as expected. Some details may be missing from the implementation.

SimVQ:

| Codebook dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| 64 | 512 | 100.00% | 31.7468 | 0.9494 | 0.0242 | 14.9863 |
- SimVQ addresses the codebook collapse problem by reparameterizing the codebook through a linear transformation layer.
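
A minimal sketch of that idea, assuming a frozen, randomly initialized base codebook and a single learnable linear map (names are illustrative; the actual SimVQ implementation has additional details):

```python
import torch
import torch.nn as nn

class SimVQCodebook(nn.Module):
    """SimVQ-style reparameterization sketch: frozen base codebook + learnable linear map."""

    def __init__(self, codebook_size: int = 512, codebook_dim: int = 64):
        super().__init__()
        base = torch.randn(codebook_size, codebook_dim)
        self.register_buffer("base_codebook", base)  # frozen, never updated directly
        self.proj = nn.Linear(codebook_dim, codebook_dim, bias=False)  # learnable W

    def forward(self):
        # Effective codebook C = C_base @ W^T; gradients only flow into W.
        return self.proj(self.base_codebook)
```

Because the only trainable parameters are in the shared linear map, every code moves on every update, including codes that were not selected in the current batch, which is how the reparameterization counteracts codebook collapse.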