This benchmark does not aim to achieve the best performance, but to gain insights into the behavior of different quantization methods.
Therefore, we use the same basic setup for all the experiments:
- The network architecture is SimpleCNN.
- Hyperparameters:
  - Batch size: 256
  - Learning rate: 4e-4
  - Optimizer: Adam
  - Training steps: 500k
- Results are evaluated on the CelebA test split, which contains 19,962 images.
Effect of codebook dimension:

| Codebook dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| 4 | 512 | 100.00% | 32.2119 | 0.9517 | 0.0239 | 16.3249 |
| 8 | 512 | 100.00% | 32.2406 | 0.9520 | 0.0228 | 16.6592 |
| 16 | 512 | 68.75% | 31.6909 | 0.9473 | 0.0263 | 16.4272 |
| 32 | 512 | 66.41% | 31.7674 | 0.9480 | 0.0261 | 16.3970 |
| 64 | 512 | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
- A smaller codebook dimension leads to higher codebook usage.
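
For orientation, the quantity varied above is the dimensionality of the code vectors in a standard VQ layer. The following is a minimal PyTorch sketch of such a vanilla quantizer (nearest-neighbor lookup plus straight-through gradients), showing where `codebook_dim` and `codebook_size` enter; the class name, initialization, and defaults are illustrative, not this repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Minimal vanilla VQ layer: nearest-neighbor lookup + straight-through estimator."""

    def __init__(self, codebook_size: int = 512, codebook_dim: int = 4, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, codebook_dim)
        self.codebook.weight.data.uniform_(-1.0 / codebook_size, 1.0 / codebook_size)
        self.beta = beta

    def forward(self, z):
        # z: (B, D, H, W) -> flatten to (B*H*W, D)
        z_flat = z.permute(0, 2, 3, 1).reshape(-1, z.shape[1])
        # Squared Euclidean distance from each encoder output to every code
        dist = (
            z_flat.pow(2).sum(1, keepdim=True)
            - 2 * z_flat @ self.codebook.weight.t()
            + self.codebook.weight.pow(2).sum(1)
        )
        indices = dist.argmin(dim=1)
        z_q = self.codebook(indices).view(z.shape[0], z.shape[2], z.shape[3], -1)
        z_q = z_q.permute(0, 3, 1, 2)
        # VQ loss: codebook term + commitment term
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z_q.detach(), z)
        # Straight-through estimator so the encoder receives gradients
        z_q = z + (z_q - z).detach()
        return z_q, loss, indices
```

One common reading of the usage drop: the nearest-neighbor search happens in the `codebook_dim`-dimensional space, and with a larger dimension it is easier for codes to drift away from the encoder's output distribution and never be selected again.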
Effect of codebook size:

| Codebook dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| 64 | 512 | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
| 64 | 1024 | 30.18% | 31.3836 | 0.9459 | 0.0272 | 16.4965 |
| 64 | 2048 | 16.06% | 31.6631 | 0.9470 | 0.0264 | 16.5808 |
- When codebook usage is low, increasing the codebook size does not improve reconstruction quality.
Effect of l2-norm codes:

| Codebook dim. | Codebook size | l2-norm codes | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|---|
| 4 | 512 | No | 100.00% | 32.2119 | 0.9517 | 0.0239 | 16.3249 |
| 4 | 512 | Yes | 100.00% | 32.2439 | 0.9473 |  | 16.4495 |
| 64 | 512 | No | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
| 64 | 512 | Yes | 98.24% | 31.3334 | 0.9492 | 0.0209 | 12.9127 |
- Using l2-normalized codes improves codebook usage even when the codebook dimension is large.
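
In the l2-norm variant, both the encoder outputs and the codebook entries are projected onto the unit sphere before the lookup, so the nearest-neighbor search reduces to a cosine-similarity search. A minimal sketch of just the lookup step, assuming the same flattened `z_flat` and codebook weight matrix as in the vanilla sketch above (the function name is illustrative):

```python
import torch.nn.functional as F

def l2_norm_lookup(z_flat, codebook_weight):
    """Nearest-neighbor lookup on the unit sphere (cosine similarity)."""
    z_n = F.normalize(z_flat, dim=1)
    c_n = F.normalize(codebook_weight, dim=1)
    # With unit-norm vectors, minimizing L2 distance == maximizing cosine similarity.
    indices = (z_n @ c_n.t()).argmax(dim=1)
    z_q = c_n[indices]  # quantized vectors are also unit-normalized
    return z_q, indices
```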
Effect of EMA update:

| Codebook dim. | Codebook size | Codebook update | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|---|
| 4 | 512 | VQ loss | 100.00% | 32.2119 | 0.9517 | 0.0239 | 16.3249 |
| 4 | 512 | EMA | 100.00% | 32.3070 | 0.9528 | 0.0224 | 16.3338 |
| 64 | 512 | VQ loss | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
| 64 | 512 | EMA | 100.00% | 32.0709 | 0.9516 | 0.0228 | 15.5629 |
- Updating the codebook with EMA improves codebook usage even when the codebook dimension is large.
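
With EMA updates, the codebook is no longer trained through the VQ loss gradient; instead, each code is moved toward the running mean of the encoder outputs assigned to it. A minimal sketch of one update step, assuming persistent buffers `ema_cluster_size` of shape `(K,)` and `ema_embed_sum` of shape `(K, D)` (the function name and decay value are illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_codebook_update(codebook, ema_cluster_size, ema_embed_sum,
                        z_flat, indices, decay=0.99, eps=1e-5):
    """One EMA update step for the codebook (no gradient-based codebook loss needed)."""
    codebook_size = codebook.shape[0]
    one_hot = F.one_hot(indices, codebook_size).type(z_flat.dtype)
    # Exponential moving averages of assignment counts and summed encoder outputs
    ema_cluster_size.mul_(decay).add_(one_hot.sum(0), alpha=1 - decay)
    ema_embed_sum.mul_(decay).add_(one_hot.t() @ z_flat, alpha=1 - decay)
    # Laplace smoothing avoids division by zero for rarely used codes
    n = ema_cluster_size.sum()
    cluster_size = (ema_cluster_size + eps) / (n + codebook_size * eps) * n
    codebook.copy_(ema_embed_sum / cluster_size.unsqueeze(1))
```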
Effect of entropy regularization:

| Codebook dim. | Codebook size | Entropy reg. weight | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|---|
| 64 | 512 | 0.0 | 56.45% | 31.5487 | 0.9453 | 0.0275 | 16.8227 |
| 64 | 512 | 0.1 | 100.00% | 29.5755 | 0.9277 | 0.0422 | 14.1500 |
- Entropy regularization improves codebook usage, but it can hurt reconstruction quality (PSNR, SSIM, and LPIPS all degrade here, although rFID improves).
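
The exact regularizer used here is not spelled out above; a commonly used form penalizes the entropy of each sample's assignment distribution while rewarding the entropy of the batch-averaged distribution, so individual assignments become confident while overall usage stays close to uniform. A hedged sketch of that common formulation (not necessarily this benchmark's exact loss):

```python
import torch
import torch.nn.functional as F

def entropy_reg_loss(logits, temperature=1.0):
    """Common entropy regularizer: low per-sample entropy, high batch-average entropy.

    logits: (N, K) affinities between encoder outputs and codes
    (e.g. negative distances or cosine similarities).
    """
    probs = F.softmax(logits / temperature, dim=1)
    log_probs = F.log_softmax(logits / temperature, dim=1)
    # Per-sample entropy: push each assignment to be confident (low entropy).
    sample_entropy = -(probs * log_probs).sum(dim=1).mean()
    # Batch-average entropy: push overall usage toward uniform (high entropy).
    avg_probs = probs.mean(dim=0)
    avg_entropy = -(avg_probs * (avg_probs + 1e-8).log()).sum()
    return sample_entropy - avg_entropy
```

This loss would be added to the reconstruction and VQ losses with the weight listed in the table (e.g. 0.1).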
FSQ-VAE:

| Levels | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| [8,8,8] | 512 | 100.00% | 30.8543 | 0.9397 | 0.0315 | 15.7079 |
| [8,5,5,5] | 1000 | 100.00% | 30.9025 | 0.9433 | 0.0266 | 15.8230 |
- FSQ-VAE does not suffer from the codebook collapse problem.
- FSQ-VAE can achieve performance comparable to a VQVAE with the same codebook size.
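
FSQ replaces the learned codebook with a fixed per-channel grid: each latent channel is bounded and rounded to one of `L_i` levels, so the implicit codebook size is the product of the levels (8×8×8 = 512, 8×5×5×5 = 1000). A simplified sketch following the FSQ formulation (not this repository's exact code):

```python
import torch

def fsq_quantize(z, levels=(8, 8, 8)):
    """Finite Scalar Quantization sketch: bound each channel, round to a fixed grid.

    z: (..., len(levels)) latent, one channel per entry of `levels`.
    """
    L = torch.tensor(levels, dtype=z.dtype, device=z.device)
    half_l = (L - 1) / 2
    # Offset so channels with an even number of levels still land on integer grid points.
    offset = (L % 2 == 0).to(z.dtype) * 0.5
    shift = torch.atanh(offset / half_l)
    z_bounded = torch.tanh(z + shift) * half_l - offset
    z_q = torch.round(z_bounded)
    # Straight-through estimator so gradients flow through the rounding.
    z_q = z_bounded + (z_q - z_bounded).detach()
    return z_q / (L // 2)  # rescale each channel to roughly [-1, 1]
```

Since every grid point is reachable by construction, there is no learned codebook to collapse, which is consistent with the 100% usage above.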

| Dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| 9 | 512 | 100.00% | 26.1391 | 0.8685 | 0.0700 | 18.5518 |
⚠️ The result is not as good as expected. Some details may be missing from the implementation.

SimVQ:

| Codebook dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
|---|---|---|---|---|---|---|
| 64 | 512 | 100.00% | 31.7468 | 0.9494 | 0.0242 | 14.9863 |
- SimVQ addresses the codebook collapse problem by reparameterizing the codebook through a linear transformation layer.
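
A minimal sketch of that idea, assuming a frozen, randomly initialized base codebook and a single learnable linear map (names are illustrative; the actual SimVQ implementation has additional details):

```python
import torch
import torch.nn as nn

class SimVQCodebook(nn.Module):
    """SimVQ-style reparameterization sketch: frozen base codebook + learnable linear map."""

    def __init__(self, codebook_size: int = 512, codebook_dim: int = 64):
        super().__init__()
        base = torch.randn(codebook_size, codebook_dim)
        self.register_buffer("base_codebook", base)  # frozen, never updated directly
        self.proj = nn.Linear(codebook_dim, codebook_dim, bias=False)  # learnable W

    def forward(self):
        # Effective codebook C = C_base @ W^T; gradients only flow into W.
        return self.proj(self.base_codebook)
```

Because the only trainable parameters are in the shared linear map, every code moves on every update, including codes that were not selected in the current batch, which is how the reparameterization counteracts codebook collapse.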