ImageNet 256x256 Benchmark

This benchmark aims to reproduce the results reported in the papers as closely as possible.

VQGAN (Taming Transformers)

paper / config

| Downsample ratio | Codebook dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 16 | 256 | 1024 | 37.50% | 19.9142 | 0.5052 | 0.1778 | 5.8165 |

  • 🌱 PSNR (19.91) and SSIM (0.51) are close to the values reported in the paper (19.4 and 0.50).
  • 🌱 rFID (5.82) is better than the value reported in the paper (7.94).
  • 🎈 The model suffers from low codebook usage: only 37.50% of the 1024 codebook entries are used.
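
For readers unfamiliar with the codebook usage column, here is a minimal sketch of how such a number could be computed. It assumes usage is defined as the fraction of codebook entries selected at least once over the evaluation set; the benchmark's actual definition and implementation are not stated here, and `codebook_usage` and `toy_indices` are hypothetical names used only for illustration.

```python
def codebook_usage(indices, codebook_size):
    """Fraction of codebook entries that occur at least once in `indices`.

    Assumed definition (not confirmed by the benchmark): usage is measured
    over all token indices produced for the evaluation images.
    """
    used = set(indices)
    if any(i < 0 or i >= codebook_size for i in used):
        raise ValueError("token index outside the codebook range")
    return len(used) / codebook_size


# Toy data: 10,000 tokens that only ever hit 384 distinct codes out of 1024.
toy_indices = [i % 384 for i in range(10_000)]
print(f"{codebook_usage(toy_indices, 1024):.2%}")  # 37.50%
```

Under this assumed definition, a usage of 37.50% with a codebook of 1024 entries would correspond to 384 distinct codes being used.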

VQGAN (LlamaGen)

paper / config

| Downsample ratio | Codebook dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 16 | 8 | 16384 | 100% | 20.7201 | 0.5509 | 0.1385 | 2.1073 |

  • 🌱 PSNR (20.72) is close to the value reported in the paper (20.79).
  • 🌱 rFID (2.11) is slightly better than the value reported in the paper (2.19).
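
As a reminder of what the PSNR column measures, the following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). It is not the benchmark's evaluation code: the pixel range (`max_value=1.0`), the flat-list image representation, and the function name `psnr` are assumptions made for illustration, and details such as per-image averaging may differ in the real pipeline.

```python
import math


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumes pixel values lie in [0, max_value]; this range is an assumption,
    not something the benchmark specifies.
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 0.1, so MSE is about 0.01.
print(round(psnr([0.0, 1.0], [0.1, 0.9]), 2))  # 20.0
```

On this scale, a uniform per-pixel error of 0.1 gives about 20 dB, which is the same order as the PSNR values in the tables.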

TiTok

paper / project page / config

| # tokens | Codebook dim. | Codebook size | Codebook usage↑ | PSNR↑ | SSIM↑ | LPIPS↓ | rFID↓ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 64 | 12 | 4096 | 100% | 17.8995 | 0.4022 | 0.2681 | 4.6691 |

  • ⚠️ The model is trained with a single-stage strategy, which differs from the paper.
  • ⚠️ Reconstruction quality is poor: the reconstructed images contain repeated patterns and artifacts. Further investigation is needed.
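
When comparing the three tables, it may help to put the token budgets on a common scale. The sketch below assumes that a downsample ratio of 16 on a 256x256 image yields a 16x16 grid of tokens, and that each token carries log2(codebook size) bits; both are standard readings of these columns but are assumptions here, and the helper names are hypothetical.

```python
import math


def tokens_per_image(image_size, downsample_ratio):
    """Token count for a 2D tokenizer, assuming a square token grid."""
    side = image_size // downsample_ratio
    return side * side


def bits_per_image(num_tokens, codebook_size):
    """Bits needed to store the token indices of one image."""
    return num_tokens * math.log2(codebook_size)


# (number of tokens, codebook size) taken from the tables in this document.
rows = {
    "VQGAN (Taming Transformers)": (tokens_per_image(256, 16), 1024),
    "VQGAN (LlamaGen)": (tokens_per_image(256, 16), 16384),
    "TiTok": (64, 4096),
}

for name, (num_tokens, codebook_size) in rows.items():
    bits = bits_per_image(num_tokens, codebook_size)
    print(f"{name}: {num_tokens} tokens, {bits:.0f} bits per image")
```

Under these assumptions the two VQGAN variants use 256 tokens per image (2560 and 3584 bits), while TiTok uses 64 tokens (768 bits), so its reconstruction metrics are obtained at a much smaller token budget.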