Deep Compression Vector-Quantized AutoEncoder?

This is very impressive work! Well done.

I am wondering whether you have run any further experiments with vector quantization. DCAE-f128 compresses a 256x256 image into a 2x2 feature map, which would give just 4 tokens per image with VQ. With so few tokens per image, this could significantly accelerate LLM training and inference and pave the way for real-time video generation.
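
To make the token arithmetic concrete, here is a minimal sketch of what I have in mind. It assumes a single codebook with one code index per spatial position of the latent map, which is my assumption rather than anything from the DC-AE code. The helper name `latent_tokens` is hypothetical, and the downsampling factors other than 128 are only there for comparison.

```python
def latent_tokens(image_size: int, downsample_factor: int) -> int:
    """Number of VQ tokens for a square image.

    Assumes one codebook index per spatial position of the latent
    feature map (hypothetical setup, not taken from the DC-AE repo).
    """
    if image_size % downsample_factor != 0:
        raise ValueError("image_size must be divisible by downsample_factor")
    side = image_size // downsample_factor  # latent map is side x side
    return side * side


if __name__ == "__main__":
    # Compare sequence lengths for a 256x256 image at several factors.
    for factor in (8, 32, 64, 128):
        side = 256 // factor
        print(f"f{factor}: {side}x{side} latent -> {latent_tokens(256, factor)} tokens")
```

Under these assumptions, f128 gives 4 tokens for a 256x256 image, versus 1024 tokens for an f8 tokenizer, so the sequence is 256 times shorter.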