
32-bit integer compression is lousy #4501

@mdfulk-afk

Description

Hey all zstd developers!

I'm getting a great result using zstd to compress an Armadillo 64-bit uvec of integers: it is very fast and beats gzip in both speed and size. But when I switch to arma::Col<uint32_t>, i.e. 32-bit integers, zstd produces a much larger file. I'm happy with the 64-bit result, but I'm leaving you with my chat with Gemini below. I would think zstd could do better on 32-bit binary output! Anyway, I will just use 64-bit until you have a solution.
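Roughly, the compression path looks like the following minimal sketch, assuming the raw Armadillo buffer is handed straight to zstd's one-shot ZSTD_compress API. The synthetic data and helper name are illustrative only, not the actual facet indices from the report:

```cpp
#include <armadillo>
#include <zstd.h>
#include <cstdint>
#include <cstdio>
#include <vector>

// Compress a raw buffer with zstd's one-shot API and return the
// compressed size in bytes (0 on error).
static size_t zstd_compressed_size(const void* src, size_t src_bytes, int level = 3)
{
    std::vector<char> dst(ZSTD_compressBound(src_bytes));
    const size_t n = ZSTD_compress(dst.data(), dst.size(), src, src_bytes, level);
    return ZSTD_isError(n) ? 0 : n;
}

int main()
{
    // Stand-in for the facet index lists (shape and values are assumptions).
    // uvec is assumed to hold 64-bit words here (ARMA_64BIT_WORD).
    arma::uvec idx64 = arma::regspace<arma::uvec>(0, 800000);
    arma::Col<uint32_t> idx32 = arma::conv_to<arma::Col<uint32_t>>::from(idx64);

    const size_t bytes64 = idx64.n_elem * sizeof(arma::uword);
    const size_t bytes32 = idx32.n_elem * sizeof(uint32_t);

    std::printf("uint64_t: %zu -> %zu bytes\n", bytes64,
                zstd_compressed_size(idx64.memptr(), bytes64));
    std::printf("uint32_t: %zu -> %zu bytes\n", bytes32,
                zstd_compressed_size(idx32.memptr(), bytes32));
    return 0;
}
```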

Here is Gemini's summary:
Hello Zstandard team,

I'm working on a scientific computing project (RCS simulation) and have been using zstd to compress large intermediate files containing lists of 3D model facet indices. I ran into a very counter-intuitive result that I thought you might find interesting.

I found that a 6.3 MB uncompressed file of uint64_t integers compressed down to 852 KB.

However, when I stored the exact same indices in a 3.2 MB uncompressed file of uint32_t integers, the compressed size was much larger, at 2.2 MB.

It seems your algorithm is exceptionally good at handling the redundancy of the zero-padding in the 64-bit data stream. This is a fantastic real-world demonstration of the sophistication of your compressor.

Just wanted to share this cool result. Thanks for the great work on this library!

(You can also include the output from your ls -lh commands to show the exact numbers.)
