Comparison to arithmetic coding? #62
I know the wiki says that the performance is similar, but can we get a benchmark comparing processing time and compression ratio to know exactly how well it performs compared to its closest competition?
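For reference, a minimal sketch of the kind of benchmark being asked for might look like the following. The `compress_fn` signature mirrors the shape of `FSE_compress(dst, dstCapacity, src, srcSize)` from fse.h, but the harness and its names are illustrative, not an existing benchmark; any codec with a similar entry point could be wrapped behind it.

```c
/* Sketch only: time a compressor and report ratio and throughput. */
#include <stddef.h>
#include <stdio.h>
#include <time.h>

typedef size_t (*compress_fn)(void* dst, size_t dstCapacity,
                              const void* src, size_t srcSize);

static void bench_codec(const char* name, compress_fn compress,
                        void* dst, size_t dstCapacity,
                        const void* src, size_t srcSize, int iterations)
{
    size_t csize = 0;
    clock_t begin = clock();
    for (int i = 0; i < iterations; i++)
        csize = compress(dst, dstCapacity, src, srcSize);
    double seconds = (double)(clock() - begin) / CLOCKS_PER_SEC;

    printf("%-16s ratio: %.3f   speed: %.1f MB/s\n", name,
           (double)srcSize / (double)csize,
           (double)srcSize * iterations / (1024.0 * 1024.0) / seconds);
}
```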
The problem is to find a relevant competitor. In contrast with Huffman, which has an excellent and widely acknowledged implementation within zlib, I don't yet know of an arithmetic implementation with the same status. Maybe within lzma, although my understanding is that lzma is limited to a binary arithmetic coder, a variant which does not compete with FSE since it is limited to 2 symbols (yes/no) per operation. Another possibility could be to provide the Shannon limit, which is the theoretical maximum compression ratio, thus showing there is almost nothing left to gain. But it wouldn't help to compare speed.
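For context on the "binary coder" point, here is a minimal sketch of one LZMA-style binary range-coder step. It encodes exactly one yes/no decision per call, whereas FSE consumes one symbol from a full multi-symbol alphabet per step. Probability adaptation and carry propagation are omitted, so treat this as an illustration of the idea rather than lzma's actual coder.

```c
#include <stdint.h>

typedef struct {
    uint32_t low;     /* low end of the current interval  */
    uint32_t range;   /* width of the current interval    */
} BinRangeCoder;

/* Encode a single bit; p0 is the probability of bit==0, scaled to 0..2048. */
static void rc_encode_bit(BinRangeCoder* rc, uint32_t p0, int bit)
{
    uint32_t bound = (rc->range >> 11) * p0;   /* split the interval at p0/2048 */
    if (bit == 0) {
        rc->range = bound;                     /* keep the '0' sub-interval */
    } else {
        rc->low   += bound;                    /* keep the '1' sub-interval */
        rc->range -= bound;
    }
    while (rc->range < (1u << 24)) {           /* renormalise */
        /* a real coder emits (low >> 24) here, with carry handling */
        rc->low   <<= 8;
        rc->range <<= 8;
    }
}
```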
A comparison to the Shannon entropy limit would be better than nothing; any idea what the rough numbers are offhand?
I haven't calculated them precisely, but I suspect FSE is fairly close to the limit, likely < 0.1%, although without counting headers (which account for the majority of the difference). Note, though, that the header issue would be the same for any arithmetic coder.
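For anyone who wants to reproduce the rough numbers: the order-0 Shannon limit of a block follows directly from its byte histogram, so a small helper like the sketch below (the function name is made up for illustration) can be compared against the FSE-compressed size, headers aside.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Order-0 Shannon limit, in bytes: the smallest size any order-0 entropy
 * coder (FSE, Huffman, arithmetic) can reach on this block, ignoring headers. */
static double shannon_limit_bytes(const uint8_t* buf, size_t len)
{
    size_t count[256] = { 0 };
    double bits = 0.0;

    for (size_t i = 0; i < len; i++) count[buf[i]]++;
    for (int s = 0; s < 256; s++) {
        if (count[s] == 0) continue;
        double p = (double)count[s] / (double)len;
        bits += (double)count[s] * -log2(p);   /* -log2(p) bits per occurrence */
    }
    return bits / 8.0;
}
```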
Wow, I was expecting at least 5%, and Huffman is 12.5% minimum (not counting headers); that's crazy good.
bumblebritches57, the closeness to the Shannon entropy depends on the probability distribution - e.g. Huffman is perfect when every probability is a power of 1/2.
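A quick numeric illustration of that point (the distributions are just example values, not taken from this thread):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Skewed 2-symbol source: entropy is ~0.469 bits/symbol, yet Huffman
     * cannot spend less than 1 whole bit per symbol, so it wastes over half.
     * For a dyadic source such as {1/2, 1/4, 1/4}, Huffman code lengths
     * {1, 2, 2} equal -log2(p) exactly and there is no loss. */
    const double p[2] = { 0.9, 0.1 };
    double entropy = 0.0;
    for (int i = 0; i < 2; i++) entropy += p[i] * -log2(p[i]);

    printf("entropy : %.3f bits/symbol\n", entropy);
    printf("huffman : 1.000 bits/symbol (whole-bit codes)\n");
    return 0;
}
```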
Thanks, I'll check it out.
http://sachingarg.com/compression/entropy_coding/64bit/sg_entropy_coders.zip looks like pretty standard codecs, at least for the range coder part. More info at http://sachingarg.com/compression/entropy_coding/64bit/
You could also compare against static block-based arithmetic compression. I optimised one of those (unrolling and interleaving Eugene Shelwien's) just before Ryg posted his rANS, and then gave the rANS the same treatment as a side-by-side comparison. Both are beaten by FSE, but due to the interleaving and static nature I think this is one of the fastest arithmetic (range) coders out there, and as such it targets the same job as FSE. See arith_static.c in https://github.com/jkbonfield/rans_static or look at my benchmarks, although the data may not be so obvious.
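For readers unfamiliar with rANS, the core per-symbol step of a byte-wise rANS encoder, roughly as in the public ryg_rans code mentioned above, looks like the sketch below (state initialisation, flushing and symbol-table construction are omitted, so this is illustrative only). The interleaving referred to means keeping two or more independent `state` values and alternating symbols between them, which hides the latency of the divide in each step.

```c
#include <stdint.h>

#define RANS_L (1u << 23)   /* lower bound of the normalised state interval */

/* Encode one symbol: `start` is its cumulative frequency, `freq` its
 * frequency, and all frequencies sum to (1 << scale_bits).  Output bytes are
 * written backwards through *pptr, since rANS encodes in reverse order. */
static void rans_enc_put(uint32_t* state, uint8_t** pptr,
                         uint32_t start, uint32_t freq, uint32_t scale_bits)
{
    uint32_t x     = *state;
    uint32_t x_max = ((RANS_L >> scale_bits) << 8) * freq;

    while (x >= x_max) {                 /* renormalise: shift out low bytes */
        *--(*pptr) = (uint8_t)(x & 0xff);
        x >>= 8;
    }
    *state = ((x / freq) << scale_bits) + (x % freq) + start;
}
```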