
Comparison to arithmetic coding? #62

Open
MarcusJohnson91 opened this issue Jan 6, 2016 · 8 comments


@MarcusJohnson91

MarcusJohnson91 commented Jan 6, 2016

I know the wiki says that the performance is similar, but can we get a benchmark comparing processing time and compression ratio, to know exactly how well it performs compared to its closest competition?

@Cyan4973
Owner

Cyan4973 commented Jan 6, 2016

The problem is to find a relevant competitor.
There are many different arithmetic coders out there, but selecting one of them as "representative" could raise suspicion that it was an easy target to beat. So it would need to be a renowned version.

In contrast with Huffman, which has an excellent and widely acknowledged implementation within zlib, I don't yet know of an arithmetic implementation with equivalent status. Maybe within lzma, although my understanding is that lzma is limited to a binary arithmetic coder, a variant which does not compete with FSE since it's limited to 2 symbols (yes/no) per operation.
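As an illustration of why that matters for speed, here's a toy sketch (mine, using doubles for clarity; lzma's real coder is fixed-point with renormalization) showing that a binary arithmetic coder narrows its interval once per yes/no decision, so a byte costs 8 such operations:

```c
#include <stdio.h>

/* Toy binary arithmetic coder (illustration only, not lzma's coder):
 * each operation narrows [low, high) by the probability of one yes/no
 * decision. Doubles are fine for a handful of bits; real coders use
 * integer ranges plus renormalization. */
int main(void)
{
    const double p0 = 0.8;              /* assumed probability of bit == 0 */
    const int bits[12] = {0,0,1,0,0,0,1,0,0,1,0,0};

    /* encode: shrink the interval once per bit */
    double low = 0.0, high = 1.0;
    for (int i = 0; i < 12; i++) {
        double split = low + (high - low) * p0;
        if (bits[i] == 0) high = split; else low = split;
    }
    double code = (low + high) / 2.0;   /* any value inside the final interval */

    /* decode: replay the same splits */
    low = 0.0; high = 1.0;
    printf("decoded: ");
    for (int i = 0; i < 12; i++) {
        double split = low + (high - low) * p0;
        int bit = code >= split;
        if (bit == 0) high = split; else low = split;
        printf("%d", bit);
    }
    printf("\n");                       /* expect 001000100100 */
    return 0;
}
```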

Another possibility could be to provide the Shannon limit, which is the theoretical maximum compression ratio, thus showing there is almost nothing left to gain. But it wouldn't help to compare speed.
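For reference, the Shannon limit is easy to compute from a symbol histogram. A minimal sketch (mine, not part of FSE) for an order-0 byte model:

```c
/* Compute the order-0 Shannon limit of a byte buffer, to compare
 * against an entropy coder's actual output size. */
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static double shannon_limit_bytes(const uint8_t *buf, size_t len)
{
    size_t count[256] = {0};
    double bits = 0.0;
    for (size_t i = 0; i < len; i++) count[buf[i]]++;
    for (int s = 0; s < 256; s++) {
        if (count[s] == 0) continue;
        double p = (double)count[s] / (double)len;
        bits += (double)count[s] * -log2(p);  /* -log2(p) bits per occurrence */
    }
    return bits / 8.0;                        /* theoretical minimum, in bytes */
}

int main(void)
{
    uint8_t sample[] = "abracadabra abracadabra abracadabra";
    size_t n = sizeof(sample) - 1;
    printf("input: %zu bytes, Shannon limit: %.1f bytes\n",
           n, shannon_limit_bytes(sample, n));
    return 0;
}
```

Compile with `-lm`; comparing this number to the coder's output size (minus headers) gives the distance from the limit.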

@MarcusJohnson91
Author

A comparison to the Shannon entropy limit would be better than nothing; any idea what the rough numbers are, offhand?

@Cyan4973
Owner

Cyan4973 commented Jan 6, 2016

I haven't calculated them precisely, but I suspect FSE to be fairly close to the limit, likely < 0.1%, though that's without counting headers (which account for the majority of the difference). Note though that the header issue would be the same for any arithmetic coder.

@MarcusJohnson91
Author

Wow, I was expecting at least 5%, and Huffman is 12.5% minimum (not counting headers). That's crazy good.

@JarekDuda

bumblebritches57, how close you get to the Shannon entropy depends on the probability distribution; e.g. Huffman is perfect when all probabilities are powers of 1/2.
Like arithmetic coding, ANS can get as close to Shannon as we want for any probability distribution; the (KL) distance depends on the parameters. The tANS variant used in FSE is defined by a symbol spread: filling a table of length L (2048 in FSE) with appearances of symbols, in proportions corresponding to the probability distribution. You can get Huffman this way by spreading symbols in ranges whose lengths are powers of 2, so tANS can be seen as an extension of Huffman.
The larger L, the closer to Shannon: the distance generally decreases like 1/L^2, so doubling L in FSE gets you 4 times closer, but the tables might no longer fit in L1 cache (speed). For LZMA-like compression, the slightly more accurate rANS variant is used instead; it requires multiplication (in contrast to tANS), but allows somewhat better memory tradeoffs and is better suited to dynamic modifications in adaptive compression.
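To make the symbol spread concrete, here's a minimal sketch (my simplification; FSE's actual spread differs in details): fill a table of length L with freq[s] appearances of each symbol, stepping by a stride coprime to L so the appearances scatter across the table:

```c
#include <stdint.h>
#include <stdio.h>

#define TABLE_LOG 11
#define L (1 << TABLE_LOG)          /* table size, 2048 as in FSE */

/* freq[] must sum to L; proportions approximate the probabilities. */
static void spread_symbols(uint8_t table[L], const uint32_t *freq, int nsyms)
{
    uint32_t step = (L >> 1) + (L >> 3) + 3;   /* odd => coprime to L */
    uint32_t pos = 0;
    for (int s = 0; s < nsyms; s++) {
        for (uint32_t i = 0; i < freq[s]; i++) {
            table[pos] = (uint8_t)s;
            pos = (pos + step) & (L - 1);      /* wrap around the table */
        }
    }
}

int main(void)
{
    uint8_t table[L];
    uint32_t freq[2] = { (L * 3) / 4, L / 4 }; /* p = {0.75, 0.25} */
    spread_symbols(table, freq, 2);
    printf("table[0..7]: ");
    for (int i = 0; i < 8; i++) printf("%u ", table[i]);
    printf("\n");
    return 0;
}
```

Because the stride is odd and L is a power of 2, every slot is visited exactly once, and a symbol holding freq[s] of the L slots costs roughly -log2(freq[s]/L) bits.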
If you want to test distance from Shannon of tANS for various parameters and symbol spreads: https://github.com/JarekDuda/AsymmetricNumeralSystemsToolkit
Some benchmarks of entropy coders:
http://encode.ru/threads/1920-In-memory-open-source-benchmark-of-entropy-coders?p=45168&viewfull=1#post45168
https://sites.google.com/site/powturbo/entropy-coder

@MarcusJohnson91
Author

Thanks, I'll check it out.

@Bulat-Ziganshin
Contributor

http://sachingarg.com/compression/entropy_coding/64bit/sg_entropy_coders.zip looks like a pretty standard set of codecs, at least for the range coder part. More info at http://sachingarg.com/compression/entropy_coding/64bit/

@jkbonfield

jkbonfield commented May 1, 2016

You could also compare against static block-based arithmetic compression. I optimised one of those (unrolling and interleaving Eugene Shelwien's) just before Ryg posted his rANS, and then gave the rANS the same treatment as a side-by-side comparison. Both are beaten by FSE, but due to the interleaving and the static nature I think this is one of the fastest arithmetic (range) coders out there, and as such it targets the same job as FSE.

See arith_static.c in https://github.com/jkbonfield/rans_static or look at my benchmarks, although the data may not be so obvious.
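For anyone who just wants to see the interleaving idea in isolation, here is a self-contained toy (my sketch in the spirit of ryg_rans / rans_static, not jkbonfield's actual code): two rANS states alternate over the symbols, so their serial dependency chains can overlap in the CPU pipeline:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PROB_BITS 14
#define PROB_SCALE (1u << PROB_BITS)
#define RANS_L (1u << 23)            /* states stay in [RANS_L, RANS_L << 8) */

typedef struct { uint32_t freq, cum; } Sym;

/* Encode symbol s into state x; bytes are emitted backwards at *pp. */
static uint32_t enc_put(uint32_t x, uint8_t **pp, Sym s)
{
    uint32_t x_max = ((RANS_L >> PROB_BITS) << 8) * s.freq;
    while (x >= x_max) { *--(*pp) = (uint8_t)x; x >>= 8; }
    return ((x / s.freq) << PROB_BITS) + (x % s.freq) + s.cum;
}

static void enc_flush(uint32_t x, uint8_t **pp)
{
    *pp -= 4; memcpy(*pp, &x, 4);    /* byte order just has to match dec_init */
}

static uint32_t dec_init(uint8_t **pp)
{
    uint32_t x; memcpy(&x, *pp, 4); *pp += 4; return x;
}

/* Decode one symbol (2-symbol alphabet) from state *x, refilling from *pp. */
static int dec_get(uint32_t *x, uint8_t **pp, const Sym syms[2])
{
    uint32_t slot = *x & (PROB_SCALE - 1);
    int s = slot >= syms[1].cum;     /* which symbol owns this slot? */
    *x = syms[s].freq * (*x >> PROB_BITS) + slot - syms[s].cum;
    while (*x < RANS_L) *x = (*x << 8) | *(*pp)++;
    return s;
}

int main(void)
{
    Sym syms[2] = { { 3 * PROB_SCALE / 4, 0 },                /* p(0) = 0.75 */
                    { PROB_SCALE / 4, 3 * PROB_SCALE / 4 } }; /* p(1) = 0.25 */
    int msg[16] = {0,0,1,0, 0,1,0,0, 0,0,0,1, 1,0,0,0}, out[16];
    uint8_t buf[64], *ptr = buf + sizeof(buf);

    /* rANS encodes in reverse; symbol i always belongs to state i & 1 */
    uint32_t x0 = RANS_L, x1 = RANS_L;
    for (int i = 15; i >= 0; i--) {
        uint32_t *x = (i & 1) ? &x1 : &x0;
        *x = enc_put(*x, &ptr, syms[msg[i]]);
    }
    enc_flush(x1, &ptr);             /* flushed last-written-first, so the */
    enc_flush(x0, &ptr);             /* decoder reads x0 first, then x1    */

    x0 = dec_init(&ptr); x1 = dec_init(&ptr);
    for (int i = 0; i < 16; i++)
        out[i] = dec_get((i & 1) ? &x1 : &x0, &ptr, syms);
    printf("round-trip %s\n", memcmp(msg, out, sizeof(msg)) ? "FAILED" : "OK");
    return 0;
}
```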
