Description
The performance is still not great compared to miniz/zlib on files with long runs of the same byte.
EDIT:
See next post.
Profiling reveals that lz77::longest_match and lz77::get_match_length is where most time is spent.
get_match_length is particularly problematic for data where there is a lot of repetitions of one literal that causes a lot of calls to this function. (As there will be a large amount of entries in the hash chain for the 3-byte sequences of this byte.) Currently it uses two zipped iterators to compare the matches, which may not be ideal performance wise. C implementations of deflate seem to be checking multiple bytes at once by casting the bytes to larger data types. I've tested this, but it didn't seem to make a difference.
In the longest_match function, array lookups seems to be the main cause of the slowdown (maybe because further instructions depend on the array value?). If we can find a way to reduce the number of lookups, or length of the hash chains without impacting compression ratio, that would be helpful to improve performance.
For lower compression levels, other compressors simply hard-limit the length hash chains, and further adaptively reduces the hash chain length when there is a decent match.