Open
Description
I need a place to put down some ideas for further optimizing this crate:
Decoder
- We only need to call reserve once for each block of sequences. We can calculate how many bytes will be added to the decode buffer by a list of sequences. This might save some re-allocations.
- The way the zstd_streaming binary works is not optimal. It should just use the drain_to_writer() functions instead of reading into an intermediary buffer. That's why we have these functions.
- Read https://fgiesen.wordpress.com/2018/02/19/reading-bits-in-far-too-many-ways-part-1/ and https://fgiesen.wordpress.com/2018/02/20/reading-bits-in-far-too-many-ways-part-2/ again carefully and optimize the bitreaders further
- The ReversedBitreader can be made quite a bit faster by making it less useful in the generic case. Just returning wrong values for requests of more than 56 bits eliminates the need for error handling on calls to get_bits_(triple). This was started in #58 (don't return errors on too large requests on a reversed bitreader).
- RingBuffer::extend_from_within does a lot of small memcpy calls. These can be sped up a lot by not caring about what gets written behind the range we want to copy. Copying one or more u128 at a time (where possible) is much faster than copying the exact number of bytes.
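The single-reserve idea from the first Decoder item can be sketched as follows. This is a minimal illustration, not the crate's code: `Sequence`, `decoded_size` and `execute_sequences` are hypothetical names, and a plain `Vec<u8>` stands in for the real decode buffer. It relies on the fact that every literal in a block is copied exactly once (either as part of a sequence or as trailing literals), so the block adds the sum of all match lengths plus the literals length.

```rust
/// Hypothetical stand-in for a decoded sequence; the crate's real type may differ.
#[derive(Clone, Copy)]
pub struct Sequence {
    pub literal_length: u32,
    pub match_length: u32,
    pub offset: u32,
}

/// Number of bytes a block appends to the decode buffer:
/// all literals (each is copied exactly once) plus all match bytes.
pub fn decoded_size(sequences: &[Sequence], literals_len: usize) -> usize {
    let match_bytes: usize = sequences.iter().map(|s| s.match_length as usize).sum();
    match_bytes + literals_len
}

/// Executes all sequences of one block, reserving space only once up front.
/// Assumes the sequences are valid (offsets and literal lengths in range).
pub fn execute_sequences(sequences: &[Sequence], literals: &[u8], out: &mut Vec<u8>) {
    // One reservation for the whole block instead of one per sequence.
    out.reserve(decoded_size(sequences, literals.len()));

    let mut lit_pos = 0;
    for s in sequences {
        // Copy the literals that belong to this sequence.
        let ll = s.literal_length as usize;
        out.extend_from_slice(&literals[lit_pos..lit_pos + ll]);
        lit_pos += ll;

        // Copy the match byte by byte, because it may overlap its own output.
        let start = out.len() - s.offset as usize;
        for i in 0..s.match_length as usize {
            let byte = out[start + i];
            out.push(byte);
        }
    }
    // Literals left over after the last sequence are appended as-is.
    out.extend_from_slice(&literals[lit_pos..]);
}
```

Whether this saves re-allocations in practice depends on how the real buffer grows, so it would need to be measured.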
Encoder
The main thing taking time in the encoder is the match finding algorithm
- Different matcher algorithms
- Faster hashing for the hashtable based matcher algorithm
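As one possible direction for the faster-hashing item, here is a minimal sketch of multiplicative hashing over four input bytes, a technique commonly used in LZ-style hash table matchers. It is not taken from this crate: `HASH_LOG`, `hash4` and `HashMatcher` are hypothetical names, and the table keeps only the most recent position per slot.

```rust
/// Assumed table size of 2^16 slots; a real matcher would tune this.
pub const HASH_LOG: u32 = 16;

/// Multiplicative hash of the first four bytes of `data`.
/// Panics if `data` is shorter than four bytes.
pub fn hash4(data: &[u8]) -> usize {
    let v = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
    // Multiply by a large odd constant and keep the top HASH_LOG bits.
    (v.wrapping_mul(2_654_435_761) >> (32 - HASH_LOG)) as usize
}

/// Hypothetical single-entry hash table matcher.
pub struct HashMatcher {
    /// Stores position + 1 so that 0 can mean "empty slot".
    table: Vec<u32>,
}

impl HashMatcher {
    pub fn new() -> Self {
        HashMatcher { table: vec![0; 1 << HASH_LOG] }
    }

    /// Records `pos` in the table and returns `(offset, length)` of a match
    /// against the previous position stored in the same slot, if any.
    pub fn insert_and_find(&mut self, data: &[u8], pos: usize) -> Option<(usize, usize)> {
        if pos + 4 > data.len() {
            return None;
        }
        let slot = hash4(&data[pos..]);
        let candidate = self.table[slot];
        self.table[slot] = pos as u32 + 1;
        if candidate == 0 {
            return None;
        }
        let cand = candidate as usize - 1;
        // Verify the candidate, since different inputs can share a slot.
        let len = data[cand..]
            .iter()
            .zip(&data[pos..])
            .take_while(|(a, b)| a == b)
            .count();
        if len >= 4 { Some((pos - cand, len)) } else { None }
    }
}
```

The hash itself is a single multiply and shift, so most of the remaining cost is in the table lookup and the match verification.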