Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FSE_compress returns 0 even when maxDstSize is still equal to or slightly larger than final compression size #90

Open
KrzysFR opened this issue Mar 30, 2018 · 3 comments

Comments

@KrzysFR
Copy link
Contributor

KrzysFR commented Mar 30, 2018

While adding more unit tests, I found a corner case when trying to compress a document with a maxDstSize which is exactly the expected compressed size (found by a previous compression attempt): in this test, FSE_compress(..) returns 0 (uncompressible data) while previously it was able to compress the same input (given a larger destination buffer).

I'm doing a two-step process:

  1. Allocate a buffer large enough (using FSE_compressBound), and compress the source by calling FSE_compress(..., maxDstSize: FSE_compressBound(ORIGINAL_SIZE)), and measure its final compressed size COMPRESSED_SIZE.
  2. Repeat the process on the same buffer and payload, but this time calling FSE_compress(..., maxDstSize: COMPRESSED_SIZE). I get a result code of 0 which means that data is uncompressible.

I tried probing the minimum size that will allow compressing the buffer (which is known to be compressible), and each time I need to call FSE_compress(..) with at least COMPRESSED_SIZE + 8. At first I thought it could be a pointer alignment issue, but it is always + 8 bytes whatever compressed size is (by changing the source a bit).

In my test, raw original size is 4,288 bytes, FSE_compressBound return 4,833 bytes, and the compressed size is 2,821 bytes. I need to pass at least 2,823+8 = 2,829 bytes for FSE_compress to succeed (return value > 0).

Is this expected behavior? I'm not sure if the "+8 rule" is true, or if this is random chance with the inputs I'm passing in.

@KrzysFR KrzysFR changed the title FSE_compress returns 0 even when maxDstSize is still equal to or slight larger than final compression size FSE_compress returns 0 even when maxDstSize is still equal to or slightly larger than final compression size Mar 30, 2018
@Cyan4973
Copy link
Owner

This is an expected behavior, even if badly documented.

FSE_compress() writes data in "stripes" of size size_t, and will effectively need to write more than the amount of compressed data.

The amount of additional buffer space is not specified, but "+8" rule is about right.
It's not documented because this kind of detail is implementation specific, and may change in the future.

@Bulat-Ziganshin
Copy link
Contributor

Bulat-Ziganshin commented Mar 31, 2018

Yeah, it's our internal machinery which is absolutely unintuitive for library users. I think that this issue may be considered as call for doc improvement. You can say that to ensure compression user should alloc exactly amount of memory computed by maxbound, and that compressor can write beyond size of final compressed data for the sake of speed, "up to 8 bytes in the x.x version, but it may be changed in future"

PS: forgot to say that it may be applicable to [your] other compression libraries as well

@KrzysFR
Copy link
Contributor Author

KrzysFR commented Mar 31, 2018

I guess that this is not that big of a deal anyway: the chance to guess the compressed size within 8 bytes is low, and if the caller decides to under-allocate (compared to what FSE_compressBound returns) it comes with a high risk in the first place.

You'd only do this either because you are under tight memory constraints, or are dealing with data with very predictable compression ratio (but again, less than 8 bytes of headroom is tight!).

I'll add a warning in my code about that, and update the test to round up the allocated buffer.

Should I close this issue, or do you want to keep it around, if you ever decide to update the doc at a later time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants