
feat: read and write zfp header to stream #37

Open · wants to merge 4 commits into master
Conversation

william-silversmith
Contributor

This PR obviates the need to recall compression parameters for decompression by writing the full zfp stream header and reading it on decompression.

It breaks backwards compatibility, but we could restore compatibility by skipping the header read whenever the parameters are specified explicitly.
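To make the idea concrete, here is a minimal sketch of a self-describing stream (the header layout below is made up for illustration, NOT zfp's actual header format): once shape and dtype travel with the payload, the reader no longer needs to recall the compression parameters.

```python
import struct

import numpy as np

# Illustrative header: magic, ndim, dimensions, dtype character code.
# This is NOT zfp's real format; it only demonstrates why a
# self-describing stream removes the need to remember parameters.
MAGIC = b"ZFPH"

def write_header(shape, dtype):
    header = MAGIC + struct.pack("<B", len(shape))
    for dim in shape:
        header += struct.pack("<Q", dim)
    return header + np.dtype(dtype).char.encode("ascii")

def read_header(stream):
    """Return (shape, dtype, header_length) parsed from `stream`."""
    assert stream[:4] == MAGIC, "stream does not start with a header"
    ndim = stream[4]
    shape = struct.unpack_from("<%dQ" % ndim, stream, 5)
    off = 5 + 8 * ndim
    dtype = np.dtype(stream[off:off + 1].decode("ascii"))
    return shape, dtype, off + 1

hdr = write_header((64, 32), np.float32)
shape, dtype, hdr_len = read_header(hdr + b"...payload...")
```

The real zfp header additionally records the compression mode and parameters, which is what lets decompression proceed with no arguments at all.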

@navjotk
Owner

navjotk commented Jul 15, 2022

Do we want to remove the test that tests dimension ordering?

@william-silversmith
Contributor Author

Oh no, I only wanted to remove the second test because it truncates the zfp stream directly, which makes it difficult to design a good test while the underlying stream layout is shifting. C and F order should still decompress correctly and should be tested as a whole.

@navjotk
Owner

navjotk commented Jul 19, 2022

Fixed-rate mode should support truncating the zfp stream directly, though. Truncating and testing on a part of the array guarantees that everybody agrees on dimension ordering. If we remove this, we lose a test not only for dimension ordering but also for fixed-rate mode.
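For context, the reason fixed-rate mode tolerates truncation can be sketched with a little arithmetic (assuming zfp's fixed-rate layout, where each 4^d block of values compresses to exactly rate × 4^d bits; `rate` is bits per value):

```python
# In fixed-rate mode every 4**ndim block of values compresses to
# exactly rate * 4**ndim bits, so the offset of any block (header
# excluded) is computable without scanning the stream.
def block_offset_bytes(block_index, rate, ndim):
    bits_per_block = rate * 4 ** ndim
    return block_index * bits_per_block // 8

# e.g. 3-D data at 8 bits/value: 512 bits (64 bytes) per block,
# so block 10 starts 640 bytes into the compressed payload.
```

That computability is what makes "chop off a prefix of the stream and decode just that part" a well-defined operation in fixed-rate mode, unlike the variable-rate modes.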

@william-silversmith
Contributor Author

Is that true? Is that a real use case? I thought you were supposed to use the array classes for random access into the stream. Truncating the stream manually means you need to know where the headers begin and end, in addition to computing the byte offsets into the stream.

navjotk previously approved these changes Jul 26, 2022
@navjotk
Owner

navjotk commented Jul 26, 2022

> Is that true?

Yes, it is true that fixed-rate mode should support truncating the stream. Why do you disagree?

> Is that a real use case?

Less sure of that, but I believe this is the zfp-recommended way to do parallel decompression anyway.

> you need to know where the headers begin and end in addition to computing the byte offsets of the stream.

Isn't that the same as what you need to know to read the stream and decompress the whole thing?

Dimension ordering is subtle and it is easy to get results that look ok even though we have messed up the ordering. If you're only ever reading/writing whole arrays, any consistent ordering of dimensions will produce consistent results. When you pick out a part of a multidimensional array, dimension ordering will completely change what you get. That's the reasoning behind this test, at least. If we remove it, what do we replace it with?
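The subtlety above can be made concrete with plain numpy, independent of zfp: a whole-array round trip succeeds for any consistent order, while a partial read of the buffer only means what you think it means once the order is fixed.

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)

c_buf = a.tobytes(order="C")   # rows contiguous
f_buf = a.tobytes(order="F")   # columns contiguous

# Round-tripping the WHOLE array with a consistent order is lossless
# either way, so a whole-array test passes even if both sides use F:
back = np.frombuffer(f_buf, dtype=np.float32).reshape(2, 3, order="F")
assert (back == a).all()

# But interpreting "the first 3 values of the buffer" as row 0 only
# works in C order; under F order you get a different slice entirely:
first_c = np.frombuffer(c_buf, dtype=np.float32)[:3]  # [0. 1. 2.]
first_f = np.frombuffer(f_buf, dtype=np.float32)[:3]  # [0. 3. 1.]
```

This is why a partial-array test catches ordering bugs that a whole-array test silently forgives.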

PS: Take a look at the merge conflict in the PR.

@william-silversmith
Contributor Author

Sorry, I was a little tired and harried when I wrote that. I can see the possibility of streaming a large dataset and wanting to decode it in an online fashion. I've run into a large number of these awful C and F order issues in the past.

Without being able to provide a shape argument to decompress, constructing the test requires creating a fake header to trick decompress into building the right numpy array for the stream. With an online use case in mind, it would make sense to accept shape and dtype arguments that skip the header lookup entirely. That would enable scanning partial segments of the full stream as they arrive.
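A sketch of what that could look like (names, header layout, and signature are hypothetical, not pyzfp's current API; the "decoder" is a `np.frombuffer` stand-in so the control flow runs without zfp itself):

```python
import struct

import numpy as np

def decompress(stream, shape=None, dtype=None):
    """If shape/dtype are given, skip the header scan entirely so a
    partial stream segment can be decoded on its own."""
    if shape is None or dtype is None:
        # Stand-in header: ndim, dims..., dtype char (not zfp's format).
        ndim = stream[0]
        shape = struct.unpack_from("<%dQ" % ndim, stream, 1)
        off = 1 + 8 * ndim
        dtype = np.dtype(stream[off:off + 1].decode("ascii"))
        stream = stream[off + 1:]
    # Stand-in decoder; the real code would hand stream/shape/dtype
    # to the zfp decoder here.
    return np.frombuffer(stream, dtype=dtype).reshape(shape)
```

With this shape, `decompress(data)` reads the header, while `decompress(segment, shape=..., dtype=...)` decodes a headerless segment, which is the online-streaming case described above.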

@william-silversmith
Contributor Author

I'm adapting the decompress function to enable this, so what's there is a work in progress. I also edited compress and decompress to be much more conservative about how they free variables and close streams, using try/finally constructions. This should help avoid leaks when exceptions are raised and handled.
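The cleanup discipline being described can be sketched as follows (`allocate`/`free`/`encode` are stand-ins for the C-level calls in the Cython code, not pyzfp names):

```python
# try/finally guarantees the buffer is released on every exit path,
# including when a handled exception unwinds mid-encode.
def compress_safely(data, encode, allocate, free):
    buf = allocate()
    try:
        return encode(data, buf)
    finally:
        free(buf)  # runs on success and on any exception
```

Without the try/finally, an exception raised inside `encode` that is caught further up the stack would leave `buf` allocated forever, which is exactly the leak pattern the PR guards against.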

navjotk dismissed their stale review August 3, 2022: PR has changed since review
