
feat: read and write zfp header to stream #37

Open · wants to merge 4 commits into master
Conversation

william-silversmith
Contributor

This PR obviates the need to recall compression parameters for decompression by writing the full zfp stream header and reading it on decompression.

It breaks backwards compatibility, but we could restore compatibility by skipping the header read whenever the parameters are specified explicitly.
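To make the idea concrete, here is a minimal sketch of a self-describing stream (the header layout below is made up for illustration, NOT zfp's actual header format): once shape and dtype travel with the payload, the reader no longer needs to recall the compression parameters.

```python
import struct

import numpy as np

# Illustrative header: magic, ndim, dimensions, dtype character code.
# This is NOT zfp's real format; it only demonstrates why a
# self-describing stream removes the need to remember parameters.
MAGIC = b"ZFPH"

def write_header(shape, dtype):
    header = MAGIC + struct.pack("<B", len(shape))
    for dim in shape:
        header += struct.pack("<Q", dim)
    return header + np.dtype(dtype).char.encode("ascii")

def read_header(stream):
    """Return (shape, dtype, header_length) parsed from `stream`."""
    assert stream[:4] == MAGIC, "stream does not start with a header"
    ndim = stream[4]
    shape = struct.unpack_from("<%dQ" % ndim, stream, 5)
    off = 5 + 8 * ndim
    dtype = np.dtype(stream[off:off + 1].decode("ascii"))
    return shape, dtype, off + 1

hdr = write_header((64, 32), np.float32)
shape, dtype, hdr_len = read_header(hdr + b"...payload...")
```

The real zfp header additionally records the compression mode and parameters, which is what lets decompression proceed with no arguments at all.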

@navjotk
Owner

navjotk commented Jul 15, 2022

Do we want to remove the test that tests dimension ordering?

@william-silversmith
Contributor Author

Oh no, I only wanted to remove the second test because it truncates the zfp stream directly, which makes it difficult to design a good test while the underlying stream layout is shifting. C and F order should still decompress correctly and should be tested as a whole.

@navjotk
Owner

navjotk commented Jul 19, 2022

Fixed-rate mode should support truncating the zfp stream directly, though. Truncating and testing on a part of the array guarantees that everybody agrees on dimension ordering. If we remove this, we lose a test not only for dimension ordering but also for fixed-rate mode.
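For context, the reason fixed-rate mode tolerates truncation can be sketched with a little arithmetic (assuming zfp's fixed-rate layout, where each 4^d block of values compresses to exactly rate × 4^d bits; `rate` is bits per value):

```python
# In fixed-rate mode every 4**ndim block of values compresses to
# exactly rate * 4**ndim bits, so the offset of any block (header
# excluded) is computable without scanning the stream.
def block_offset_bytes(block_index, rate, ndim):
    bits_per_block = rate * 4 ** ndim
    return block_index * bits_per_block // 8

# e.g. 3-D data at 8 bits/value: 512 bits (64 bytes) per block,
# so block 10 starts 640 bytes into the compressed payload.
```

That computability is what makes "chop off a prefix of the stream and decode just that part" a well-defined operation in fixed-rate mode, unlike the variable-rate modes.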

@william-silversmith
Contributor Author

Is that true? Is that a real use case? I thought you were supposed to use the array classes for random access into the stream. Truncating the stream manually means you need to know where the headers begin and end, in addition to computing the byte offsets into the stream.

navjotk previously approved these changes Jul 26, 2022
@navjotk
Owner

navjotk commented Jul 26, 2022

> Is that true?

Yes, it is true that fixed-rate mode should support truncating the stream. Why do you disagree?

> Is that a real use case?

Less sure of that, but I believe this is the zfp-recommended way to do parallel decompression anyway.

> you need to know where the headers begin and end in addition to computing the byte offsets of the stream.

Isn't that the same as what you need to know to read the stream and decompress the whole thing?

Dimension ordering is subtle and it is easy to get results that look ok even though we have messed up the ordering. If you're only ever reading/writing whole arrays, any consistent ordering of dimensions will produce consistent results. When you pick out a part of a multidimensional array, dimension ordering will completely change what you get. That's the reasoning behind this test, at least. If we remove it, what do we replace it with?
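The subtlety above can be made concrete with plain numpy, independent of zfp: a whole-array round trip succeeds for any consistent order, while a partial read of the buffer only means what you think it means once the order is fixed.

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)

c_buf = a.tobytes(order="C")   # rows contiguous
f_buf = a.tobytes(order="F")   # columns contiguous

# Round-tripping the WHOLE array with a consistent order is lossless
# either way, so a whole-array test passes even if both sides use F:
back = np.frombuffer(f_buf, dtype=np.float32).reshape(2, 3, order="F")
assert (back == a).all()

# But interpreting "the first 3 values of the buffer" as row 0 only
# works in C order; under F order you get a different slice entirely:
first_c = np.frombuffer(c_buf, dtype=np.float32)[:3]  # [0. 1. 2.]
first_f = np.frombuffer(f_buf, dtype=np.float32)[:3]  # [0. 3. 1.]
```

This is why a partial-array test catches ordering bugs that a whole-array test silently forgives.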

PS: Take a look at the merge conflict in the PR.

@william-silversmith
Contributor Author

Sorry, I was a little tired and harried when I wrote that. I can see the possibility of streaming a large dataset and wanting to decode it in an online fashion. I've run into a large number of these awful C and F order issues in the past.

Without being able to provide a shape argument to decompress, constructing the test requires creating a fake header to trick decompress into building the right numpy array for the stream. With an online use case in mind, it would make sense to accept shape and dtype arguments that skip the header lookup entirely. That would enable scanning partial segments of the full stream as they arrive.
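A sketch of what that could look like (names, header layout, and signature are hypothetical, not pyzfp's current API; the "decoder" is a `np.frombuffer` stand-in so the control flow runs without zfp itself):

```python
import struct

import numpy as np

def decompress(stream, shape=None, dtype=None):
    """If shape/dtype are given, skip the header scan entirely so a
    partial stream segment can be decoded on its own."""
    if shape is None or dtype is None:
        # Stand-in header: ndim, dims..., dtype char (not zfp's format).
        ndim = stream[0]
        shape = struct.unpack_from("<%dQ" % ndim, stream, 1)
        off = 1 + 8 * ndim
        dtype = np.dtype(stream[off:off + 1].decode("ascii"))
        stream = stream[off + 1:]
    # Stand-in decoder; the real code would hand stream/shape/dtype
    # to the zfp decoder here.
    return np.frombuffer(stream, dtype=dtype).reshape(shape)
```

With this shape, `decompress(data)` reads the header, while `decompress(segment, shape=..., dtype=...)` decodes a headerless segment, which is the online-streaming case described above.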

@william-silversmith
Contributor Author

I'm adapting the decompress function to enable this, so what's there is a work in progress. I also edited compress and decompress to be much more conservative about how they free variables and close streams, using try/finally constructions. This should help avoid leaks when exceptions are raised and handled.
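The cleanup discipline being described can be sketched as follows (`allocate`/`free`/`encode` are stand-ins for the C-level calls in the Cython code, not pyzfp names):

```python
# try/finally guarantees the buffer is released on every exit path,
# including when a handled exception unwinds mid-encode.
def compress_safely(data, encode, allocate, free):
    buf = allocate()
    try:
        return encode(data, buf)
    finally:
        free(buf)  # runs on success and on any exception
```

Without the try/finally, an exception raised inside `encode` that is caught further up the stack would leave `buf` allocated forever, which is exactly the leak pattern the PR guards against.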

navjotk dismissed their stale review August 3, 2022: PR has changed since review
