feat: read and write zfp header to stream #37
base: master
Conversation
Do we want to remove the test that checks dimension ordering?
Oh no, I only wanted to remove the second test, because it truncates the zfp stream directly, which makes it harder to design a good test while the underlying stream format is still shifting. C- and F-ordered arrays should still decompress correctly, and that should be tested on whole arrays.
Fixed-rate mode should support truncating the zfp stream directly, though. Truncating the stream and testing on part of the array guarantees that everybody agrees on the dimension ordering. If we remove this, we lose not only a test for dimension ordering but also one for fixed-rate mode.
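For concreteness, here is a minimal sketch of the arithmetic that makes truncation safe in fixed-rate mode, assuming each block of 4**ndim values occupies exactly `rate` bits per value (zfp may round the rate so blocks align on word boundaries, and any header bytes are ignored here):

```python
# Sketch only: fixed-rate zfp stores each block of 4**ndim values in the
# same fixed number of bits, so block boundaries (and hence safe truncation
# points) can be computed without decoding anything.

def bytes_for_blocks(n_blocks, rate_bits_per_value, ndim):
    """Bytes needed to keep the first n_blocks blocks of a fixed-rate payload."""
    bits_per_block = rate_bits_per_value * 4**ndim
    return (n_blocks * bits_per_block + 7) // 8

# Example: a 2D stream at 8 bits/value has 16 values per block -> 16 bytes
# per block, so keeping the first 2 blocks means keeping 32 payload bytes.
assert bytes_for_blocks(2, 8, 2) == 32
```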
Is that true? Is that a real use case? I thought that you were supposed to use the array class to do random access to the stream. Truncating the stream manually means you need to know where the headers begin and end in addition to computing the byte offsets of the stream. |
Yes, it is true that fixed-rate mode should support truncating the stream. Why do you disagree?
I'm less sure of that, but I believe this is the zfp-recommended way to do parallel decompression anyway.
Isn't that the same as when you need to read the stream to decompress the whole thing? Dimension ordering is subtle, and it is easy to get results that look OK even though we have messed up the ordering. If you're only ever reading and writing whole arrays, any consistent ordering of dimensions will produce consistent results; but when you pick out part of a multidimensional array, the dimension ordering completely changes what you get. That's the reasoning behind this test, at least. If we remove it, what do we replace it with? PS: Take a look at the merge conflict in the PR.
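A numpy-only illustration of that point, independent of zfp: a whole-array round trip hides the ordering, while reading a prefix of the flattened data exposes it.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

print(a.flatten(order="C")[:4])   # [0 1 2 3] -> the first row
print(a.flatten(order="F")[:4])   # [0 4 8 1] -> the first column, then it wraps

# Flatten and reshape with any one consistent order and the array comes back
# intact, so a whole-array test cannot tell the two orderings apart.
assert np.array_equal(a.flatten(order="C").reshape(3, 4), a)
assert np.array_equal(a.flatten(order="F").reshape((3, 4), order="F"), a)
```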
Force-pushed from 7704edb to c22d74c
Sorry, I was a little tired and harried when I wrote that. I can see the possibility of streaming a large dataset and wanting to decode it in an online fashion, and I've run into plenty of these awful C and F order issues in the past. Without being able to provide a shape argument to decompress, constructing the test requires creating a fake header to trick decompress into building the right numpy array for the stream. With an online use case in mind, it would make sense to be able to provide shape and dtype, which would then skip the header lookup. That would enable scanning partial segments of the full stream as they arrive.
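As a hedged sketch of that idea, with illustrative names rather than this repository's actual API: when the caller supplies shape and dtype, the header lookup is skipped entirely, which is exactly what decoding a headerless partial segment needs.

```python
import numpy as np

def _read_header(buf):
    # Stand-in for parsing a zfp header at the front of the stream.
    raise ValueError("segment has no header; pass shape and dtype instead")

def decompress(buf, shape=None, dtype=None):
    if shape is None or dtype is None:
        shape, dtype = _read_header(buf)   # whole-stream case: trust the header
    return np.zeros(shape, dtype=dtype)    # placeholder for the actual decode of buf

# Online use: a segment arriving mid-stream carries no header but can still be
# decoded because the caller provides the metadata.
segment = b"\x00" * 64
tile = decompress(segment, shape=(4, 4), dtype=np.float64)
```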
I'm adapting the decompress function to enable this, so what's there is a work in progress. I also edited compress and decompress to be much more conservative about how they free variables and close streams, using try/finally constructions. This should help avoid leaks when exceptions are raised and handled.
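The pattern looks roughly like this; the helpers are stand-ins for the C-level zfp stream and field handles, and the names are illustrative only:

```python
import numpy as np

def _open_stream(buf):    return {"buf": buf}      # stand-in for a stream handle
def _close_stream(s):     s.clear()
def _alloc_field(shape):  return {"shape": shape}  # stand-in for a field handle
def _free_field(f):       f.clear()

def decompress_safely(buf, shape, dtype):
    stream = _open_stream(buf)
    field = _alloc_field(shape)
    try:
        # The actual decode would happen here; any exception it raises still
        # passes through the finally block below, so nothing leaks.
        return np.zeros(shape, dtype=dtype)
    finally:
        _free_field(field)
        _close_stream(stream)
```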
This PR obviates the need to recall compression parameters for decompression by writing the full zfp stream header during compression and reading it back on decompression.
It breaks backwards compatibility, but we could make it compatible by omitting the header read when parameters are specified.
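A minimal, self-contained sketch of the intended behaviour, with hypothetical function names and raw bytes standing in for the zfp payload: compress prepends a header carrying the array metadata, and decompress either reads it back or, for backwards compatibility, skips the read when the caller supplies the parameters.

```python
import struct
import numpy as np

def compress(arr):
    header = struct.pack("<B", arr.ndim)
    header += struct.pack(f"<{arr.ndim}Q", *arr.shape)
    header += arr.dtype.char.encode()          # 1-byte dtype code for the sketch
    return header + arr.tobytes()              # raw bytes stand in for the zfp payload

def decompress(buf, shape=None, dtype=None):
    if shape is not None and dtype is not None:
        offset = 0                             # compat path: caller supplied the
                                               # parameters, so skip the header read
    else:
        ndim = buf[0]                          # otherwise recover everything from
        offset = 1 + 8 * ndim + 1              # the header written by compress
        shape = struct.unpack(f"<{ndim}Q", buf[1:1 + 8 * ndim])
        dtype = np.dtype(buf[offset - 1:offset].decode())
    return np.frombuffer(buf[offset:], dtype=dtype).reshape(shape)

a = np.arange(12, dtype=np.float64).reshape(3, 4)
assert np.array_equal(decompress(compress(a)), a)                       # header-driven
assert np.array_equal(decompress(a.tobytes(), a.shape, np.float64), a)  # old-style call
```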