chore(ci): Ensure integration workflow passes #643
Conversation
We still have a note about flatbuffer alignment for
We also have big-endian failures related to endian swapping.
…compressed/dictionary-encoded files (#44298)

### Rationale for this change

There are a few remaining failures when testing nanoarrow against itself: apache/arrow-nanoarrow#643. Our IPC reader doesn't support dictionaries or compression, so we can't run those tests.

### What changes are included in this PR?

Skips were added to the archery code that runs the tests.

### Are these changes tested?

Yes (integration tests run on every commit)

### Are there any user-facing changes?

No!

* GitHub Issue: #44297

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
@bkietz Is there any chance you have the bandwidth to make sure I didn't miss anything here?
src/nanoarrow/ipc/reader.c
Outdated
```
@@ -230,6 +230,20 @@ static int ArrowIpcArrayStreamReaderNextHeader(
    // propagated higher (e.g., if the stream is empty and there's no schema message)
    ArrowErrorSet(&private_data->error, "No data available on stream");
    return ENODATA;
  } else if (bytes_read == 4) {
```
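For background on why a 4-byte read is a meaningful case: Arrow IPC streams written before format version 0.15 do not prefix messages with the 0xFFFFFFFF continuation marker, so a legacy end-of-stream marker is only 4 bytes long. The sketch below is not the PR's code; it is a minimal, hypothetical illustration (all names assumed) of how the leading bytes of a message can be classified:

```c
#include <stdint.h>
#include <string.h>

// Hypothetical classification of the leading bytes of an Arrow IPC message.
// Modern streams prefix each message with the 0xFFFFFFFF continuation marker
// followed by a 4-byte metadata length; pre-0.15 streams start directly with
// the 4-byte length, and their end-of-stream marker is a bare 4-byte zero.
enum IpcPrefixKind {
  IPC_PREFIX_CONTINUATION,  // 0xFFFFFFFF + length (8-byte prefix)
  IPC_PREFIX_LEGACY,        // bare 4-byte length prefix
  IPC_PREFIX_LEGACY_EOS,    // bare 4-byte zero: legacy end-of-stream
  IPC_PREFIX_INCOMPLETE     // not enough bytes to decide
};

static enum IpcPrefixKind ClassifyIpcPrefix(const uint8_t* data, int64_t bytes_read) {
  if (bytes_read < 4) {
    return IPC_PREFIX_INCOMPLETE;
  }

  uint32_t first_word;
  memcpy(&first_word, data, sizeof(first_word));

  if (first_word == 0xFFFFFFFFu) {
    return IPC_PREFIX_CONTINUATION;
  } else if (first_word == 0) {
    return IPC_PREFIX_LEGACY_EOS;
  } else {
    return IPC_PREFIX_LEGACY;
  }
}
```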
I think this special case should specifically assert the metadata version we're currently reading. It's never valid for a V5 message to omit the continuation and a V4 message is invalid if it includes that.
And we can't write:
```
} else if (bytes_read == 4) {
// DON'T commit; broken!
} else if (private_data->decoder.metadata_version == NANOARROW_IPC_METADATA_VERSION_V4 &&
           bytes_read == 4) {
```
because these peek etc. functions are used before we've fully unpacked the schema, so we don't yet know what the metadata version is.
So I think what needs to happen is: the first time a decoder peeks a header and observes the lack of a continuation, we set ArrowIpcDecoderPrivate::prefix_length = 4. Later, when we decode the metadata version, we can raise an error if the prefix length is not as expected. We should also raise if the prefix length is inconsistent in any future message peeked by the decoder.
As a pleasant side effect, there will be one fewer argument for you to pass around :D
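A rough sketch of that proposal follows. Only prefix_length comes from the comment above; the struct, enum, and function names are assumptions for illustration, and (per the reply below) only the "V5 requires a continuation" check is enforced:

```c
#include <errno.h>
#include <stdint.h>

// Hypothetical subset of decoder state for illustrating the proposal above:
// record the prefix length the first time a header is peeked, then validate it
// once the metadata version is known and again on every subsequent message.
enum MetadataVersion { METADATA_VERSION_UNKNOWN, METADATA_VERSION_V4, METADATA_VERSION_V5 };

struct DecoderState {
  enum MetadataVersion metadata_version;
  int32_t prefix_length;  // 0 until the first header has been peeked
};

// Called whenever a header is peeked; observed_prefix_length is 4 when the
// continuation marker is absent and 8 when it is present.
static int OnPeekHeader(struct DecoderState* state, int32_t observed_prefix_length) {
  if (state->prefix_length == 0) {
    // First message: remember which flavour of prefix this stream uses
    state->prefix_length = observed_prefix_length;
    return 0;
  }

  // Later messages: the prefix style must not change mid-stream
  return state->prefix_length == observed_prefix_length ? 0 : EINVAL;
}

// Called once the schema has been decoded and the metadata version is known.
static int ValidatePrefixForVersion(const struct DecoderState* state) {
  // A V5 message must include the continuation marker (8-byte prefix); the
  // V4 case is left alone because some old files include the marker anyway.
  if (state->metadata_version == METADATA_VERSION_V5 && state->prefix_length == 4) {
    return EINVAL;
  }
  return 0;
}
```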
In a fun turn of events, not all V4 metadata messages have a 4-byte prefix (i.e., not in the 0.17.1 golden files). I think I included all the other checks you suggested! (And also opened an issue with instructions for how to skip those cases in the future, should we decide they don't need to be supported: #648)
I'm going to merge this to get the release candidate testing process started... feel free to leave comments and I can update the approach!
Closes #641.
Unfortunately we just have to skip checking Rust compatibility due to apache/arrow-rs#5052 (e.g., apache/arrow-rs#6449).
This PR also ensures compatibility with big-endian Arrow files and Arrow files from before the continuation token. Support for those had already been added in the decoder but hadn't made it to the stream reader yet.
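As a minimal illustration of the kind of building block that endian support requires (not the decoder's actual code; function names are hypothetical), a reader has to byte-swap on-disk values such as the 32-bit message length whenever the file's declared endianness differs from the host's:

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helpers of the kind an endian-aware IPC reader relies on.
// Detect host endianness at runtime without relying on compiler macros.
static int HostIsLittleEndian(void) {
  const uint32_t probe = 1;
  uint8_t first_byte;
  memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Byte-swap a 32-bit value (e.g., a length read from a file whose declared
// endianness differs from the host's).
static uint32_t ByteSwap32(uint32_t value) {
  return ((value & 0x000000FFu) << 24) | ((value & 0x0000FF00u) << 8) |
         ((value & 0x00FF0000u) >> 8) | ((value & 0xFF000000u) >> 24);
}

// Interpret a little-endian on-disk 32-bit value on any host.
static uint32_t ReadLittleEndian32(const uint8_t* data) {
  uint32_t value;
  memcpy(&value, data, sizeof(value));
  return HostIsLittleEndian() ? value : ByteSwap32(value);
}
```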
Local check: