Let ArrowArrayStreamReader handle schema with attached metadata + do schema checking #8944
Conversation
@kylebarron @alamb I added another commit that is a bit more thorough in checking for metadata intactness. @kylebarron already approved this PR; could you check again whether you are happy with the small change?
alamb left a comment:
Looks good to me -- thank you
@jonded94 @alamb FYI this was a breaking change for record batches with zero columns. In kylebarron/arro3#475 I bumped package versions only, with no code changes on my side, and I now have two failing tests. One test newly errors because the old code determined the batch length from the `StructArray` itself, while the new code uses `RecordBatch::try_new`, which infers the row count from the first column and therefore fails when there are no columns.
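The failure mode described above can be sketched in plain Python (a hypothetical model of the constructor logic, not the actual arrow-rs code): inferring the row count from the first column has nothing to inspect when the column list is empty.

```python
def infer_row_count(columns):
    """Model of a `try_new`-style constructor: derive the row count
    from the first column. Hypothetical sketch, not the arrow-rs code."""
    if not columns:
        # Zero-column batch: there is no column to measure, so this
        # inference strategy has to fail.
        raise ValueError("cannot infer row count from an empty batch")
    return len(columns[0])

print(infer_row_count([[1, 2, 3]]))  # → 3
```

A zero-column batch is still a valid Arrow concept (it carries a row count but no data), which is why inference from the first column is the wrong strategy for it.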
Thanks for the report. I'll flag #9394 as being needed.
@kylebarron thanks for raising this! I haven't looked into that in detail yet, but isn't this then a bug in the `RecordBatch::try_new` constructor itself?
To clarify:
# Which issue does this PR close?

- Closes #9394

# Rationale for this change

PR #8944 introduced a regression: 0-column record batch streams could no longer be decoded.

# What changes are included in this PR?

- Construct the `RecordBatch` with `try_new_with_options`, using the `len` of the `ArrayData`, instead of letting it implicitly determine the `len` by looking at the first column (which is what `try_new` does).
- Slight refactor and reduction of code duplication in the existing `test_stream_round_trip_[import/export]` tests
- Introduction of a new `test_stream_round_trip_no_columns` test

# Are these changes tested?

Yes, both export and import are tested in `test_stream_round_trip_no_columns`.

# Are there any user-facing changes?

0-column record batch streams should be decodable now.
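The fix can be modeled in plain Python (hypothetical names; the real change passes the `ArrayData` length through to `try_new_with_options` rather than inferring it): an explicit row count makes zero-column batches constructible again.

```python
def make_batch(columns, row_count=None):
    """Model of a `try_new_with_options`-style constructor: accept an
    explicit row count instead of always inferring it from the first
    column. Hypothetical sketch, not the arrow-rs implementation."""
    if row_count is None:
        if not columns:
            raise ValueError("row count required for a zero-column batch")
        row_count = len(columns[0])
    # Validate that every column matches the (possibly explicit) length.
    for col in columns:
        if len(col) != row_count:
            raise ValueError("column length mismatch")
    return {"columns": columns, "num_rows": row_count}

print(make_batch([], row_count=5)["num_rows"])  # → 5
```

With the explicit length available from the decoded `ArrayData`, the zero-column case no longer needs a first column to inspect.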
# Which issue does this PR close? / Rationale for this change

Solves an issue discovered during #8790, namely that `ArrowArrayStreamReader` does not correctly expose schema-level metadata and does not check whether the `StructArray` constructed from the FFI stream actually corresponds to the expected schema.

# What changes are included in this PR?

The `RecordBatch` is constructed inside `ArrowArrayStreamReader` such that it retains metadata, and schema validity checks are performed.

# Are these changes tested?

Yes, both `_test_round_trip_export` and `_test_round_trip_import` now test for metadata on the schema and column level.

# Are there any user-facing changes?

Yes, `ArrowArrayStreamReader` is now able to export `RecordBatch`es with schema-level metadata, and the interface is safer since it actually checks for schema validity.