Prevent FlightData overflowing max size limit whenever possible. #6690

itsjunetime · 2024-11-05T17:30:12Z

Which issue does this PR close?

What changes are included in this PR?

This reworks the encoding/writing step of flight data messages so that a message stays within the given limit whenever possible. The one case where that's impossible is when a single row plus the header doesn't fit within the limit, since there is still no mechanism for splitting a single row of data across multiple messages.

It does this by first constructing a fake IPC header, then getting that header's encoded length, and then subtracting that length from the provided max size. Because the header's size stays the same with the same schema (the only thing that changes is the value within the 'length of data' fields), we don't need to continually recalculate it.
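To make the arithmetic concrete, here is a rough sketch of the idea rather than the code in this PR. The names `data_budget` and `rows_that_fit` are hypothetical, and the fixed `bytes_per_row` is a simplifying assumption (the real encoder inspects the actual array data sizes):

```rust
/// Hypothetical helper: bytes left for array data once the fixed-size
/// header has been subtracted from the configured maximum message size.
/// Returns `None` when the header alone is already larger than the limit.
fn data_budget(max_flight_data_size: usize, header_len: usize) -> Option<usize> {
    max_flight_data_size.checked_sub(header_len)
}

/// Hypothetical helper: how many rows to put in the next message.
/// Assumes every row costs `bytes_per_row` bytes (an illustration only).
/// At least one row is always emitted when rows remain, because a single
/// row cannot be split across messages.
fn rows_that_fit(budget: usize, bytes_per_row: usize, total_rows: usize) -> usize {
    if total_rows == 0 {
        return 0;
    }
    let fit = if bytes_per_row == 0 {
        // Zero-width rows never consume the budget.
        total_rows
    } else {
        budget / bytes_per_row
    };
    // Never return 0 (one row is the minimum), never more than we have.
    fit.clamp(1, total_rows)
}
```

Because the header length is computed once per schema, only the second step needs to run for each message.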

There are more tests I'd like to add before merging this; I was just hoping to get this filed first so that I could get feedback in case any behavior seems seriously off.

Rationale for these changes

Since we are dynamically checking array data sizes to see if they can fit within the allotted size, this ensures that messages never go over the limit when it is possible to avoid it. Of course, as I said before, they will still go over if necessary, but I've rewritten the tests to check this behavior (if the tests detect an overage, but decode the message and see that only one row was written, they allow it, as there is no other way to get the data across).
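The acceptance rule the tests apply can be sketched roughly as follows. This is an illustration, not the test code from this PR; `EncodedMessage` and `is_acceptable` are hypothetical names:

```rust
/// Hypothetical summary of one encoded message, as a test might see it
/// after decoding the message back into a batch.
struct EncodedMessage {
    /// Total encoded size of the message in bytes.
    encoded_len: usize,
    /// Number of rows found when the message is decoded.
    num_rows: usize,
}

/// A message passes if it is within the limit, or if it is over the limit
/// but carries exactly one row (a single row cannot be split further).
fn is_acceptable(msg: &EncodedMessage, max_size: usize) -> bool {
    msg.encoded_len <= max_size || msg.num_rows == 1
}
```

An oversized message with two or more rows fails the check, since it could have been split into smaller messages.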

Are there any user-facing changes?

Yes, there are API additions. They are documented. As far as I can tell, this shouldn't require a breaking release, but I haven't run anything like cargo-semver-checks on it to actually verify.


itsjunetime · 2024-11-07T15:18:17Z

arrow-ipc/src/writer.rs

+        };
+
+        match array_data.data_type() {
+            DataType::Dictionary(_, _) => Ok(()),


This is something that I wasn't able to figure out about this encoding process - it seems we don't write the child_data for dictionaries into the encoded message, but that's where all the values of the dictionary are. Without this, we only have the keys written. Does anyone know why this is?


itsjunetime · 2024-11-08T02:04:07Z

It looks like CI failed due to some network flakiness, so I'm going to close and reopen to try it again.

itsjunetime added 10 commits November 4, 2024 15:02

wip it kinda works but I just need to save state

cefcdaf

Got normal encoding working but dictionaries are still failing

6411e2a

Oh we're fixing things even more

ce7d0da

ok it finally works correctly

fb5c35f

Update previously-commented tests and remove unused parameters

aa959b7

Improve documentation with new, and more correct, comments

86772b3

Add more documentation and format

9673c70

Remove old over-cautious 'min' check

cf5c129

Small comment additions, fn name cleanups, and panic message clarifications

114b31f

Add more notes for self

1acc902

github-actions bot added the arrow (Changes to the arrow crate) and arrow-flight (Changes to the arrow-flight crate) labels Nov 5, 2024

itsjunetime added 4 commits November 5, 2024 14:17

Fix for 1.77 msrv

ffad556

Add test (that currently fails) to make sure everything is as expected

e090dae

Make test pass and add comment about why one previous case doesn't pass

910a69e

Fix unused variable

8de2775

itsjunetime commented Nov 7, 2024


itsjunetime added 2 commits November 7, 2024 08:58

Update old inaccurate comments about test coverage

c1890d4

Improve size calculations for RunEndEncoded data type and add test coverage for it

842ea66

itsjunetime closed this Nov 8, 2024

itsjunetime reopened this Nov 8, 2024
