Prevent FlightData overflowing max size limit whenever possible. #6690
base: master
Conversation
```rust
match array_data.data_type() {
    DataType::Dictionary(_, _) => Ok(()),
```
This is something I wasn't able to figure out about this encoding process: it seems we don't write the child_data for dictionaries into the encoded message, even though that's where all of the dictionary's values live. Without it, only the keys get written. Does anyone know why this is?
It looks like CI failed due to some network flakiness - I'm going to close and reopen to try it again
Which issue does this PR close?
Closes #3478
What changes are included in this PR?
This reworks the encoding/writing step of flight data messages so that it never overflows the given limit whenever that is possible. (It is impossible when even a single row plus the header can't fit within the limit; there is still no mechanism for splitting a single row of data across multiple messages.)
It does this by first constructing a fake IPC header, encoding it to get its length, and then subtracting that length from the provided max size. Because the header's size stays the same for a given schema (the only values that change are the 'length of data' fields), we don't need to continually recalculate it.
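The size-budgeting idea above can be sketched as follows. This is a minimal illustration only: `encoded_header_len` and `body_budget` are hypothetical helper names, not the API added by this PR, and the fixed per-field header cost is a stand-in for actually encoding a fake IPC header and measuring it.

```rust
// Sketch: compute the body budget once per schema by subtracting the
// (schema-constant) IPC header length from the caller's max message size.

// Hypothetical: models the header length as a function of the schema shape
// only, since row count changes only the values in the length fields, not
// the header's encoded size.
fn encoded_header_len(num_fields: usize) -> usize {
    64 + 16 * num_fields
}

/// Budget left for the message body after reserving room for the header.
/// Returns None when even an empty body cannot fit within the limit.
fn body_budget(max_flight_data_size: usize, num_fields: usize) -> Option<usize> {
    max_flight_data_size.checked_sub(encoded_header_len(num_fields))
}

fn main() {
    // With a 1024-byte limit and a 3-field schema, the body may use whatever
    // the header leaves over.
    assert_eq!(body_budget(1024, 3), Some(912));
    // A limit smaller than the header itself leaves no valid budget.
    assert_eq!(body_budget(50, 3), None);
    println!("ok");
}
```

Because the header length is computed once per schema, the subtraction is a one-time cost rather than something recomputed per batch.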
There are more tests I'd like to add before merging this, I was just hoping to get this filed first so that I could get feedback in case any behavior seemed seriously off.
Rationale for these changes
Since we dynamically check array data sizes to see whether they fit within the allotted size, this ensures messages never go over the limit when that's avoidable. As noted above, they will still go over when necessary, and I've rewritten the tests to check this behavior: if a test detects an overage but decodes the message and finds that only one row was written, it allows it, since there is no other way to get the data across.
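The test-side allowance described above can be sketched like this (the function and parameter names are hypothetical, written only to illustrate the acceptance rule, not the actual test code in this PR):

```rust
/// Decide whether an encoded message that exceeds the size limit should fail
/// the test. An overage is tolerated only when the message carries a single
/// row, since one row that is itself larger than the limit cannot be split
/// across multiple messages.
fn overage_is_acceptable(encoded_len: usize, max_size: usize, rows_in_message: usize) -> bool {
    encoded_len <= max_size || rows_in_message == 1
}

fn main() {
    // Within the limit: always fine.
    assert!(overage_is_acceptable(900, 1024, 10));
    // Over the limit with several rows: the encoder should have split them.
    assert!(!overage_is_acceptable(2048, 1024, 10));
    // Over the limit but only one row: nothing more could have been done.
    assert!(overage_is_acceptable(2048, 1024, 1));
    println!("ok");
}
```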
Are there any user-facing changes?
Yes, there are API additions, and they are documented. As far as I can tell this shouldn't require a breaking release, but I haven't run anything like cargo-semver-checks to actually verify that.