-
Notifications
You must be signed in to change notification settings - Fork 786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ParquetError::NeedMoreData
mark ParquetError
as non_exhaustive
#6630
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @etseidl ❤️ -- would it be possible to try and make the error messaegs more specific too (e.g. what ranges were needed, rather than just a single number 🤔 )
@@ -119,7 +119,7 @@ impl<F: MetadataFetch> MetadataLoader<F> { | |||
return Err(ParquetError::EOF(format!( | |||
"file size of {} is less than footer + metadata {}", | |||
file_size, | |||
length + 8 | |||
length + FOOTER_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
@@ -47,6 +47,8 @@ pub enum ParquetError { | |||
IndexOutOfBound(usize, usize), | |||
/// An external error variant | |||
External(Box<dyn Error + Send + Sync>), | |||
/// Returned when a function needs more data to complete properly. | |||
NeedMoreData(usize), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also mark ParquetError
non exhaustive at the same time (so future additions won't be breaking API changes)?
Like this:
arrow-rs/object_store/src/lib.rs
Line 1228 in 56525ef
#[non_exhaustive] |
parquet/src/errors.rs
Outdated
@@ -47,6 +47,8 @@ pub enum ParquetError { | |||
IndexOutOfBound(usize, usize), | |||
/// An external error variant | |||
External(Box<dyn Error + Send + Sync>), | |||
/// Returned when a function needs more data to complete properly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add some additional comments on how to interpret the argument. Is it the number of bytes that is needed?
Maybe we can return a Range
or something more specific about what is needed if it isn't always clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an earlier POC I did have two separate errors, one with a range, but we thought that a bit redundant at the time. I suppose we could add an Option<Range<usize>>
to NeedMoreData
to handle the case of insufficient buffer to read the page indexes. There we know the exact range we need. Right now the needed
value is set to the number of bytes read from the tail of the file. If try_parse
fails when reading the page indexes, then we really only need the page index range...there's no need to read the footer all over again. But then the error handling becomes more complex...if the range in NeedMoreData
is None
, then read the required bytes from the tail and call try_parse
again; if range is not None
, then read the range and call read_page_indexes
. Ofc a user is free to ignore the range if they want to keep the error handling simple at the cost of more I/O (but that would likely be negligible since those bytes should still be in buffer cache).
For now I'll try to improve the documentation. Let me know if you want me to add the optional Range
to NeedMoreData
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More docs sounds good 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @etseidl -- this looks good to me
/// Ok(_) => break, | ||
/// Err(ParquetError::NeedMoreData(needed)) => { | ||
/// // Read the needed number of bytes from the end of the file | ||
/// bytes = get_bytes(&file, len - needed..len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
ParquetError::NeedMoreData
ParquetError::NeedMoreData
mark ParquetError
as non_exhaustive
Which issue does this PR close?
Part of #6447.
Rationale for this change
ParquetMetaDataReader
currently usesParquetError::IndexOutOfBound
to indicate the need for more data. This is not the intended use forIndexOutOfBound
.What changes are included in this PR?
Adds a new enum value
NeedMoreData
.Are there any user-facing changes?
Yes, this changes the
ParquetError
API as well asParquetMetaDataReader
.