[Variant] Enforce shredded-type validation in shred_variant
#8796
Hi @klion26, this is the PR to address the confusion in #8768 (comment).
There’s some noise from fmt, please review with “Hide whitespace” on.
DataType::FixedSizeBinary(size) => {
    return Err(ArrowError::InvalidArgumentError(format!(
        "FixedSizeBinary({size}) is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported."
    )));
}
Moved to shred_variant
_ if data_type.is_primitive() => {
    return Err(ArrowError::NotYetImplemented(format!(
        "Primitive data_type {data_type:?} not yet implemented"
    )));
}
_ => {
    return Err(ArrowError::InvalidArgumentError(format!(
        "Not a primitive type: {data_type:?}"
Here, primitive refers to Variant primitive, not Arrow primitive. So we shouldn’t throw an error about “not an Arrow primitive.” Ideally this function should accept any Arrow type, therefore changed this to NotYetImplemented instead.
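To make the distinction concrete, here is a minimal sketch. `Ty` is a hypothetical stand-in enum, not the real `arrow_schema::DataType`, and both predicates are simplified assumptions: the Arrow-style check is assumed to cover only numeric-like types, while the Variant-style check covers any non-nested value type in this toy model.

```rust
/// Hypothetical stand-in for a handful of Arrow data types
/// (assumption: this is NOT the real `arrow_schema::DataType`).
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Boolean,
    Int32,
    Float64,
    Utf8,
    FixedSizeBinary(i32),
    List(Box<Ty>),
}

impl Ty {
    /// Arrow-style "primitive": assumed here to mean numeric-like types only.
    pub fn is_arrow_primitive(&self) -> bool {
        matches!(self, Ty::Int32 | Ty::Float64)
    }

    /// Variant-style "primitive": any non-nested value type in this sketch,
    /// with FixedSizeBinary accepted only at width 16 (UUID).
    pub fn is_variant_primitive(&self) -> bool {
        matches!(
            self,
            Ty::Boolean | Ty::Int32 | Ty::Float64 | Ty::Utf8 | Ty::FixedSizeBinary(16)
        )
    }
}
```

In this model `Ty::Utf8` and `Ty::Boolean` are Variant primitives without being Arrow primitives, which is why an error message phrased in terms of Arrow primitives would be misleading.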
_ => {
    // Supported shredded primitive types, see Variant shredding spec:
    // https://github.com/apache/parquet-format/blob/master/VariantShredding.md#shredded-value-types
    DataType::Boolean
Does this mean we can't do the type cast after this? I'm not sure this is OK.
Currently, we do the type check in make_primitive_variant_to_arrow_row_builder via its match arms, so maybe we don't need to add it here. Doing the type check in two places seems likely to add maintenance overhead.
    VariantToShreddedVariantRowBuilder::Primitive(typed_value_builder)
}
DataType::FixedSizeBinary(_) => {
    return Err(ArrowError::InvalidArgumentError(format!("{data_type} is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported.")))
Do we need to distinguish this from the _ match arm?
    StringView(VariantToStringArrowBuilder::new(cast_options, capacity))
}
_ => {
    return Err(ArrowError::NotYetImplemented(format!(
Maybe we can't remove the _ if data_type.is_primitive() arm for now. It seems this arm is for types we may support but have not implemented yet, while the _ match arm is for invalid types.
Once #8768 is merged, we will have completed the 1-1 mapping (plus some transforms for some types) for all Variant primitive types, but we may still support some DataTypes here that are not valid Variant primitives (e.g. Timestamp with a different unit). Keeping the _ if data_type.is_primitive() arm tells us which types we may still need to support; we can remove it once we reach a conclusion.
I am sorting out the possible conversions and will create an issue to discuss them once that work is done.
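The arm ordering discussed above can be sketched as follows. This is only an illustration under stated assumptions: `Ty`, `ShredError`, `is_primitive`, and `check_shredding_type` are hypothetical stand-ins, not the arrow-rs API, and the set of accepted types is deliberately tiny.

```rust
/// Hypothetical error type mirroring the two error kinds in the discussion.
#[derive(Debug, PartialEq)]
pub enum ShredError {
    NotYetImplemented(String),
    InvalidArgument(String),
}

/// Hypothetical stand-in for a few Arrow data types.
#[derive(Debug)]
pub enum Ty {
    Boolean,
    Int64,
    Utf8,
    TimestampSecond,
    FixedSizeBinary(i32),
    List,
}

impl Ty {
    /// Assumption: numeric and temporal types count as primitive here.
    pub fn is_primitive(&self) -> bool {
        matches!(self, Ty::Int64 | Ty::TimestampSecond)
    }
}

/// Classify a requested shredding type into accepted, "maybe later", or invalid.
pub fn check_shredding_type(ty: &Ty) -> Result<(), ShredError> {
    match ty {
        // Types with a direct mapping in this sketch.
        Ty::Boolean | Ty::Int64 | Ty::Utf8 | Ty::FixedSizeBinary(16) => Ok(()),
        // Any other width is rejected outright: only the 16-byte UUID form is valid.
        Ty::FixedSizeBinary(size) => Err(ShredError::InvalidArgument(format!(
            "FixedSizeBinary({size}) is not a valid variant shredding type"
        ))),
        // Guarded arm: primitive types that might be supported via a conversion later.
        other if other.is_primitive() => Err(ShredError::NotYetImplemented(format!(
            "Primitive data_type {other:?} not yet implemented"
        ))),
        // Everything else is treated as invalid in this sketch.
        other => Err(ShredError::InvalidArgument(format!(
            "Not a primitive type: {other:?}"
        ))),
    }
}
```

The guarded arm has to come before the catch-all; otherwise a type such as `Ty::TimestampSecond` would be reported as invalid rather than as not yet implemented.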
Which issue does this PR close?
shred_variant#8795.
Rationale for this change
Mentioned in the issue
What changes are included in this PR?
Add validation in shred_variant to allow spec-approved types only.
Are these changes tested?
Yes
Are there any user-facing changes?