@liamzwbao
Contributor

Which issue does this PR close?

Rationale for this change

Mentioned in the issue

What changes are included in this PR?

Add validation in shred_variant to allow spec-approved types only.
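A minimal sketch of what such an allow-list check could look like. The `DataType` and `ArrowError` enums below are trimmed, hypothetical stand-ins for the arrow types so the snippet compiles on its own, and `validate_shredding_type` is an illustrative name, not the function added in this PR. `UInt32` is used as an assumed example of a type the shredding spec does not list.

```rust
// Hypothetical stand-ins for arrow's `DataType` and `ArrowError`, trimmed so
// the sketch is self-contained; the real enums have many more variants.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Boolean,
    Int32,
    Utf8,
    UInt32,
    FixedSizeBinary(i32),
}

#[derive(Debug, PartialEq)]
enum ArrowError {
    InvalidArgumentError(String),
}

/// Hypothetical helper: accept only types on an allow-list of shredding types.
fn validate_shredding_type(data_type: &DataType) -> Result<(), ArrowError> {
    match data_type {
        // A small subset of the spec-approved types, for illustration only.
        DataType::Boolean | DataType::Int32 | DataType::Utf8 => Ok(()),
        // UUID is the only fixed-size binary the thread mentions as valid.
        DataType::FixedSizeBinary(16) => Ok(()),
        DataType::FixedSizeBinary(size) => Err(ArrowError::InvalidArgumentError(format!(
            "FixedSizeBinary({size}) is not a valid variant shredding type. \
             Only FixedSizeBinary(16) for UUID is supported."
        ))),
        // Everything else is rejected up front, before any builder is created.
        other => Err(ArrowError::InvalidArgumentError(format!(
            "{other:?} is not a valid variant shredding type"
        ))),
    }
}
```

Calling `validate_shredding_type(&DataType::FixedSizeBinary(8))` in this sketch returns an `InvalidArgumentError`, while `FixedSizeBinary(16)` passes.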

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Nov 5, 2025
Contributor Author

@liamzwbao liamzwbao left a comment

Hi @klion26, this is the PR to address the confusion in #8768 (comment).

There’s some noise from fmt, please review with “Hide whitespace” on.

cc @alamb @scovich

Comment on lines -315 to -319
DataType::FixedSizeBinary(size) => {
return Err(ArrowError::InvalidArgumentError(format!(
"FixedSizeBinary({size}) is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported."
)));
}
Contributor Author

Moved to shred_variant

@liamzwbao liamzwbao marked this pull request as ready for review November 6, 2025 01:15
Comment on lines -325 to -332
_ if data_type.is_primitive() => {
return Err(ArrowError::NotYetImplemented(format!(
"Primitive data_type {data_type:?} not yet implemented"
)));
}
_ => {
return Err(ArrowError::InvalidArgumentError(format!(
"Not a primitive type: {data_type:?}"
Contributor Author

Here, "primitive" refers to a Variant primitive, not an Arrow primitive, so we shouldn’t throw an error about “not an Arrow primitive.” Ideally this function should accept any Arrow type, so I changed this to NotYetImplemented instead.

_ => {
// Supported shredded primitive types, see Variant shredding spec:
// https://github.com/apache/parquet-format/blob/master/VariantShredding.md#shredded-value-types
DataType::Boolean
Member

Does this mean we can't do a type cast after this check? I'm not sure that's OK.

Currently, the match arms in make_primitive_variant_to_arrow_row_builder already do the type check, so maybe we don't need to add it here. Checking the type in two places seems to add maintenance burden.

VariantToShreddedVariantRowBuilder::Primitive(typed_value_builder)
}
DataType::FixedSizeBinary(_) => {
return Err(ArrowError::InvalidArgumentError(format!("{data_type} is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported.")))
Member

Do we need to distinguish this from the _ match arm?

StringView(VariantToStringArrowBuilder::new(cast_options, capacity))
}
_ => {
return Err(ArrowError::NotYetImplemented(format!(
Member

Maybe we shouldn't remove the _ if data_type.is_primitive() arm for now. It seems that arm is for types we may support but have not implemented yet, while the _ match arm is for invalid types.

Once #8768 is merged, the 1-1 mapping (plus some transforms for some types) is complete for all Variant primitive types. But we may still want to support some DataTypes here that are not valid Variant primitives (e.g. Timestamp with a different unit). Keeping the _ if data_type.is_primitive() arm signals that such types may be supported later; we can remove it once we reach a conclusion.

I am sorting out the possible conversions and will create an issue to discuss them once that work is done.
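To make the distinction between the two fallback arms concrete, here is a rough sketch under stated assumptions. The enums are simplified, hypothetical stand-ins for the arrow types, `is_primitive` here imitates the role of arrow's `DataType::is_primitive`, and treating `Timestamp(Second)` as "not yet implemented" is only an example of the open question raised above, not a decision.

```rust
// Hypothetical, simplified stand-ins so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Second,
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Timestamp(TimeUnit),
    Map, // payload omitted; stands for any non-primitive Arrow type
}

#[derive(Debug, PartialEq)]
enum ArrowError {
    NotYetImplemented(String),
    InvalidArgumentError(String),
}

impl DataType {
    // Stand-in for the Arrow notion of "primitive" (numeric and temporal types).
    fn is_primitive(&self) -> bool {
        matches!(self, DataType::Int64 | DataType::Timestamp(_))
    }
}

/// Hypothetical check mirroring the three outcomes discussed in the thread.
fn check_primitive_target(data_type: &DataType) -> Result<(), ArrowError> {
    match data_type {
        // Types with a direct Variant primitive counterpart.
        DataType::Int64
        | DataType::Timestamp(TimeUnit::Microsecond)
        | DataType::Timestamp(TimeUnit::Nanosecond) => Ok(()),
        // Arrow primitives that could be supported via a conversion later.
        _ if data_type.is_primitive() => Err(ArrowError::NotYetImplemented(format!(
            "Primitive data_type {data_type:?} not yet implemented"
        ))),
        // Anything else is not a primitive target at all.
        _ => Err(ArrowError::InvalidArgumentError(format!(
            "Not a primitive type: {data_type:?}"
        ))),
    }
}
```

In this sketch `Timestamp(Second)` yields `NotYetImplemented` and `Map` yields `InvalidArgumentError`, so callers can tell "may come later" apart from "never valid".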

Development

Successfully merging this pull request may close these issues.

[Variant] Enforce shredded-type validation in shred_variant

2 participants