-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Add variant type support to ParquetTypeVisitor #14588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implement variant(GroupType) method in ParquetTypeVisitor and all subclasses to enable proper handling of Parquet variant logical types during schema conversion and manipulation operations.
|
cc @aihuaxu |
aihuaxu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments. Otherwise looks good.
| return visitList(group, visitor); | ||
| } else if (LogicalTypeAnnotation.mapType().equals(annotation)) { | ||
| return visitMap(group, visitor); | ||
| } else if (LogicalTypeAnnotation.variantType((byte) 1).equals(annotation)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use Variant.VARIANT_SPEC_VERSION instead of hardcoded 1.
| primitive(29, "v", PrimitiveTypeName.INT32, Repetition.REQUIRED)), | ||
| variant(30, "variant_col_1", Repetition.OPTIONAL), | ||
| variant(null, "variant_col_2", Repetition.REQUIRED), | ||
| list(31, "list_col_6", Repetition.OPTIONAL, variant(32, "v", Repetition.OPTIONAL)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add a test with a variant in map?
| private Type variant(Integer id, String name, Repetition repetition) { | ||
| GroupBuilder<GroupType> builder = | ||
| org.apache.parquet.schema.Types.buildGroup(repetition) | ||
| .as(org.apache.parquet.schema.LogicalTypeAnnotation.variantType((byte) 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Use Variant.VARIANT_SPEC_VERSION instead of 1.
Summary
Adds variant type support to
ParquetTypeVisitorand all its subclasses to enable proper handling of Parquet variant logical types during schema operations.Background
This issue surfaced when using
ParquetUtil.footerMetrics(), which callsconvertAndPrune()on the Parquet schema.TestVariantMetricsuseswriteParquet()and callsParquetMetrics.metrics()directly, which bypasses the schema conversion path and didn't expose this gap. Without thevariant()method implementations, variant fields were being skipped during schema conversion, which then caused an NPE inTypeWithSchemaVisitorwhen it tried to process the variant field that was missing from the converted schema.Changes
variant(GroupType)method toParquetTypeVisitorbase classvariant()in allParquetTypeVisitorsubclasses:MessageTypeToType- converts Parquet variant to IcebergVariantTypeApplyNameMapping- applies name mappings to variant fieldsParquetSchemaUtil.HasIds- checks for field IDs in variant typesRemoveIds- removes IDs from variant schemastestVariantTypeConversion()inTestParquetSchemaUtilTesting
New test validates schema conversion from Parquet variant logical type to Iceberg
VariantType.