@tmater commented Nov 14, 2025

Summary

Adds variant type support to ParquetTypeVisitor and all its subclasses to enable proper handling of Parquet variant logical types during schema operations.

Background

This issue surfaced when using ParquetUtil.footerMetrics(), which calls convertAndPrune() on the Parquet schema. TestVariantMetrics did not expose the gap because it writes files with writeParquet() and calls ParquetMetrics.metrics() directly, bypassing the schema conversion path. Without variant() implementations, variant fields were skipped during schema conversion. TypeWithSchemaVisitor then threw an NPE when it tried to process a variant field that was missing from the converted schema.
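To make the failure path concrete, here is a minimal, hypothetical model of it. `MissingVariantDemo`, `convertSkippingUnhandled`, and `visitWithConverted` are illustrative names invented for this sketch. They only loosely mirror the convertAndPrune() step and TypeWithSchemaVisitor and do not reproduce the real Iceberg signatures or behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of the failure path described above.
// None of these types or methods are Iceberg or Parquet APIs.
final class MissingVariantDemo {
  enum Annotation { NONE, LIST, MAP, VARIANT }

  record Field(int id, String name, Annotation annotation) {}

  // Stand-in for schema conversion plus pruning: a field whose annotation the
  // converter does not handle yields null and is silently left out.
  static Map<Integer, String> convertSkippingUnhandled(List<Field> fields, boolean handlesVariant) {
    Map<Integer, String> converted = new HashMap<>();
    for (Field field : fields) {
      String type = switch (field.annotation()) {
        case NONE -> "primitive";
        case LIST -> "list";
        case MAP -> "map";
        case VARIANT -> handlesVariant ? "variant" : null;
      };
      if (type != null) {
        converted.put(field.id(), type);
      }
    }
    return converted;
  }

  // Stand-in for a second visitor that walks the original fields and expects
  // each one to have a converted counterpart; a pruned field causes an NPE.
  static List<String> visitWithConverted(List<Field> fields, Map<Integer, String> converted) {
    List<String> described = new ArrayList<>();
    for (Field field : fields) {
      String type = converted.get(field.id());
      described.add(field.name() + ": " + type.toUpperCase()); // NPE when type is null
    }
    return described;
  }
}
```

With `handlesVariant` set to `false`, the variant field is dropped and `visitWithConverted` throws a `NullPointerException`; with `true`, both fields survive, which corresponds to adding the variant() branch.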

Changes

  • Add variant(GroupType) method to ParquetTypeVisitor base class
  • Implement variant() in all ParquetTypeVisitor subclasses:
    • MessageTypeToType - converts Parquet variant to Iceberg VariantType
    • ApplyNameMapping - applies name mappings to variant fields
    • ParquetSchemaUtil.HasIds - checks for field IDs in variant types
    • RemoveIds - removes IDs from variant schemas
  • Add test testVariantTypeConversion() in TestParquetSchemaUtil
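The shape of the change can be sketched with a toy visitor: the base class dispatches on the group's logical type annotation and gains a dedicated variant() hook, and each subclass overrides it. `VariantVisitorSketch`, `Group`, `ToTypeName`, and the null default are assumptions made for illustration; the real ParquetTypeVisitor works on Parquet `GroupType`s and has different signatures.

```java
// Hypothetical sketch of the visitor change; these are NOT Iceberg's classes,
// and the real ParquetTypeVisitor signatures differ.
final class VariantVisitorSketch {
  enum Annotation { NONE, LIST, MAP, VARIANT }

  record Group(String name, Integer id, Annotation annotation) {}

  abstract static class TypeVisitor<T> {
    // Dispatch on the group's logical type annotation; VARIANT now has its own branch.
    T visit(Group group) {
      return switch (group.annotation()) {
        case LIST -> list(group);
        case MAP -> map(group);
        case VARIANT -> variant(group);
        case NONE -> struct(group);
      };
    }

    T struct(Group group) { return null; }
    T list(Group group) { return null; }
    T map(Group group) { return null; }

    // New hook. The null default is an assumption of this sketch: it is what a
    // subclass that forgets to override variant() would produce.
    T variant(Group group) { return null; }
  }

  // Loosely modeled on MessageTypeToType: map a variant group to a variant type.
  static final class ToTypeName extends TypeVisitor<String> {
    @Override String struct(Group group) { return "struct"; }
    @Override String list(Group group) { return "list"; }
    @Override String map(Group group) { return "map"; }
    @Override String variant(Group group) { return "variant"; }
  }

  // Loosely modeled on ParquetSchemaUtil.HasIds: report whether the group carries a field ID.
  static final class HasIds extends TypeVisitor<Boolean> {
    @Override Boolean struct(Group group) { return group.id() != null; }
    @Override Boolean list(Group group) { return group.id() != null; }
    @Override Boolean map(Group group) { return group.id() != null; }
    @Override Boolean variant(Group group) { return group.id() != null; }
  }
}
```

Putting the branch in the shared `visit()` dispatch means every subclass sees variants through the same hook, so a subclass that does not override it is easy to spot (here, by its null result).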

Testing

A new test validates schema conversion from the Parquet variant logical type to Iceberg's VariantType.

Implement variant(GroupType) method in ParquetTypeVisitor and all
subclasses to enable proper handling of Parquet variant logical types
during schema conversion and manipulation operations.
@huaxingao

cc @aihuaxu


@aihuaxu aihuaxu left a comment

Minor comments. Otherwise looks good.

```java
  return visitList(group, visitor);
} else if (LogicalTypeAnnotation.mapType().equals(annotation)) {
  return visitMap(group, visitor);
} else if (LogicalTypeAnnotation.variantType((byte) 1).equals(annotation)) {
```

nit: use Variant.VARIANT_SPEC_VERSION instead of the hardcoded 1.

```java
primitive(29, "v", PrimitiveTypeName.INT32, Repetition.REQUIRED)),
variant(30, "variant_col_1", Repetition.OPTIONAL),
variant(null, "variant_col_2", Repetition.REQUIRED),
list(31, "list_col_6", Repetition.OPTIONAL, variant(32, "v", Repetition.OPTIONAL)),
```

Can we also add a test with a variant in a map?
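To illustrate what a map case would exercise, here is a hypothetical recursive converter (`NestedVariantSketch`, `Node`, and `convert` are invented for this sketch, not Iceberg or Parquet code). A variant nested as a list element is reached only through the list branch, while a variant map value is reached only through the map branch, so a list-only fixture leaves the map path untested.

```java
import java.util.List;

// Hypothetical sketch, not Iceberg or Parquet code: a recursive converter that
// shows the path a variant takes when nested as a list element or map value.
final class NestedVariantSketch {
  enum Kind { PRIMITIVE, LIST, MAP, VARIANT }

  // children: LIST -> [element], MAP -> [key, value], otherwise empty.
  record Node(Kind kind, String name, List<Node> children) {
    static Node primitive(String name) { return new Node(Kind.PRIMITIVE, name, List.of()); }
    static Node variant() { return new Node(Kind.VARIANT, "variant", List.of()); }
    static Node list(Node element) { return new Node(Kind.LIST, "list", List.of(element)); }
    static Node map(Node key, Node value) { return new Node(Kind.MAP, "map", List.of(key, value)); }
  }

  static String convert(Node node) {
    return switch (node.kind()) {
      case PRIMITIVE -> node.name();
      case VARIANT -> "variant";
      // A variant element is reached only by recursing through the list branch.
      case LIST -> "list<" + convert(node.children().get(0)) + ">";
      // A variant map value is reached only through the map branch.
      case MAP -> "map<" + convert(node.children().get(0)) + ", "
          + convert(node.children().get(1)) + ">";
    };
  }
}
```

For example, `convert(Node.map(Node.primitive("string"), Node.variant()))` returns `"map<string, variant>"`.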

```java
private Type variant(Integer id, String name, Repetition repetition) {
  GroupBuilder<GroupType> builder =
      org.apache.parquet.schema.Types.buildGroup(repetition)
          .as(org.apache.parquet.schema.LogicalTypeAnnotation.variantType((byte) 1))
```

Same here. Use Variant.VARIANT_SPEC_VERSION instead of 1.


3 participants