
feat(reader): handle field ID conflicts in RecordBatchTransformer #1804

Closed
mbutrovich wants to merge 2 commits into apache:main from mbutrovich:field_id_conflicts

@mbutrovich
Collaborator

Which issue does this PR close?

    /// This reproduces the scenario from Iceberg Java's TestAddFilesProcedure where:
    /// - Hive-style partitioned Parquet files are imported via add_files procedure
    /// - Parquet files have field IDs: name (1), subdept (2)
    /// - Iceberg schema assigns different field IDs: id (1), name (2), dept (3), subdept (4)
    /// - Partition columns (id, dept) have initial_default values from manifests
    ///
    /// Without proper handling, this would incorrectly:
    /// 1. Try to read partition column "id" (field_id=1) from Parquet field_id=1 ("name")
    /// 2. Read data column "name" (field_id=2) from Parquet field_id=2 ("subdept")
    ///
    /// The fix ensures:
    /// 1. Partition columns with initial_default are ALWAYS read as constants (never from Parquet)
    /// 2. Data columns use name-based mapping when field ID conflicts are detected
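To make the failure mode above concrete, here is a minimal Rust sketch. It uses a hypothetical, simplified Field type (field ID plus name) rather than the real iceberg-rust or arrow-rs schema types, and the helper names naive_source and has_field_id_conflict are illustrative only. It shows that trusting field IDs alone would read Iceberg "id" from the Parquet "name" column and Iceberg "name" from "subdept". It then flags a conflict when an Iceberg field ID exists in the Parquet file under a different name.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified column descriptor: a field ID plus its name.
struct Field {
    id: i32,
    name: &'static str,
}

/// Naive ID-based lookup: the Parquet column an Iceberg field would be read
/// from if field IDs were trusted blindly.
fn naive_source(field: &Field, parquet: &[Field]) -> Option<&'static str> {
    parquet.iter().find(|p| p.id == field.id).map(|p| p.name)
}

/// A conflict exists when some Iceberg field ID is present in the Parquet
/// file but attached to a column with a different name.
fn has_field_id_conflict(iceberg: &[Field], parquet: &[Field]) -> bool {
    let by_id: HashMap<i32, &str> = parquet.iter().map(|p| (p.id, p.name)).collect();
    iceberg
        .iter()
        .any(|f| by_id.get(&f.id).map_or(false, |name| *name != f.name))
}

fn main() {
    // Field IDs from the scenario described in the doc comment.
    let parquet = [Field { id: 1, name: "name" }, Field { id: 2, name: "subdept" }];
    let iceberg = [
        Field { id: 1, name: "id" },
        Field { id: 2, name: "name" },
        Field { id: 3, name: "dept" },
        Field { id: 4, name: "subdept" },
    ];
    for f in &iceberg {
        println!("{} (id {}) -> {:?}", f.name, f.id, naive_source(f, &parquet));
    }
    assert!(has_field_id_conflict(&iceberg, &parquet));
}
```

Comparing names for matching IDs is only one possible conflict signal. This sketch compares names case-sensitively and ignores nested fields.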

What changes are included in this PR?

  • Detect conflicts in field ID mappings and resolve them similarly to Iceberg Java's BaseParquetReaders.java and PartitionUtil.constantsMap()
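The resolution order described in this PR can be sketched as follows. This is a hypothetical, self-contained Rust illustration. It is not the PR's code and not an iceberg-rust API. A HashMap from field ID to constant value stands in for the field-ID-to-constant map that Iceberg Java builds with PartitionUtil.constantsMap(). ColumnSource is an invented enum, and the constant values are made-up placeholders. Constants are checked first. After that, columns are matched by name if any field ID conflict was found, and by field ID otherwise.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified column descriptor (field ID + name).
struct Field {
    id: i32,
    name: &'static str,
}

/// Where a reader would take each output column from (illustrative only).
#[derive(Debug, PartialEq)]
enum ColumnSource {
    /// Constant value, e.g. a partition value or initial_default.
    Constant(String),
    /// Index of the Parquet column to read.
    Parquet(usize),
    /// Column not available anywhere: fill with nulls.
    Null,
}

fn resolve_sources(
    iceberg: &[Field],
    parquet: &[Field],
    constants: &HashMap<i32, String>,
) -> Vec<ColumnSource> {
    // Conflict: an Iceberg field ID exists in Parquet under a different name.
    let conflict = iceberg
        .iter()
        .any(|f| parquet.iter().any(|p| p.id == f.id && p.name != f.name));

    iceberg
        .iter()
        .map(|f| {
            // 1. Constants win: these columns are never read from Parquet.
            if let Some(value) = constants.get(&f.id) {
                return ColumnSource::Constant(value.clone());
            }
            // 2. Match by name only when field IDs are unreliable.
            let idx = if conflict {
                parquet.iter().position(|p| p.name == f.name)
            } else {
                parquet.iter().position(|p| p.id == f.id)
            };
            idx.map_or(ColumnSource::Null, ColumnSource::Parquet)
        })
        .collect()
}

fn main() {
    let parquet = [Field { id: 1, name: "name" }, Field { id: 2, name: "subdept" }];
    let iceberg = [
        Field { id: 1, name: "id" },
        Field { id: 2, name: "name" },
        Field { id: 3, name: "dept" },
        Field { id: 4, name: "subdept" },
    ];
    // Placeholder partition values for the partition columns id and dept.
    let constants = HashMap::from([(1, "1".to_string()), (3, "hr".to_string())]);
    let sources = resolve_sources(&iceberg, &parquet, &constants);
    println!("{:?}", sources);
}
```

Checking constants before touching the Parquet file keeps the partition column "id" from being read out of Parquet field 1. The sketch ignores partition specs entirely. The closing comment on this PR identifies them as necessary for a complete fix.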

Are these changes tested?

@mbutrovich mbutrovich marked this pull request as draft October 30, 2025 01:37
@mbutrovich
Collaborator Author

Marking as draft while I review some new Iceberg Java failures that this change caused.

@mbutrovich
Collaborator Author

I think I'll close this in favor of a more comprehensive fix that handles partition specs correctly.

@mbutrovich mbutrovich closed this Oct 30, 2025
@mbutrovich mbutrovich deleted the field_id_conflicts branch November 3, 2025 18:56