Skip to content

Conversation

@aastha25
Copy link
Contributor

@aastha25 aastha25 commented Nov 19, 2025

What changes are proposed in this pull request, and why are they necessary?

This PR completes the integration of the Coral type system (introduced in #558) by implementing the glue logic for two-stage schema conversion: Hive → Coral → Calcite. This architectural change enables better type system abstraction while maintaining 100% backward compatibility with existing behavior.

Key Changes

  1. New Type Converters
    HiveToCoralTypeConverter: Converts Hive TypeInfo → Coral data types
    CoralTypeToRelDataTypeConverter: Converts Coral data types → Calcite RelDataType
    Both converters have comprehensive unit test coverage in HiveToCoralTypeConverterTest
  2. Refactored HiveTable.getRowType()
    New path (shadow mode): getRowTypeViaCoralTypeSystem() implements Hive → Coral → Calcite pipeline
    Legacy path (production path): getRowTypeDirectConversion() preserves original TypeConverter logic
    Logging: Warns on mismatch or failure (zero production impact)
    New method: getCoralSchema() for Hive → Coral schema conversion
  3. Critical Fixes in Coral Type System for Legacy Compatibility. Called out as pending in Introduce Coral type system abstraction #558 here
    TIMESTAMP precision: Uses PRECISION_NOT_SPECIFIED (-1) to match legacy behavior (no explicit precision)
    UNION types: Generates {tag, field0, field1, ...} struct matching Trino's pattern (required by coalesce_struct UDF)
    VOID type: Maps to SqlTypeName.NULL (not STRING)
    Nullability: All types default to nullable per Hive semantics

How was this patch tested?

  • Comprehensive unit tests in HiveToCoralTypeConverterTest covering all type categories
  • Existing Integration tests pass
  • Tested against some production grade datasets

Rollout
This PR: Shadow mode - validate with zero risk
Follow up PR: Switch to Coral as primary after log validation

* Two-stage conversion: Hive → Coral → Calcite.
* This is the preferred path when using CoralCatalog.
*/
private RelDataType getRowTypeViaCoralTypeSystem(RelDataTypeFactory typeFactory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need this. I would imagine both Hive and Iceberg tables convert their types to Coral type as part as part their table API implementation. Internally we have common conversion between Coral and Calcite. IN this step, it does not matter we are coming from Hive or Iceberg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are checking in hive table -> coral -> calcite type system prior to introducing the iceberg type system to roll out the coral type system incrementally. so this intermediate state is required.

a future refactor will be applied in #556 when we introduce iceberg type system and those conversions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't the radius of impact of this be too large? I would keep current constant as much as possible and introduce new features on new things, then we can flip the flag for "existing" things.

If the objective is to validate, can we we continue to use the old implementation, but also compare in the background and log a non-fatal error when there is a mismatch? This way we will know from the log instead of from the user :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this layer of coral, or for that matter coral by itself, is unaware of "new" vs "old", so it needs to be a switch at some point. Building out awareness of "new features" in coral seems like a good product investment in general, but let's pursue that outside of this PR. These changes will get rolled out with #556, Im trying to establish a checkpoint in the middle for closer monitoring.

We're rolling out this change with a fallback, regression testing across all code paths and i-tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean by Coral is not aware of "new" vs "old". The PR's objective is to introduce "old" and "new", so both are known to Coral. Here is what I meant.

public RelDataType getRowType(RelDataTypeFactory typeFactory) {
  RelDataType hiveType = getRowTypeFromHiveType(typeFactory);

  try {
    RelDataType coralType = getRowTypeFromCoralType(typeFactory);
    if (!hiveType.equals(coralType)) {
      LOG.warn(
          "Hive and Coral type conversion mismatch for table {}. " +
          "Hive: {}, Coral: {}",
          hiveTable.getTableName(),
          hiveType,
          coralType
      );
      // Optional: throw a non-fatal exception
      throw new RuntimeException(
          "Type conversion mismatch between Hive and Coral for table " +
          hiveTable.getTableName()
      );
    }
  } catch (Exception e) {
    LOG.warn(
        "Coral validation failed for table {}. Proceeding with Hive type. Error: {}",
        hiveTable.getTableName(),
        e.getMessage(),
        e
    );
  }

  return hiveType; // Always return Hive path
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can accommodate this change. just one additional clarification, how are we thinking of accommodating this approach with HiveCoralTable & IcebergCoralTable with #556 ?

HiveCoralTable implements CoralTable {

public CoralSchema getSchema() {
// integrate coral type system
}

and internally in HiveTable.java, for hive tables, we use the same approach - prefer getRowTypeFromHiveType() over getRowTypeFromCoralType() . but for iceberg tables, we prefer getRowTypeFromCoralType() ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us discuss the strategy offline and update here. There are a few options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this PR, PTAL

@aastha25 aastha25 merged commit 6950c4b into linkedin:master Nov 21, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants