Skip to content

HIVE-29089: Fix MSCK Repair table adding invalid partitions for non string partitions columns #5976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

vikramahuja1001
Copy link
Contributor

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

@vikramahuja1001 vikramahuja1001 changed the title Fix MSCK Repair adding invalid partitions HIVE-29089 Fix MSCK Repair table adding invalid partitions Jul 15, 2025
@vikramahuja1001 vikramahuja1001 changed the title HIVE-29089 Fix MSCK Repair table adding invalid partitions HIVE-29089 Fix MSCK Repair table adding invalid partitions for non string partitions columns Jul 15, 2025
@vikramahuja1001 vikramahuja1001 changed the title HIVE-29089 Fix MSCK Repair table adding invalid partitions for non string partitions columns HIVE-29089: Fix MSCK Repair table adding invalid partitions for non string partitions columns Jul 15, 2025
// Decimal datatypes are stored like decimal(10,10)
return new BigDecimal(partitionValue).stripTrailingZeros().toPlainString();
try {
switch (type.toLowerCase()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe ColumnType.getTypeName(type)? this would handle the type.startsWith("decimal")

@@ -90,7 +90,7 @@ PREHOOK: Output: default@tbl_y
POSTHOOK: query: MSCK REPAIR TABLE tbl_y
POSTHOOK: type: MSCK
POSTHOOK: Output: default@tbl_y
Partitions not in metastore: tbl_y:month=12/day=2 tbl_y:month=12/day=3 tbl_y:month=12/day=__HIVE_DEFAULT_PARTITION__ tbl_y:month=ANOTHER_PARTITION/day=3
Partitions not in metastore: tbl_y:month=12/day=2 tbl_y:month=12/day=3 tbl_y:month=ANOTHER_PARTITION/day=3
Copy link
Member

@deniskuzZ deniskuzZ Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why tbl_y:month=12/day=__HIVE_DEFAULT_PARTITION__ is gone?

@@ -100,8 +100,40 @@ POSTHOOK: type: SHOWPARTITIONS
POSTHOOK: Input: default@tbl_y
month=12/day=2
month=12/day=3
month=12/day=__HIVE_DEFAULT_PARTITION__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

// Skip this partition if there is some issue in the partition validation
LOG.warn("Skipping partition : " + partPath.getName());
continue;
}
LOG.debug("PartitionName: " + partitionName);

if (partitionName != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why check for null, if we skip the iteration when partitionName == null?

if (!NumberUtils.isParsable(partitionValue)) {
public static String getNormalisedPartitionValue(String partitionValue, String type,
Configuration conf) {
if ((!NumberUtils.isParsable(partitionValue) && !Objects.equals(type, ColumnType.STRING_TYPE_NAME)
Copy link
Member

@deniskuzZ deniskuzZ Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand this change:

if the partitionValue is not a number but type is String, we'll attempt to parse the number in partitionValue - doesn't make any sense

DEFAULTPARTITIONNAME only applies to String type

Copy link
Member

@deniskuzZ deniskuzZ Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it like

public static String getNormalisedPartitionValue(String partitionValue, String type, Configuration conf) {
  // 1. Handle simple exit cases first.
  if (type == null) {
    return partitionValue;
  }
  if (Objects.equals(partitionValue, MetastoreConf.getVar(conf,
        MetastoreConf.ConfVars.DEFAULTPARTITIONNAME))) {
    // This is the special partition name for NULL values. It should never be parsed.
    return partitionValue;
  }
  // 2. If the type is not numeric, no normalization is needed.
  String colType = ColumnType.getTypeName(type);
  if (!ColumnType.NumericTypes.contains(colType)) {
    return partitionValue;
  }
  // 3. At this point, we have a numeric type that needs normalization.
  LOG.debug("Normalizing partition value '{}' for type '{}'.", partitionValue, type);
  try {
    switch (colType) {
      case ColumnType.TINYINT_TYPE_NAME:
      case ColumnType.SMALLINT_TYPE_NAME:
      case ColumnType.INT_TYPE_NAME:
        return Integer.toString(Integer.parseInt(partitionValue));
      case ColumnType.BIGINT_TYPE_NAME:
        return Long.toString(Long.parseLong(partitionValue));
      case ColumnType.FLOAT_TYPE_NAME:
        return Float.toString(Float.parseFloat(partitionValue));
      case ColumnType.DOUBLE_TYPE_NAME:
        return Double.toString(Double.parseDouble(partitionValue));
      case ColumnType.DECIMAL_TYPE_NAME:
        return new BigDecimal(partitionValue).stripTrailingZeros().toPlainString();
    }
  } catch (NumberFormatException e) {
    // 4. Handle cases where the value cannot be parsed as the expected number type.
    String validationMode = MetastoreConf.getVar(conf, MetastoreConf.ConfVars.MSCK_PATH_VALIDATION);
    if ("throw".equals(validationMode)) {
      LOG.error("Invalid partition value: Cannot parse '{}' as type '{}'. Failing MSCK. "
          + "Set hive.msck.path.validation=skip to ignore invalid partitions.", partitionValue, type);
      throw e;
    } else if ("skip".equals(validationMode)) {
      LOG.warn("Skipping invalid partition value '{}' for type '{}' due to parsing error.", partitionValue, type);
      // Signals the caller to skip this partition.
      return null;
    }
  }
  return partitionValue;
}

@deniskuzZ
Copy link
Member

@vikramahuja1001, please check the comments

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants