-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29089: Fix MSCK Repair table adding invalid partitions for non string partitions columns #5976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
4ceb6d9
to
cf1cdf0
Compare
...astore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
Outdated
Show resolved
Hide resolved
...astore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
Outdated
Show resolved
Hide resolved
...astore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
Outdated
Show resolved
Hide resolved
...astore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
Outdated
Show resolved
Hide resolved
// Decimal datatypes are stored like decimal(10,10) | ||
return new BigDecimal(partitionValue).stripTrailingZeros().toPlainString(); | ||
try { | ||
switch (type.toLowerCase()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe ColumnType.getTypeName(type)
? this would handle the type.startsWith("decimal")
@@ -90,7 +90,7 @@ PREHOOK: Output: default@tbl_y | |||
POSTHOOK: query: MSCK REPAIR TABLE tbl_y | |||
POSTHOOK: type: MSCK | |||
POSTHOOK: Output: default@tbl_y | |||
Partitions not in metastore: tbl_y:month=12/day=2 tbl_y:month=12/day=3 tbl_y:month=12/day=__HIVE_DEFAULT_PARTITION__ tbl_y:month=ANOTHER_PARTITION/day=3 | |||
Partitions not in metastore: tbl_y:month=12/day=2 tbl_y:month=12/day=3 tbl_y:month=ANOTHER_PARTITION/day=3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why tbl_y:month=12/day=__HIVE_DEFAULT_PARTITION__
is gone?
@@ -100,8 +100,40 @@ POSTHOOK: type: SHOWPARTITIONS | |||
POSTHOOK: Input: default@tbl_y | |||
month=12/day=2 | |||
month=12/day=3 | |||
month=12/day=__HIVE_DEFAULT_PARTITION__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
// Skip this partition if there is some issue in the partition validation | ||
LOG.warn("Skipping partition : " + partPath.getName()); | ||
continue; | ||
} | ||
LOG.debug("PartitionName: " + partitionName); | ||
|
||
if (partitionName != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why check for null, if we skip the iteration when partitionName == null
?
if (!NumberUtils.isParsable(partitionValue)) { | ||
public static String getNormalisedPartitionValue(String partitionValue, String type, | ||
Configuration conf) { | ||
if ((!NumberUtils.isParsable(partitionValue) && !Objects.equals(type, ColumnType.STRING_TYPE_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't understand this change:
if the partitionValue is not a number but type is String, we'll attempt to parse the number in partitionValue - doesn't make any sense
DEFAULTPARTITIONNAME only applies to String type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it like
public static String getNormalisedPartitionValue(String partitionValue, String type, Configuration conf) {
// 1. Handle simple exit cases first.
if (type == null) {
return partitionValue;
}
if (Objects.equals(partitionValue, MetastoreConf.getVar(conf,
MetastoreConf.ConfVars.DEFAULTPARTITIONNAME))) {
// This is the special partition name for NULL values. It should never be parsed.
return partitionValue;
}
// 2. If the type is not numeric, no normalization is needed.
String colType = ColumnType.getTypeName(type);
if (!ColumnType.NumericTypes.contains(colType)) {
return partitionValue;
}
// 3. At this point, we have a numeric type that needs normalization.
LOG.debug("Normalizing partition value '{}' for type '{}'.", partitionValue, type);
try {
switch (colType) {
case ColumnType.TINYINT_TYPE_NAME:
case ColumnType.SMALLINT_TYPE_NAME:
case ColumnType.INT_TYPE_NAME:
return Integer.toString(Integer.parseInt(partitionValue));
case ColumnType.BIGINT_TYPE_NAME:
return Long.toString(Long.parseLong(partitionValue));
case ColumnType.FLOAT_TYPE_NAME:
return Float.toString(Float.parseFloat(partitionValue));
case ColumnType.DOUBLE_TYPE_NAME:
return Double.toString(Double.parseDouble(partitionValue));
case ColumnType.DECIMAL_TYPE_NAME:
return new BigDecimal(partitionValue).stripTrailingZeros().toPlainString();
}
} catch (NumberFormatException e) {
// 4. Handle cases where the value cannot be parsed as the expected number type.
String validationMode = MetastoreConf.getVar(conf, MetastoreConf.ConfVars.MSCK_PATH_VALIDATION);
if ("throw".equals(validationMode)) {
LOG.error("Invalid partition value: Cannot parse '{}' as type '{}'. Failing MSCK. "
+ "Set hive.msck.path.validation=skip to ignore invalid partitions.", partitionValue, type);
throw e;
} else if ("skip".equals(validationMode)) {
LOG.warn("Skipping invalid partition value '{}' for type '{}' due to parsing error.", partitionValue, type);
// Signals the caller to skip this partition.
return null;
}
}
return partitionValue;
}
@vikramahuja1001, please check the comments |
|
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?