Widen type promotion for decimals with larger scale in Parquet Read [databricks] #11727
This PR contributes to #11433 and #11512.
This PR supports additional type promotion to decimals with larger precision and scale.
As long as the precision increases by at least as much as the scale, the decimal values can be promoted without loss of precision.
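Concretely, widening DECIMAL(p1, s1) to DECIMAL(p2, s2) preserves all values when p2 - p1 >= s2 - s1, i.e. the number of integral digits (p - s) never shrinks. A minimal sketch of the rule, with a hypothetical helper name (not code from this PR):

```scala
object DecimalWideningSketch {
  // Widening DECIMAL(p1, s1) to DECIMAL(p2, s2) is lossless when the scale
  // does not shrink and the precision grows by at least as much as the
  // scale, so the integral digit count (p - s) never decreases.
  def isLosslessWidening(p1: Int, s1: Int, p2: Int, s2: Int): Boolean =
    s2 >= s1 && (p2 - p1) >= (s2 - s1)

  def main(args: Array[String]): Unit = {
    // DECIMAL(5, 2) -> DECIMAL(7, 4): precision +2, scale +2, so lossless.
    assert(isLosslessWidening(5, 2, 7, 4))
    // DECIMAL(5, 2) -> DECIMAL(6, 4): precision +1 < scale +2; the integral
    // part would shrink from 3 digits to 2, so a value like 999.99 overflows.
    assert(!isLosslessWidening(5, 2, 6, 4))
  }
}
```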
A similar change was added in Apache Spark 4.0: apache/spark#44513.
Currently, on the CPU, all Spark versions prior to 4.0 throw an exception if the scale of the read schema does not match the scale of the schema that was written. In spark-rapids, this fix is available for all supported Spark versions.
We removed the separate checks for whether a decimal can be read as an int, long, or byte_array and consolidated them into a single function, `canReadAsDecimal` (see the sketch below). An integration test was added to verify that the conditions for the type promotion are met.
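A minimal sketch of what such a consolidated check might look like (hypothetical types and signature, assuming the widening rule above; the actual plugin code differs): whichever physical encoding Parquet used for the decimal column, the same logical precision/scale comparison decides whether it can be read.

```scala
object CanReadAsDecimalSketch {
  // Hypothetical Parquet physical encodings a decimal column may use.
  sealed trait PhysicalType
  case object Int32 extends PhysicalType
  case object Int64 extends PhysicalType
  case object ByteArray extends PhysicalType

  final case class FileDecimal(physical: PhysicalType, precision: Int, scale: Int)

  // One predicate replacing the former per-encoding checks: the physical
  // type no longer changes the decision, only the logical precision and
  // scale relationship does.
  def canReadAsDecimal(file: FileDecimal, readPrecision: Int, readScale: Int): Boolean = {
    val scaleIncrease = readScale - file.scale
    val precisionIncrease = readPrecision - file.precision
    scaleIncrease >= 0 && precisionIncrease >= scaleIncrease
  }
}
```

Under this rule, for example, a decimal stored as INT32 with DECIMAL(7, 2) can be read as DECIMAL(9, 4), since precision and scale each increase by 2, while reading it as DECIMAL(8, 4) would be rejected.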