fix: increase ROUND decimal precision to prevent overflow truncation #19926
Conversation
Other options considered:
I chose option 1 as the most user-friendly. I would like to hear if there are other suggestions for this fix.
Option 1 sounds good. Option 2 would be too strict; I am not sure I fully understand what option 3 means here. One thing to keep in mind: we still need to consider the edge case where the precision is already maxed out. Would this bug occur again?
You were right to point that out. I did some more tests and found it was overflowing after 38 digits; I handled that as an error.
// Validate the result fits within the precision
// The max value for a given precision is 10^precision - 1
let max_value = ten.pow_checked(precision as u32).map_err(|_| {
Can use these constants instead
- https://docs.rs/arrow/latest/arrow/datatypes/constant.MAX_DECIMAL128_FOR_EACH_PRECISION.html
- https://docs.rs/arrow/latest/arrow/datatypes/constant.MIN_DECIMAL128_FOR_EACH_PRECISION.html
(Same constants exist for each decimal type)
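For illustration, a bounds check against the target precision could use those tables instead of computing 10^precision at runtime. This is a hypothetical helper, not the PR's code, and the indexing convention (by precision or precision - 1) should be verified against the arrow version in use:

```rust
use arrow::datatypes::{MAX_DECIMAL128_FOR_EACH_PRECISION, MIN_DECIMAL128_FOR_EACH_PRECISION};

/// Illustrative helper: check whether a rounded scaled value still fits the
/// target precision using arrow's precomputed bounds tables.
/// NOTE: assumes the tables are indexed directly by precision; double-check
/// the indexing convention for the arrow version in use.
fn fits_in_precision(value: i128, precision: u8) -> bool {
    value >= MIN_DECIMAL128_FOR_EACH_PRECISION[precision as usize]
        && value <= MAX_DECIMAL128_FOR_EACH_PRECISION[precision as usize]
}
```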
round('1234.5678'::decimal(50,4), 2);
----
- Decimal256(50, 4) 1234.57
+ Decimal256(51, 4) 1234.57
I was going to ask whether the increase of the precision could be done only when it is really needed.
So I went and checked what other DBs do:
- Postgres v18
postgres=# SELECT pg_typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1));
pg_typeof | round
-----------+-------
numeric | 1000
(1 row)
(Not helpful)
- Apache Spark (v4.0.1)
spark-sql (default)> SELECT typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1));
decimal(4,0) 1000
Time taken: 0.032 seconds, Fetched 1 row(s)
Reduced the scale!
- DuckDB
SELECT typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1));
┌────────────────────────────────────────────┬────────────────────────────────────┐
│ typeof(round(CAST(999.9 AS DECIMAL(4,1)))) │ round(CAST(999.9 AS DECIMAL(4,1))) │
│ varchar │ decimal(4,0) │
├────────────────────────────────────────────┼────────────────────────────────────┤
│ DECIMAL(4,0) │ 1000 │
└────────────────────────────────────────────┴────────────────────────────────────┘
Reduced the scale!
This is really nice, thank you.
| Case | Input | Result | Type |
|---|---|---|---|
| dp column | 123.4567, dp=1 | 123.5000 | DECIMAL(10,4) |
| carry-over | 999.9999, dp=0 | 1000.0000 | DECIMAL(10,4) |
| constant dp | 999.9 | 1000 | DECIMAL(4,0) |
| scale reduction | 1234.5678, dp=2 | 1234.57 | DECIMAL(50,2) |
| negative dp | 12345.67, dp=-2 | 12300 | DECIMAL(7,0) |
| dp > scale | 123.4, dp=5 | 123.4 | DECIMAL(4,1) |
| negative values | -999.9, dp=0 | -1000 | DECIMAL(4,0) |
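To make the constant-dp rows in this table concrete, here is a purely illustrative sketch of an output-type rule that is consistent with them. It is my reading of the table, not necessarily the PR's actual implementation; the non-constant dp case keeps the input type, as the code comment in the next snippet notes.

```rust
/// Illustrative only: for a constant dp, keep the input precision and clamp
/// the output scale into [0, input_scale]. This matches the constant-dp rows
/// in the table above (assumes a non-negative input scale).
fn round_output_type(precision: u8, scale: i8, dp: i32) -> (u8, i8) {
    let new_scale = dp.clamp(0, scale as i32) as i8;
    (precision, new_scale)
}
```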
// Get decimal_places from scalar_arguments
// If dp is not a constant scalar, we must keep the original scale because
// we can't determine a single output scale for varying per-row dp values.
let (decimal_places, dp_is_scalar): (i32, bool) =
Here decimal_places is initialized as i32, but below it is cast to i8 with as i8, and this may lead to problems. A validation is needed that it is not bigger than i8::MAX.
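A minimal sketch of such a validation, with a hypothetical helper name (not the PR's code):

```rust
use datafusion_common::DataFusionError;

/// Sketch: reject decimal_places values outside the i8 range instead of
/// silently truncating them with `as i8`.
fn dp_to_i8(decimal_places: i32) -> Result<i8, DataFusionError> {
    i8::try_from(decimal_places).map_err(|_| {
        DataFusionError::Execution(format!(
            "decimal_places {decimal_places} is out of range for a decimal scale"
        ))
    })
}
```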
Ok(Arc::new(Field::new(
    self.name(),
    return_type,
    input_field.is_nullable(),
This should also take into account the decimal_places arg.
Postgres:
postgres=# SELECT pg_typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1), NULL);
pg_typeof | round
-----------+-------
numeric |
(1 row)
Apache Spark:
spark-sql (default)> SELECT typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1), NULL);
decimal(4,0) NULL
Time taken: 0.055 seconds, Fetched 1 row(s)
DuckDB:
D SELECT typeof(round(999.9::DECIMAL(4,1))), round(999.9::DECIMAL(4,1), NULL);
┌────────────────────────────────────────────┬──────────────────────────────────────────┐
│ typeof(round(CAST(999.9 AS DECIMAL(4,1)))) │ round(CAST(999.9 AS DECIMAL(4,1)), NULL) │
│ varchar │ int32 │
├────────────────────────────────────────────┼──────────────────────────────────────────┤
│ DECIMAL(4,0) │ NULL │
└────────────────────────────────────────────┴──────────────────────────────────────────┘
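A minimal sketch of the suggested change, reusing identifiers from the quoted diff; the args.arg_fields access is an assumption about what is available in the surrounding function:

```rust
// Make the output nullable when either the value column or the decimal_places
// argument can be NULL, matching the round(x, NULL) => NULL behaviour above.
let dp_nullable = args
    .arg_fields
    .get(1)
    .map(|f| f.is_nullable())
    .unwrap_or(false);
Ok(Arc::new(Field::new(
    self.name(),
    return_type,
    input_field.is_nullable() || dp_nullable,
)))
```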
if args.scalar_arguments.len() > 1 {
    match args.scalar_arguments[1] {
        Some(ScalarValue::Int32(Some(v))) => (*v, true),
        Some(ScalarValue::Int64(Some(v))) => (*v as i32, true),
Better to use a safe cast and return an Err if the number is bigger than i32::MAX (or even i8::MAX).
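A hedged sketch of that arm using a checked conversion, reusing the quoted identifiers and assuming the enclosing function returns a Result with DataFusionError in scope:

```rust
Some(ScalarValue::Int64(Some(v))) => {
    // Checked conversion instead of `as i32`: out-of-range values become an error.
    let dp = i32::try_from(*v).map_err(|_| {
        DataFusionError::Execution(format!("decimal_places {v} is out of range for ROUND"))
    })?;
    (dp, true)
}
```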
    decimal_places: Option<ArrayRef>,
) -> Result<ArrayRef, DataFusionError> {
    let number_rows = value.len();
    let return_type = value.data_type().clone();
This could be wrong for decimals whose scale is reduced.
No need to fix/improve it; just a comment acknowledging that is enough.
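Since only an acknowledging comment is requested, a possible wording (mine, not the PR's) could be:

```rust
// NOTE: return_type mirrors the input value's data type; for decimals whose
// output scale is reduced (constant decimal_places smaller than the input
// scale), the actual output type may differ from this.
let return_type = value.data_type().clone();
```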
Which issue does this PR close?
round decimal incorrect result #19921.

Rationale for this change

SELECT ROUND('999.9'::DECIMAL(4,1)) incorrectly returned 100.0 instead of 1000. When rounding a decimal value causes a carry-over that increases the number of digits (e.g., 999.9 → 1000.0), the result overflows the original precision constraint. The overflow was silently truncated during display, producing incorrect results.
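A small, self-contained illustration of the overflow using plain integer arithmetic (not the PR's code; the scaled-integer representation is assumed to mirror how Decimal128 stores values):

```rust
fn main() {
    // DECIMAL(4,1) stores 999.9 as the scaled integer 9999 (scale 1).
    let scaled = 9999_i128;
    // Rounding to 0 decimal places while keeping scale 1 yields 10000, i.e. 1000.0.
    let rounded = ((scaled + 5) / 10) * 10;
    // The largest scaled value representable with precision 4 is 9999.
    let max_for_precision_4 = 10_i128.pow(4) - 1;
    assert_eq!(rounded, 10_000);
    assert!(rounded > max_for_precision_4); // carry-over overflows DECIMAL(4,1)
    println!("rounded = {rounded}, max for precision 4 = {max_for_precision_4}");
}
```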
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
| | Before | After |
|---|---|---|
| ROUND('999.9'::DECIMAL(4,1)) | 100.0 (wrong) | 1000 (correct) |
| Return type | Decimal128(4, 1) | Decimal128(5, 1) |

The return type precision is now increased by 1 to prevent overflow. This matches PostgreSQL behavior, where ROUND can increase the number of digits before the decimal point when carry-over occurs. Users who depend on the exact output precision may need to add an explicit cast.