-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: Support decimal for variance #18937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the changes are doing what you expect here; I believe decimals are still being converted to float for these calculations. What we are looking to do is to perform the calculations natively on decimals without casting them to floats. The function already supports decimal inputs by casting them to floats.
Thanks for pointing that. I just checked out the PR #1408 for the guidance and updated the code. I also used LLM for this.(disclosure) |
|
One of the key requirements that we want here is that decimal inputs are not cast to floats, and the return type remains decimal. I think if you check the tests introduced here you'll see that is not the case currently, hence the code for native decimal implementation isn't actually being used. |
| let decimal_array = builder.finish().with_precision_and_scale(10, 3).unwrap(); | ||
| let array: ArrayRef = Arc::new(decimal_array); | ||
|
|
||
| let mut pop_acc = VarianceAccumulator::try_new(StatsType::Population)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this use DecimalVarianceAccumulator ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should, I made a mistake, there are a few more I will fix it soon.
| pop_acc.update_batch(&pop_input)?; | ||
| assert_variance(pop_acc.evaluate()?, 11025.9450285); | ||
|
|
||
| let mut sample_acc = VarianceAccumulator::try_new(StatsType::Sample)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too
| T: DecimalType + ArrowNumericType + Debug, | ||
| T::Native: DecimalNative, | ||
| { | ||
| fn new(scale: i8, stats_type: StatsType) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why here the scale is not checked (scale > DECIMAL256_MAX_SCALE) ?
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
keeping Float64 outputs, and add unit coverage that feeds decimal
batches through both population and sample accumulators
expected strings with the Float64 results emitted by the engine
Are these changes tested?
Are there any user-facing changes?