Perform type coercion for corr aggregate function during physical planning #15776

Draft
wants to merge 1 commit into
base: main
from

kumarlokesh
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  • Created a new utility function apply_aggregate_coercion that applies type
    coercion during physical planning based on the logical type coercion rules
  • Modified the physical planner to use this utility when creating aggregate expressions
  • Removed explicit type casting from the CorrelationGroupsAccumulator.update_batch method
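The idea in the list above, moving the cast out of the accumulator and into planning, can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `ToyType`, `ToyExpr`, and `coerce_args_at_plan_time` are hypothetical names invented for illustration and are not DataFusion or Arrow APIs, and the real `apply_aggregate_coercion` in this PR may work differently.

```rust
/// Toy stand-in for a data type (hypothetical; not `arrow::datatypes::DataType`).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyType {
    Int32,
    Int64,
    Float32,
    Float64,
}

/// Toy physical expression (hypothetical): a column reference,
/// or a cast wrapping another expression.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub enum ToyExpr {
    Column { name: String, data_type: ToyType },
    Cast { input: Box<ToyExpr>, to: ToyType },
}

impl ToyExpr {
    /// The type this expression produces when evaluated.
    pub fn data_type(&self) -> ToyType {
        match self {
            ToyExpr::Column { data_type, .. } => *data_type,
            ToyExpr::Cast { to, .. } => *to,
        }
    }
}

/// Hypothetical planning-time helper: wrap an argument in a cast only when
/// its type differs from the type the accumulator expects. The accumulator
/// can then assume it always receives `target` and skip casting per batch.
pub fn coerce_args_at_plan_time(args: Vec<ToyExpr>, target: ToyType) -> Vec<ToyExpr> {
    args.into_iter()
        .map(|arg| {
            if arg.data_type() == target {
                arg
            } else {
                ToyExpr::Cast { input: Box::new(arg), to: target }
            }
        })
        .collect()
}
```

In this model, calling `coerce_args_at_plan_time` with an `Int32` column and a `Float64` column and a `Float64` target wraps only the first argument in a cast, so the decision is made once per plan rather than once per batch.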

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Apr 19, 2025
@kumarlokesh kumarlokesh force-pushed the avoid-explicit-cast-in-corr-aggregate-fn branch from 41587c4 to 1933efe Compare April 19, 2025 16:44

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Jun 20, 2025
@kumarlokesh
Contributor Author

PR is active.

@@ -1598,8 +1600,16 @@ pub fn create_aggregate_expr_with_name_and_maybe_filter(
physical_name(e)?
};

let physical_args =
// Create base physical expressions (without coercion)
Contributor

I think coercion is typically applied earlier in the planning process, not after physical expressions have been created.

Did you consider applying this coercion along with the other coercion rules?

Contributor

@alamb alamb left a comment

Thank you for this PR @kumarlokesh 🙏

let array_x = downcast_array::<Float64Array>(array_x);
let array_y = &cast(&values[1], &DataType::Float64)?;
let array_y = downcast_array::<Float64Array>(array_y);
let array_x = downcast_array::<Float64Array>(&values[0]);
Contributor

It looks to me like this code only handles Float64 but the signature of the function reports that it accepts any numeric type:

impl Correlation {
    /// Create a new COVAR_POP aggregate function
    pub fn new() -> Self {
        Self {
            signature: Signature::uniform(2, NUMERICS.to_vec(), Volatility::Immutable),
        }
    }
}

I wonder: if you just changed the signature to say the function needs Float64 argument types, would that be enough?

DataFusion already has a bunch of coercion rules, see https://docs.rs/datafusion/latest/datafusion/logical_expr/type_coercion/index.html for example
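To make the suggestion above concrete, here is a toy model of signature-driven coercion. It is a sketch under stated assumptions, not DataFusion's implementation: `ArgType`, `ToySignature`, `can_cast`, and `coerce_types` are hypothetical names, and the matching rules are simplified. It is meant only to show why a signature accepting any numeric type leaves the cast to the accumulator, while a signature demanding Float64 lets the planner insert it.

```rust
/// Toy argument types (hypothetical; not Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArgType {
    Int64,
    Float32,
    Float64,
    Utf8,
}

/// Toy signature kinds (hypothetical; loosely modeled on the idea of
/// "uniform over a list of types" versus "exactly these types").
pub enum ToySignature {
    /// `n` arguments that all share one type from the list.
    Uniform(usize, Vec<ArgType>),
    /// Exactly these argument types, in order.
    Exact(Vec<ArgType>),
}

/// Simplified cast rule for this sketch: identity casts are allowed,
/// and any numeric type may be widened to Float64.
fn can_cast(from: ArgType, to: ArgType) -> bool {
    from == to || (from != ArgType::Utf8 && to == ArgType::Float64)
}

/// Returns the types the planner would coerce the arguments to,
/// or `None` if the arguments cannot satisfy the signature.
pub fn coerce_types(sig: &ToySignature, current: &[ArgType]) -> Option<Vec<ArgType>> {
    match sig {
        ToySignature::Uniform(n, allowed) => {
            if current.len() != *n {
                return None;
            }
            // Arguments that already share an allowed type pass through unchanged.
            let first = *current.first()?;
            if allowed.contains(&first) && current.iter().all(|t| *t == first) {
                return Some(current.to_vec());
            }
            // Otherwise pick the first allowed type every argument can be cast to.
            allowed
                .iter()
                .copied()
                .find(|target| current.iter().all(|t| can_cast(*t, *target)))
                .map(|target| vec![target; *n])
        }
        ToySignature::Exact(targets) => {
            if current.len() != targets.len() {
                return None;
            }
            let ok = current
                .iter()
                .zip(targets)
                .all(|(from, to)| can_cast(*from, *to));
            ok.then(|| targets.clone())
        }
    }
}
```

With `Uniform(2, numerics)`, two `Int64` arguments are accepted as `Int64`, so an accumulator that only handles Float64 would still have to cast at execution time. With `Exact(vec![Float64, Float64])`, the same arguments resolve to Float64 during planning. In DataFusion itself the analogous change might be a `Signature::exact(...)` over two `DataType::Float64` entries, though whether that alone is sufficient is exactly the open question in this review.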

@alamb
Contributor

alamb commented Jun 22, 2025

I am sorry for the delayed review

@alamb
Contributor

alamb commented Jun 23, 2025

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft June 23, 2025 20:15
@github-actions github-actions bot removed the Stale PR has not had any activity for some time label Jun 24, 2025
Labels
core Core DataFusion crate functions Changes to functions implementation logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid explicit cast during execution in corr aggregate function
2 participants