[SPARK-43415][CONNECT][SQL] Implement `KVGDS.agg` with custom `mapValues` function

### What changes were proposed in this pull request?

This PR implements the `KVGDS.agg(typedColumn)` function when a `mapValues` function is defined. This use case was previously unsupported (`mapValues` was not applied).

This PR marks the special handling of `kvds.reduce()` as obsolete. However, we keep the server-side code to maintain compatibility with older clients.

This implementation is done purely on the client side and is oblivious to the Connect server. The mechanism is to first create an intermediate DF that contains only two struct columns:

```
df
 |- iv: struct<...schema of the original df...>
 |- v: struct<...schema of the output of the mapValues func...>
```

Then we rewrite all grouping exprs to use the `iv` column, and all aggregating exprs to use the `v` column as input. The rules are as follows:

- Prefix every column reference with `iv` or `v`, e.g., `col1` becomes `iv.col1`.
- Rewrite `*` to
  - `iv.value`, if the original df schema is a primitive type; or
  - `iv`, if the original df schema is a struct type.

Follow-up:

- [SPARK-50837](https://issues.apache.org/jira/browse/SPARK-50837): fix wrong output column names. This issue is caused by us manipulating the DF schema.
- [SPARK-50846](https://issues.apache.org/jira/browse/SPARK-50846): consolidate the aggregator-to-proto transformation code path.

### Why are the changes needed?

To support a use case that was previously unsupported.

### Does this PR introduce _any_ user-facing change?

Yes, see the first section.

### How was this patch tested?

New test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49111 from xupefei/kvds-mapvalues.

Authored-by: Paddy Xu <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
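The column-reference rewrite rules in the commit message can be sketched as a small, self-contained Scala model. Everything in it is hypothetical: `DataType`, `Primitive`, `Struct`, `Expr`, `ColRef`, `Star`, `Call`, and `Rewrite.prefixRefs` are illustrative names, not classes from the Spark Connect client. The commit message only spells out the `*` rule for `iv`; this sketch assumes aggregating exprs get the analogous treatment under `v`, using the schema of the `mapValues` output.

```scala
// Hypothetical model of the client-side rewrite; these are NOT Spark classes.

// Schema of the data wrapped by a struct column (`iv` or `v`).
sealed trait DataType
case object Primitive extends DataType                         // e.g. Dataset[Long]
final case class Struct(fields: List[String]) extends DataType // e.g. a case class

// A tiny expression language: column references, `*`, and function calls.
sealed trait Expr
final case class ColRef(name: String) extends Expr
case object Star extends Expr
final case class Call(fn: String, args: List[Expr]) extends Expr

object Rewrite {
  // Rewrites `expr` to read from the struct column `prefix`:
  // "iv" for grouping exprs, "v" for aggregating exprs.
  // `inputType` is the schema of the data that `prefix` wraps.
  // Assumption: the `*` rule for "v" mirrors the documented rule for "iv".
  def prefixRefs(expr: Expr, prefix: String, inputType: DataType): Expr =
    expr match {
      // Every column reference gets the prefix: col1 -> iv.col1
      case ColRef(name) => ColRef(s"$prefix.$name")
      // `*` depends on whether the wrapped schema is primitive or a struct
      case Star =>
        inputType match {
          case Primitive => ColRef(s"$prefix.value")
          case Struct(_) => ColRef(prefix)
        }
      // Recurse into function arguments
      case Call(fn, args) =>
        Call(fn, args.map(a => prefixRefs(a, prefix, inputType)))
    }

  // Renders an expression as SQL-like text for inspection.
  def show(expr: Expr): String = expr match {
    case ColRef(name)   => name
    case Star           => "*"
    case Call(fn, args) => args.map(show).mkString(s"$fn(", ", ", ")")
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Grouping expr over the original (struct) data.
    val grouping = Rewrite.prefixRefs(ColRef("col1"), "iv", Struct(List("col1", "col2")))
    // Aggregating expr over a primitive mapValues output.
    val agg = Rewrite.prefixRefs(Call("count", List(Star)), "v", Primitive)
    println(Rewrite.show(grouping)) // iv.col1
    println(Rewrite.show(agg))      // count(v.value)
  }
}
```

Because the rewrite only produces ordinary column references such as `iv.col1` or `v.value`, the server needs no new logic, which is consistent with the commit's statement that the implementation is oblivious to the Connect server.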
1 parent aefaa66, commit b968ce1
Showing 7 changed files with 318 additions and 48 deletions.