Execute from_json with struct schema using JSONUtils.fromJSONToStructs #11618

Open
wants to merge 26 commits into
base: branch-24.12

@ttnghia ttnghia commented Oct 17, 2024

This adopts the newly implemented JNI function JSONUtils.fromJSONToStructs() to parse input string columns into a structs column, which is what happens when the from_json SQL function is called with a struct schema. Replacing the Scala code entirely with native code avoids a lot of overhead and improves runtime performance.
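For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what parsing a strings column into a structs column against a struct schema means. It is only an illustration: the names `Schema`, `from_json_to_struct`, and `from_json_column` are hypothetical and are not part of the plugin or of JSONUtils, and returning `None` for malformed rows is a simplifying assumption (Spark's actual handling depends on its version and parse mode).

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical schema representation for this sketch: field name -> Python type.
Schema = Dict[str, type]


def from_json_to_struct(row: Optional[str], schema: Schema) -> Optional[Dict[str, Any]]:
    """Parse one JSON string into a struct-like dict limited to the schema's fields."""
    if row is None:
        return None
    try:
        parsed = json.loads(row)
    except ValueError:
        # Simplifying assumption: a malformed row becomes a null struct.
        return None
    if not isinstance(parsed, dict):
        # Only JSON objects can map onto a struct in this sketch.
        return None

    struct: Dict[str, Any] = {}
    for name, expected in schema.items():
        value = parsed.get(name)
        # bool is a subclass of int in Python, so reject it for int fields.
        is_bool_as_int = expected is int and isinstance(value, bool)
        matches = isinstance(value, expected) and not is_bool_as_int
        # Missing or mismatched fields become null; extra fields are dropped.
        struct[name] = value if matches else None
    return struct


def from_json_column(column: List[Optional[str]], schema: Schema) -> List[Optional[Dict[str, Any]]]:
    """Apply the per-row parse to every row of a strings column."""
    return [from_json_to_struct(row, schema) for row in column]
```

Called with a schema such as `{"a": int, "b": str}`, each input row yields either a dict with exactly the keys `a` and `b` or `None`. The sketch works row by row for clarity and says nothing about how the native implementation is structured.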

Closes #11560.

This will also close the following issues:

Depends on:

@ttnghia ttnghia added the labels feature request (New feature or request), SQL (part of the SQL/Dataframe plugin), performance (A performance related task/issue), P0 (Must have for release), and task (Work required that improves the product but is not user facing) on Oct 17, 2024
@ttnghia ttnghia requested a review from revans2 October 17, 2024 03:14
@ttnghia ttnghia self-assigned this Oct 17, 2024
@revans2 revans2 left a comment

I know it is still in draft, so the debugging comments and commented-out code are fine. I just thought I would track it anyway. It looks great.

Signed-off-by: Nghia Truong <[email protected]>
@ttnghia ttnghia changed the title from "Perform conversion for the columns output from Table.readJSON to other data types using JSONUtils.convertDataTypes()" to "Execute from_json with struct schema using JSONUtils.fromJSONToStructs" on Nov 14, 2024
@ttnghia ttnghia marked this pull request as ready for review November 14, 2024 05:09
revans2 previously approved these changes Nov 14, 2024
# Conflicts:
#	sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonToStructs.scala
Signed-off-by: Nghia Truong <[email protected]>
Development

Successfully merging this pull request may close these issues.

[FEA] Improve GpuJsonToStructs performance