Execute from_json with struct schema using JSONUtils.fromJSONToStructs #11618
Open: ttnghia wants to merge 26 commits into NVIDIA:branch-24.12 from ttnghia:from_json_post_processing
+85
−288
Signed-off-by: Nghia Truong <[email protected]>
ttnghia added the labels feature request (New feature or request), SQL (part of the SQL/Dataframe plugin), performance (A performance related task/issue), P0 (Must have for release), and task (Work required that improves the product but is not user facing) on Oct 17, 2024
ttnghia force-pushed the from_json_post_processing branch from 8ec6474 to 692a0cb on October 18, 2024 18:17
This reverts commit b3dcffc.
revans2 reviewed on Nov 13, 2024
I know it is still in draft, so the debugging comments and commented-out code are fine. I just thought I would track it anyway. It looks great.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonReadCommon.scala (outdated)
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonToStructs.scala (outdated)
ttnghia force-pushed the from_json_post_processing branch from cbf4499 to e2f1724 on November 14, 2024 05:02
ttnghia changed the title from "Perform conversion for the columns output from Table.readJSON to other data types using JSONUtils.convertDataTypes()" to "Execute from_json with struct schema using JSONUtils.fromJSONToStructs" on Nov 14, 2024
revans2 previously approved these changes on Nov 14, 2024
# Conflicts: # sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuJsonToStructs.scala
This adopts the newly implemented JNI function JSONUtils.fromJSONToStructs() to parse input string columns into a structs column, which is what happens when the from_json SQL function is called with a struct schema. By replacing the Scala code entirely with native code, we can avoid a lot of overhead and improve runtime performance.
Closes #11560.
This will also close the following issues:
Depends on: from_json_to_structs spark-rapids-jni#2510
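For readers unfamiliar with the operation, here is a rough plain-Python illustration of what "parsing a strings column into a structs column with a struct schema" means. It is only a sketch and not the plugin's or the JNI library's implementation: the schema, the helper names (`parse_to_struct`, `from_json_column`), and the choice to return a null row for unparseable input are all assumptions made for illustration, and Spark's own type coercion and parse-mode options are ignored.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical schema for illustration only: field name -> expected Python type.
SCHEMA: Dict[str, type] = {"id": int, "name": str}


def parse_to_struct(text: Optional[str],
                    schema: Dict[str, type]) -> Optional[Dict[str, Any]]:
    """Parse one JSON string into a dict holding exactly the schema's fields."""
    if text is None:
        return None  # null input stays null
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None  # assumption of this sketch: invalid JSON gives a null row
    if not isinstance(obj, dict):
        return None  # only JSON objects can map onto a struct
    row: Dict[str, Any] = {}
    for field, expected in schema.items():
        value = obj.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        type_ok = isinstance(value, expected) and not (
            expected is int and isinstance(value, bool))
        # Missing or mistyped fields become None; extra fields are dropped.
        row[field] = value if type_ok else None
    return row


def from_json_column(column: List[Optional[str]],
                     schema: Dict[str, type]) -> List[Optional[Dict[str, Any]]]:
    """Apply the per-row parser to a whole column of JSON strings."""
    return [parse_to_struct(text, schema) for text in column]


structs = from_json_column(
    ['{"id": 1, "name": "a", "extra": true}', '{"id": "x"}', 'not json', None],
    SCHEMA,
)
```

The sketch works row by row in interpreted code, which is the kind of per-row overhead a single native call over the whole column is meant to avoid.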