[SPARK-51152][Python][SQL] Add usage examples for the get_json_object function #49875
base: master
Please help review it when you have free time, thanks! @panbingkun
Are there any other examples of
No, I have added a more comprehensive example. The PySpark example has been updated.
Example2: Get json object from json array object

data = [("1", '''[{"f1": "value1", "f2": "value2"},{"f1": "value3", "f2": "value4"}]'''), ("2", '''[{"f1": "value12"},{"f1": "value13"}]''')]
can we also fix the indentation
+1
ok, done~
@@ -42,6 +42,10 @@ import org.apache.spark.unsafe.types.UTF8String
Examples:
> SELECT _FUNC_('{"a":"b"}', '$.a');
b
> SELECT _FUNC_('[{"a":"b"},{"a":"c"}]', '$[0].a');
Please format the JSON in the example so that it looks cleaner and more formal, e.g.:
SELECT _FUNC_('[{"a": "b"}, {"a": "c"}]', '$[0].a');
I referred to the native syntax of get_json_object.
@@ -20115,11 +20115,24 @@ def get_json_object(col: "ColumnOrName", path: str) -> Column:

Examples
--------
Example1: Get json object from json object
Please format it, e.g.:
Example 1: ...
Also, the title "Get json object from json object" looks weird. Can we give it a more suitable name?
ok, done~
>>> data = [("1", '''{"f1": "value1", "f2": "value2"}'''), ("2", '''{"f1": "value12"}''')]
>>> df = spark.createDataFrame(data, ("key", "jstring"))
>>> df.select(df.key, get_json_object(df.jstring, '$.f1').alias("c0"), \\
... get_json_object(df.jstring, '$.f2').alias("c1") ).collect()
[Row(key='1', c0='value1', c1='value2'), Row(key='2', c0='value12', c1=None)]

Example2: Get json object from json array object
ditto
ok, done~
What changes were proposed in this pull request?
This PR aims to add some usage examples for the function get_json_object, including get_json_object('[{"a":"b"},{"a":"c"}]', '$[0].a') and get_json_object('[{"a":"b"},{"a":"c"}]', '$[*].a').
Why are the changes needed?
Most users are unaware of how to use get_json_object to retrieve a value when the input JSON is an array rather than a single object.
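For reviewers who want to reason about the two path forms without a Spark session, here is a minimal pure-Python sketch of how `$[0].a` and `$[*].a` walk a JSON array. The helper `resolve_path` is hypothetical and exists only for this illustration. It is not Spark's implementation, it handles only `$`, `.key`, `[n]` and `[*]`, and it returns Python values rather than the strings that get_json_object produces.

```python
import json
import re

# Hypothetical helper for illustration only; NOT Spark's implementation.
# One path step: .key, [n], or [*]
_STEP = re.compile(r"\.(\w+)|\[(\d+)\]|\[(\*)\]")


def resolve_path(json_text, path):
    """Resolve a small JSONPath subset against json_text.

    Returns None when nothing matches, a list when the path used [*],
    and the single matched value otherwise.
    """
    if not path.startswith("$"):
        return None
    nodes = [json.loads(json_text)]  # current set of matched nodes
    used_wildcard = False
    pos = 1
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            return None  # syntax outside the subset this sketch supports
        key, index, star = step.groups()
        matched = []
        for node in nodes:
            if key is not None and isinstance(node, dict) and key in node:
                matched.append(node[key])
            elif index is not None and isinstance(node, list) and int(index) < len(node):
                matched.append(node[int(index)])
            elif star is not None and isinstance(node, list):
                matched.extend(node)  # fan out over every array element
        if star is not None:
            used_wildcard = True
        nodes = matched
        pos = step.end()
    if not nodes:
        return None
    return nodes if used_wildcard else nodes[0]


array_json = '[{"a":"b"},{"a":"c"}]'
print(resolve_path(array_json, "$[0].a"))  # b
print(resolve_path(array_json, "$[*].a"))  # ['b', 'c']
print(resolve_path('{"f1": "value12"}', "$.f2"))  # None
```

The last call mirrors the `c1=None` row in the PySpark example from this PR: a path that matches nothing yields no value.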
Before this PR, the example:
After this PR, the example of get_json_object has been changed to look like this:
Does this PR introduce any user-facing change?
Yes. Users who visit the https://spark.apache.org/docs/latest/api/sql/#get_json_object page and the https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.get_json_object.html page will see a more intuitive example for the get_json_object function.
How was this patch tested?
Passed GA and tested manually.
Was this patch authored or co-authored using generative AI tooling?
No