[SPARK-51152][Python][SQL] Add usage examples for the get_json_object function #49875

Open
wants to merge 8 commits into
base: master

@fusheng9399 fusheng9399 commented Feb 11, 2025

What changes were proposed in this pull request?

This PR adds usage examples for the get_json_object function that cover JSON arrays, including get_json_object('[{"a":"b"},{"a":"c"}]', '$[0].a') and get_json_object('[{"a":"b"},{"a":"c"}]', '$[*].a').

Why are the changes needed?

Many users do not know how to use get_json_object to extract values when the top-level JSON value is an array.
Before this PR, the documentation contained only this example:

SELECT get_json_object('{"a":"b"}', '$.a');
+-------------------------------+
|get_json_object({"a":"b"}, $.a)|
+-------------------------------+
|                              b|
+-------------------------------+

After this PR, the examples for get_json_object look like this:

SELECT get_json_object('{"a":"b"}', '$.a');
+-------------------------------+
|get_json_object({"a":"b"}, $.a)|
+-------------------------------+
|                              b|
+-------------------------------+

SELECT get_json_object('[{"a":"b"},{"a":"c"}]', '$[0].a');
+----------------------------------------------+
|get_json_object([{"a":"b"},{"a":"c"}], $[0].a)|
+----------------------------------------------+
|                                             b|
+----------------------------------------------+

SELECT get_json_object('[{"a":"b"},{"a":"c"}]', '$[*].a');
+----------------------------------------------+
|get_json_object([{"a":"b"},{"a":"c"}], $[*].a)|
+----------------------------------------------+
|                                     ["b","c"]|
+----------------------------------------------+
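
To make the three path shapes above concrete, here is a minimal pure-Python sketch of how such paths could be evaluated. It is a toy model, not Spark's implementation: `toy_get_json_object` is a hypothetical helper that supports only `.key`, `[index]`, and `[*]` steps, and it was written only to reproduce the three outputs shown above. Spark's behavior on other inputs may differ.

```python
import json
import re

# Hypothetical helper: NOT Spark's implementation, only a toy model of the
# three path shapes shown above ($.key, $[index].key, $[*].key).
_STEP = re.compile(r"\.(\w+)|\[(\d+)\]|\[(\*)\]")


def toy_get_json_object(json_text, path):
    """Evaluate a tiny JSONPath subset against a JSON string."""
    if not path.startswith("$"):
        return None
    try:
        current = [json.loads(json_text)]
    except ValueError:
        return None  # invalid JSON

    fanned_out = False  # becomes True once a [*] step has been applied
    pos = 1  # skip the leading "$"
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            return None  # syntax outside the supported subset
        key, index, star = match.groups()
        matched = []
        for node in current:
            if key is not None and isinstance(node, dict) and key in node:
                matched.append(node[key])
            elif index is not None and isinstance(node, list):
                if int(index) < len(node):
                    matched.append(node[int(index)])
            elif star is not None and isinstance(node, list):
                matched.extend(node)
        if star is not None:
            fanned_out = True
        current = matched
        pos = match.end()

    if not current:
        return None  # nothing matched the path
    if fanned_out:
        # A wildcard collects every match into a compact JSON array.
        return json.dumps(current, separators=(",", ":"))
    value = current[0]
    # Strings are returned unquoted; other values as compact JSON text.
    return value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))


print(toy_get_json_object('{"a":"b"}', '$.a'))                 # b
print(toy_get_json_object('[{"a":"b"},{"a":"c"}]', '$[0].a'))  # b
print(toy_get_json_object('[{"a":"b"},{"a":"c"}]', '$[*].a'))  # ["b","c"]
```

The sketch keeps a list of current matches so that an index step selects one element while a wildcard step fans out over all of them, which is why `$[0].a` yields a single value and `$[*].a` yields an array.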

Does this PR introduce any user-facing change?

Yes. The get_json_object documentation at https://spark.apache.org/docs/latest/api/sql/#get_json_object and https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.get_json_object.html now includes more intuitive examples.

How was this patch tested?

Passes GA and was tested manually.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Feb 11, 2025
@fusheng9399 fusheng9399 changed the title Add an example for get_json_object when the JSON object is of JSON array type [SPARK-51152][SQL]Add an example for get_json_object when the JSON object is of JSON array type Feb 11, 2025
@fusheng9399
Author

Please help review it when you have free time, thanks! @panbingkun

@panbingkun
Contributor

Are there any other examples of wide characters that need to be shown?
Additionally, the PySpark example also needs to be updated.

@fusheng9399
Author

fusheng9399 commented Feb 11, 2025

No, I have added a more comprehensive example. The PySpark example has been updated.


Example2: Get json object from json array object

data = [("1", '''[{"f1": "value1", "f2": "value2"},{"f1": "value3", "f2": "value4"}]'''), ("2", '''[{"f1": "value12"},{"f1": "value13"}]''')]
Member

Can we also fix the indentation?

Contributor

+1

Author

ok,done~

@@ -42,6 +42,10 @@ import org.apache.spark.unsafe.types.UTF8String
Examples:
> SELECT _FUNC_('{"a":"b"}', '$.a');
b
> SELECT _FUNC_('[{"a":"b"},{"a":"c"}]', '$[0].a');
Contributor

Please format the JSON in the example so that it looks cleaner and more formal, e.g.:

SELECT _FUNC_('[{"a": "b"}, {"a": "c"}]', '$[0].a');

Author

@fusheng9399 fusheng9399 Feb 12, 2025

I referred to the native syntax of get_json_object.

@@ -20115,11 +20115,24 @@ def get_json_object(col: "ColumnOrName", path: str) -> Column:

Examples
--------
Example1: Get json object from json object
Contributor

Please format it, e.g.:
Example 1: ...

Also, the title "Get json object from json object" looks weird.
Can we give it a more suitable name?

Author

ok, done~

>>> data = [("1", '''{"f1": "value1", "f2": "value2"}'''), ("2", '''{"f1": "value12"}''')]
>>> df = spark.createDataFrame(data, ("key", "jstring"))
>>> df.select(df.key, get_json_object(df.jstring, '$.f1').alias("c0"), \\
... get_json_object(df.jstring, '$.f2').alias("c1") ).collect()
[Row(key='1', c0='value1', c1='value2'), Row(key='2', c0='value12', c1=None)]

Example2: Get json object from json array object
Contributor

Ditto.

Author

ok, done~

@fusheng9399 fusheng9399 changed the title [SPARK-51152][SQL]Add an example for get_json_object when the JSON object is of JSON array type [SPARK-51152][SQL]Add a more intuitive example for get_json_object Feb 12, 2025
@fusheng9399 fusheng9399 changed the title [SPARK-51152][SQL]Add a more intuitive example for get_json_object [SPARK-51152][SQL] Add richer examples for the get_json_object function Feb 12, 2025
@fusheng9399 fusheng9399 changed the title [SPARK-51152][SQL] Add richer examples for the get_json_object function [SPARK-51152][Python][SQL] Add richer examples for the get_json_object function Feb 14, 2025
@fusheng9399 fusheng9399 changed the title [SPARK-51152][Python][SQL] Add richer examples for the get_json_object function [SPARK-51152][Python][SQL] Add usage examples for the get_json_object function Feb 14, 2025
3 participants