[SPARK-52444][SQL][CONNECT] Add support for Variant/Char/Varchar Literal #51215

Open
wants to merge 3 commits into
base: master

Conversation

dengziming
Member

What changes were proposed in this pull request?

Literal expressions currently lack support for a number of DataTypes. This PR adds Variant, Char, and Varchar.
Strings with collation, and YearMonthIntervalType and DayTimeIntervalType with begin/end fields, require changes to the expression structure and will be handled in a separate PR.

Why are the changes needed?

Close the gap between the data types Spark supports and the data types that literal expressions can represent.

Does this PR introduce any user-facing change?

Yes.

  1. The API function fun.lit() now accepts a VariantVal.
  2. Connect Literal expressions can now carry Variant, Char, and Varchar values.

How was this patch tested?

  1. For Variant, added cases to the existing tests of fun.lit and fun.typedLit.
  2. For Char/Varchar, which the existing tests cannot cover, added new tests in SparkConnectPlannerSuite.scala.

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon HyukjinKwon changed the title [SPARK-52444][SQL][Connect] Add support for Variant/Char/Varchar Literal [SPARK-52444][SQL][CONNECT] Add support for Variant/Char/Varchar Literal Jun 23, 2025
@dengziming
Member Author

ping @hvanhovell to take a look.


message Char {
string value = 1;
optional int32 length = 2;
Contributor

This is only needed when the length of the value and the length of the intended data type do not match, right? If so, please document this.

Member Author

@dengziming dengziming Jun 28, 2025

I think that when using Char or Varchar it is better to provide the length field; if no length is needed, the String type is preferred. If length is provided, we use it and validate it; if it is omitted, we use len(value) as the default. I have documented this behavior.
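The length handling described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `resolve_char_length` is a hypothetical helper, and the validation rule (the declared length must be non-negative and at least as long as the value) is an assumption, since the comment only says that "some validation" is performed.

```python
from typing import Optional


def resolve_char_length(value: str, length: Optional[int] = None) -> int:
    """Return the length to use for a Char/Varchar literal.

    Hypothetical helper for illustration only. The validation below is an
    assumption; the review thread does not say what is actually checked.
    """
    if length is None:
        # Length omitted: fall back to the length of the value itself.
        return len(value)
    # Assumed validation: reject negative lengths.
    if length < 0:
        raise ValueError(f"length must be non-negative, got {length}")
    # Assumed validation: the value must fit in the declared length.
    if len(value) > length:
        raise ValueError(
            f"value has {len(value)} characters but declared length is {length}"
        )
    return length
```

With this sketch, `resolve_char_length("spark")` falls back to the value's own length, while `resolve_char_length("spark", 10)` keeps the declared length and `resolve_char_length("spark", 3)` is rejected.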

@@ -240,6 +243,21 @@ message Expression {
Strings strings = 6;
}
}

message Variant {
Contributor

Please make sure to provide a reference to the format that is used here.

Member Author

I added a reference to Spark's VariantVal. I also searched the docs directory and found no expression-related documentation, so I made no changes there.

Member Author

@dengziming dengziming left a comment

@hvanhovell Thank you for your review. I'm sorry that I forgot to submit my replies earlier. I have now addressed your comments; PTAL.

2 participants