Skip to content

[WIP] Fix inconsistencies and refactor primitive types in parser #51335

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mihailom-db
Copy link
Contributor

What changes were proposed in this pull request?

This PR proposes a change in how our parser treats datatypes. We introduce types with/without parameters and group accordingly.

Why are the changes needed?

Changes are needed for many reasons:

  1. Context of primitiveDataType is constantly getting bigger. This is not a good practice, as we have many null fields which only take up memory.
  2. We have inconsistencies in where we use each type. We get TIMESTAMP_NTZ in a separate rule, but we also mention it in primitive types.
  3. Primitive types should stay related to primitive types, adding ARRAY, STRUCT, MAP in the rule just because it is convenient is not good practice.
  4. Current structure does not give option of extending types with different features. For example, we introduced STRING collations, but what if we were to introduce CHAR/VARCHAR with collations. Current structure gives us 0 possibility of making a type CHAR(5) COLLATE UTF8_BINARY (We can only do CHAR COLLATE UTF8_BINARY (5)).

Does this PR introduce any user-facing change?

No. This is internal refactoring.

How was this patch tested?

All existing tests should pass, this is just code refactoring.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jul 1, 2025
@mihailom-db
Copy link
Contributor Author

@MaxGekk @cloud-fan Could you take a look at this PR? We should aim to keep our parser in the most readable and most efficient state.

primitiveType
: primitiveTypeWithParameters
| primitiveTypeWithoutParameters
| unsupportedType=identifier (LEFT_PAREN INTEGER_VALUE(COMMA INTEGER_VALUE)* RIGHT_PAREN)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we get an unsupportedType with a suffix like: TIME WITH TIME ZONE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this potentially can make a problem. I mean, we would probably return a bad error message. Let me think if we can scope issues like this as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually error message will stay the same, even previously we would return syntax error. The only thing here is if we want to improve the error messages a bit further?

@@ -1340,7 +1340,20 @@ collateClause
: COLLATE collationName=multipartIdentifier
;

type
primitiveTypeWithParameters
: STRING collateClause?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be primitiveTypeWithoutParameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can go, but in this case I would say collation is a parameter as well. It can change it's value to some different value not known at parsing time. If we follow this case, then probably INTERVAL should go to primitiveTypeWithoutParameters, as it is actually without parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By not known at parsing time, I mean identifier/arbitrary value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants