Skip to content

Conversation

@ljfgem
Copy link
Collaborator

@ljfgem ljfgem commented Feb 10, 2023

Similar to #351, this PR migrates CAST(ROW: RECORD_TYPE) and extract_union transformations from RelNode layer in IRRelToSparkRelTransformer to SqlNode layer.

In order to avoid creating a PR that is too big, I will open several more PRs to finish migrating all the transformations in IRRelToSparkRelTransformer to SqlNode layer and remove IRRelToSparkRelTransformer.

Tests:

  1. Existing unit tests, which already cover the changes
  2. Regression test
  3. Querying several complex production views on gateway, no regression

Copy link
Contributor

@aastha25 aastha25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @ljfgem for the PR!
Left minor comments around documentation.



/**
* This transformer transforms `CAST(ROW: RECORD_TYPE)` to `named_struct` in Spark.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: transforms Coral IR function ..... to Spark compatible operator named_struct....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's suitable to call it Coral IR function, I think it's a function in CoralSqlNode converted from Coral IR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the path:
source SQL -> sourceSqlNode1 -> CoralSqlNode1 -> CoralRelNode -> CoralSqlNode2 -> TargetLangSqlNode -> target SQL

if an operator is present in CoralSqlNode1, CoralRelNode, CoralSqlNode2 and we transform this operator when we go to LanguageSqlNode, such operators can be called Coral IR functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per the current status of our code base -
hive SQL -> HiveSqlNode (same as CoralSqlNode1) -> CoralRelNode -> CoralSqlNode2 -> SparkSqlNode -> Spark SQL
I think this operator qualifies as Coral Operator

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarification, do you mean an operator is a Coral IR function if it appears in CoralSqlNode1 / CoralRelNode / CoralSqlNode2 or CoralSqlNode1 & CoralRelNode & CoralSqlNode2?
Not sure because it doesn't appear in CoralSqlNode1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a corresponding input hive SQL, what is the operator we use in the SqlNode representation? how do we represent the named_struct on LHS?

but generally yes, an operator present in CoralSqlNode1 would be present in CoralRelNode and then generated back in CoralSqlNode2.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still represented as named_struct in CoralSqlNode1, and Calcite converts it to CAST(ROW: RECORD_TYPE) in RelNode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. In that case, could you investigate if it's possible to add a lightweight override in CoralRelNodeToCoralSqlNode to such that the coralSqlNode2 generated uses the same operator named_struct as coralSqlNode1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #357 for tracking.

@ljfgem
Copy link
Collaborator Author

ljfgem commented Feb 13, 2023

@aastha25 Thanks for your review! Modified these 2 comments based on your suggestions. PTAL.

@ljfgem ljfgem merged commit ef058ef into linkedin:master Feb 14, 2023
@ljfgem ljfgem deleted the coral_spark_transformation_migration branch February 14, 2023 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants