fix(parquet-source): recursively check the schema of nested data types #22301
This PR only adds schema checks. For user-defined columns that do not match the Parquet schema, a null value is currently filled in. In the future, we may read the metadata of a Parquet file when creating a source/table to inform the user in advance about any schema mismatches.
LGTM
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
The previous implementation only performed schema checks on the outermost layer of nested data types, so mismatches in the inner schema could go undetected.
e.g., struct<f1: Double, f2: Utf8> should not match rw_struct<f1: decimal, f2: Utf8>.
This PR adds recursive validation for nested types and includes unit tests.
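The idea of recursive validation can be sketched as follows. This is a minimal illustration, not the code in this PR: `FileType`, `RwType`, `is_match`, and `demo_mismatch` are hypothetical names, and the scalar mapping (Double to Float64, Utf8 to Varchar, Decimal to Decimal) is an assumption made for the example. The point is that list and struct types recurse into their children instead of stopping at the outermost layer.

```rust
/// Hypothetical stand-in for the data types found in a Parquet file.
#[derive(Debug, Clone, PartialEq)]
enum FileType {
    Double,
    Utf8,
    Decimal,
    List(Box<FileType>),
    Struct(Vec<(String, FileType)>),
}

/// Hypothetical stand-in for the user-defined column types.
#[derive(Debug, Clone, PartialEq)]
enum RwType {
    Float64,
    Varchar,
    Decimal,
    List(Box<RwType>),
    Struct(Vec<(String, RwType)>),
}

/// Returns true only if the two types agree at every nesting level.
fn is_match(file: &FileType, rw: &RwType) -> bool {
    match (file, rw) {
        // Scalar pairs (assumed mapping, for illustration only).
        (FileType::Double, RwType::Float64)
        | (FileType::Utf8, RwType::Varchar)
        | (FileType::Decimal, RwType::Decimal) => true,
        // A list matches when its element types match.
        (FileType::List(f), RwType::List(r)) => is_match(f, r),
        // A struct matches when field count, names, and field types all match.
        (FileType::Struct(fs), RwType::Struct(rs)) => {
            fs.len() == rs.len()
                && fs.iter().zip(rs).all(|((fname, ftype), (rname, rtype))| {
                    fname == rname && is_match(ftype, rtype)
                })
        }
        // Anything else is a mismatch.
        _ => false,
    }
}

/// Builds the example from the description: f1 is Double in the file
/// but decimal in the user-defined struct.
fn demo_mismatch() -> bool {
    let file = FileType::Struct(vec![
        ("f1".to_string(), FileType::Double),
        ("f2".to_string(), FileType::Utf8),
    ]);
    let user = RwType::Struct(vec![
        ("f1".to_string(), RwType::Decimal),
        ("f2".to_string(), RwType::Varchar),
    ]);
    is_match(&file, &user)
}
```

With an outermost-only check, both sides above are structs and would be accepted; the recursive version rejects them because `f1` differs.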
Checklist
Documentation
Release note