-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] feat: Introduce TableConfiguration
to jointly manage metadata, protocol, and column mapping
#644
base: main
Are you sure you want to change the base?
Conversation
3f74148
to
a6d74d6
Compare
kernel/src/table_configuration.rs
Outdated
/// See the documentation of [`TableChanges`] for more details. | ||
/// | ||
/// [`TableChanges`]: crate::table_changes::TableChanges | ||
pub fn can_read_cdf(&self) -> DeltaResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zachschuermann @nicklan wdyt of this sort of API? The general pattern would be can_(read | write)_{table_feature}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we'd only create such functions if/when we need them, and if they make sense. For example if there is a feature that is only relevant to readers, there would be no can_write_{tablefeature}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason it doesn't return DeltaResult<bool>
? Seems like there are cases (most of them?) where we don't want to necessarily throw, we just want to learn (but if something goes wrong while trying to learn, an error response still warranted).
Can always add a corresponding assert_can_XXX
method if the error version is actually needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using the pattern of Ok(())
=> true, Err(_)
=> false, with an explanation why. We use a similar pattern here. Do you think we should rethink this approach?
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #644 +/- ##
==========================================
- Coverage 83.67% 83.41% -0.27%
==========================================
Files 75 76 +1
Lines 16950 17055 +105
Branches 16950 17055 +105
==========================================
+ Hits 14183 14226 +43
- Misses 2100 2163 +63
+ Partials 667 666 -1 ☔ View full report in Codecov by Sentry. |
impl TableConfiguration { | ||
pub fn new(metadata: Metadata, protocol: Protocol) -> DeltaResult<Self> { | ||
// important! before a read/write to the table we must check it is supported | ||
protocol.ensure_read_supported()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this and validate_schema_column_mapping
belong here?
What changes are proposed in this pull request?
This PR introduces the
TableConfiguration
struct which is used to perform feature enablement checks on both the table's protocol and metadata.Problem statement
To check that a feature is enabled, you often must check that a certain reader/writer feature is enabled and that a table property is set to true. For example, a writer must check both the
delta.enableDeletionVectors
table property, and check that thedeletionVectors
writer feature is present in the table's Protocol. Probing two disparate structs to do a single check is error-prone and may lead to these metadata/protocol checks to become out of sync. Moreover checks are performed in the CDF path, snapshot scan path, and in the read path. Thus there are many ways in which protocol and metadata checks can diverge with one another.Put simply, the problems are:
Solution
TableConfiguration
consolidates all protocol and metadata checks to one place. It also ensures that the logic for checking feature enablement is kept consistent throughout the codebase. This addresses problem 1-4 outlined above.Closes: #571
How was this change tested?
existing tests pass.
TODO: add more tests