feat: Introduce TableCreationConfig for table creation property handling #1655
Conversation
Codecov Report

❌ Patch coverage report:

```
@@            Coverage Diff             @@
##              main    #1655      +/-  ##
==========================================
+ Coverage   84.64%   84.75%   +0.11%
==========================================
  Files         125      126       +1
  Lines       34721    35005     +284
  Branches    34721    35005     +284
==========================================
+ Hits        29388    29668     +280
- Misses       3983     3986       +3
- Partials     1350     1351       +1
```
```rust
let table_path = temp_dir.path().to_str().expect("Invalid path").to_string();
let engine =
    create_default_engine(&url::Url::from_directory_path(&table_path).expect("Invalid URL"))
```
can we use ? to remove the .expect() calls and make this test more succinct?
We have to weigh the tradeoffs here. `?` is useful when we are testing error propagation, but it strips some context, like the line number of the failure.
I personally prefer `expect()` and am kinda surprised that we don't use it in every test. I prefer test failures to be loud, with all the debug context.
Totally fair. Perhaps a more experienced Rust dev like @nicklan or @OussamaSaoudi could chime in here as a tiebreaker?
All I want is for us kernel Rust developers to agree on one best practice and stick to it.
1. Adds the ability to explicitly enable table features via table properties using the `delta.feature.<featureName> = supported` syntax, matching the Java Kernel's behavior. Only ALLOWED_DELTA_FEATURES can be set during create. Features get added to the protocol's feature lists.
2. Allows the min reader/writer versions to be updated in the protocol using signal flags. Only protocol versions (3, 7) are supported.
Key Changes:
- Add SET_TABLE_FEATURE_SUPPORTED_PREFIX and SET_TABLE_FEATURE_SUPPORTED_VALUE constants to the table_features module. Move the feature/property allow/deny list to the table property configuration module.
- Add TableFeature::from_name() to parse feature names from strings
- Add TableFeature::is_reader_writer() to check feature type
- Add TableCreationConfig struct to encapsulate parsing and validation of
user-provided table properties during CREATE TABLE operations.
- Extract delta.feature.* signal flags into reader/writer feature lists
- Extract delta.minReaderVersion/minWriterVersion into protocol hints
- Strip signal flags from properties, pass remaining to metadata
- Reject unknown features and invalid feature flag values
Usage:

```rust
create_table("/path/to/table", schema, "MyApp/1.0")
    .with_table_properties([
        ("delta.minReaderVersion", "3"),
        ("delta.minWriterVersion", "7"),
    ])
    .build(&engine, Box::new(FileSystemCommitter::new()))?
    .commit(&engine)?;
```
The delta.feature.* properties are consumed during build() and not stored
in the final Metadata configuration, matching Java Kernel behavior.
```rust
ensure_table_does_not_exist(storage.as_ref(), &delta_log_url, &self.path)?;
// Parse and validate table properties
let config = TablePropertyProtocolConfig::try_from(self.table_properties)?;
config.validate_for_create()?;
```
I'm not convinced this is the right abstraction for property/feature management. I think the right approach is something that takes:
- initial metadata (if present)
- initial protocol (if present)
- Table property updates
- dataLayout (whenever we ship it)
And produces
- metadata
- protocol
Given new user-defined properties, we should:
- enable table features. These features may enable other features through auto-enablement
- add new properties to the table (ex: inCommitTimestampEnablementVersion and inCommitTimestampEnablementTimestamp)
- enforce requirements of properties/features (ex: minReaderVersion=2 will enforce that any updates we performed above do not upgrade this beyond version 2)
- update the schema according to column mapping mode and IcebergWriterCompatV1
We take an initial P&M because this should also be shared with alter table/updates.
@scottsand-db would love input on this since I reckon you're familiar with this part of the code.
```rust
let mut user_reader_version: Option<i32> = None;
let mut user_writer_version: Option<i32> = None;
```
Can we factor out reader/writer version extraction? We have a hashmap that we can directly query. The code blocks below are also mostly duplicated
```rust
} else if key == MIN_WRITER_VERSION_PROP {
    // Parse delta.minWriterVersion
    let version: i32 = value.parse().map_err(|_| {
        Error::generic(format!(
            "Invalid value '{}' for '{}'. Must be an integer.",
            value, key
        ))
    })?;
    // Validate immediately: if specified, must be the required version
    if version != TABLE_FEATURES_MIN_WRITER_VERSION {
        return Err(Error::generic(format!(
            "Invalid value '{}' for '{}'. Only '{}' is supported.",
            version, MIN_WRITER_VERSION_PROP, TABLE_FEATURES_MIN_WRITER_VERSION
        )));
    }
    user_writer_version = Some(version);
} else {
    // Pass through to table properties
    table_properties.insert(key, value);
```
We can still pass through any other delta.* property. We only filter for SET_TABLE_FEATURE_SUPPORTED_PREFIX and the min reader/writer version properties. Example: `delta.enableDeletionVectors` would be added to table_properties. We should probably very closely track that.
I propose that we make three sets: delta.supported.* props, all other delta.* props, and everything else.
For the delta.* props, we should probably make that an allowlist as well. So we'll have two allowlists: one for features, and another for other delta table properties.
```rust
let protocol = Protocol::try_new(
    user_reader_version.unwrap_or(TABLE_FEATURES_MIN_READER_VERSION),
    user_writer_version.unwrap_or(TABLE_FEATURES_MIN_WRITER_VERSION),
    Some(reader_features.iter()),
    Some(all_features.iter()),
)?;
```
We should ensure that for a generated/constructed protocol+metadata that they are valid. We currently do this in the TableConfiguration try_new constructor. We shouldn't be writing out protocol and metadata without doing this validation. This is another reason why generating both metadata and protocol here is a good idea :)
Below is a snippet of the checks we do in TableConfiguration.
```rust
let schema = Arc::new(metadata.parse_schema()?);
let table_properties = metadata.parse_table_properties();
let column_mapping_mode = column_mapping_mode(&protocol, &table_properties);
// validate column mapping mode -- all schema fields should be correctly (un)annotated
validate_schema_column_mapping(&schema, column_mapping_mode)?;
validate_timestamp_ntz_feature_support(&schema, &protocol)?;
validate_variant_type_feature_support(&schema, &protocol)?;
```
🥞 Stacked PR
Use this link to review incremental changes.
What changes are proposed in this pull request?
Adds the ability to explicitly enable table features via table properties using the `delta.feature.<featureName> = supported` syntax, matching the Java Kernel's behavior. Only ALLOWED_DELTA_FEATURES can be set during create. Features get added to the protocol's feature lists.

Allows the min reader/writer versions to be updated in the protocol using signal flags. Only protocol versions (3, 7) are supported.

Key Changes:
- Add SET_TABLE_FEATURE_SUPPORTED_PREFIX and SET_TABLE_FEATURE_SUPPORTED_VALUE constants to the table_features module. ALLOWED_DELTA_PROPERTIES moved to the table property configuration module.
- Add TableCreationConfig struct to encapsulate parsing and validation of user-provided table properties during CREATE TABLE operations.

The delta.feature.* properties are consumed during build() and not stored in the final Metadata configuration, matching Java Kernel behavior.
How was this change tested?
Rust module unit tests