feat: Clustering/ Partitioning/ Domain metadata support for create table #1675
Draft: sanujbasu wants to merge 3 commits into delta-io:main from sanujbasu:stack/create_table_5
1. Adds the ability to explicitly enable table features via table properties
   using the delta.feature.<featureName> = supported syntax, matching the
   Java Kernel's behavior. Only ALLOWED_DELTA_FEATURES can be set during
   create. Enabled features are added to the protocol's feature lists.
2. Allows the min reader/writer versions to be updated in the protocol using
   signal flags. Only protocol versions (3, 7) are supported.
Key Changes:
- Add SET_TABLE_FEATURE_SUPPORTED_PREFIX and SET_TABLE_FEATURE_SUPPORTED_VALUE
  constants to the table_features module. Move the feature/property
  allow/deny list to the table property configuration module.
- Add TableFeature::from_name() to parse feature names from strings
- Add TableFeature::is_reader_writer() to check feature type
- Add TableCreationConfig struct to encapsulate parsing and validation of
  user-provided table properties during CREATE TABLE operations.
- Extract delta.feature.* signal flags into reader/writer feature lists
- Extract delta.minReaderVersion/minWriterVersion into protocol hints
- Strip signal flags from properties, pass remaining to metadata
- Reject unknown features and invalid feature flag values
Usage:
    create_table("/path/to/table", schema, "MyApp/1.0")
        .with_table_properties([
            ("delta.minReaderVersion", "3"),
            ("delta.minWriterVersion", "7"),
        ])
        .build(&engine, Box::new(FileSystemCommitter::new()))?
        .commit(&engine)?;
The delta.feature.* properties are consumed during build() and not stored
in the final Metadata configuration, matching Java Kernel behavior.
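
The property handling described above can be sketched roughly as follows. This is a standalone illustration, not the kernel's TableCreationConfig: the ParsedProperties struct, the parse_table_properties function, and the allow list passed in as a slice are hypothetical, and the constant values are inferred from the delta.feature.<featureName> = supported syntax. It also assumes that a version hint, when present, must be exactly 3 (reader) or 7 (writer).

```rust
use std::collections::BTreeMap;

// Values inferred from the `delta.feature.<featureName> = supported` syntax.
const SET_TABLE_FEATURE_SUPPORTED_PREFIX: &str = "delta.feature.";
const SET_TABLE_FEATURE_SUPPORTED_VALUE: &str = "supported";

/// Hypothetical result type: signal flags are separated from the
/// properties that would be passed on to the table metadata.
#[derive(Debug, Default, PartialEq)]
struct ParsedProperties {
    features: Vec<String>,
    min_reader_version: Option<i32>,
    min_writer_version: Option<i32>,
    remaining: BTreeMap<String, String>,
}

fn parse_version(key: &str, value: &str, expected: i32) -> Result<i32, String> {
    let version: i32 = value
        .parse()
        .map_err(|_| format!("'{key}' must be an integer, got '{value}'"))?;
    // Assumption: only protocol (3, 7) is accepted.
    if version != expected {
        return Err(format!("'{key}' must be {expected}, got {version}"));
    }
    Ok(version)
}

/// Splits user properties into feature flags, protocol version hints,
/// and everything else. `allowed_features` stands in for the allow list.
fn parse_table_properties(
    props: &[(&str, &str)],
    allowed_features: &[&str],
) -> Result<ParsedProperties, String> {
    let mut parsed = ParsedProperties::default();
    for &(key, value) in props {
        if let Some(name) = key.strip_prefix(SET_TABLE_FEATURE_SUPPORTED_PREFIX) {
            // Reject invalid flag values and unknown features.
            if value != SET_TABLE_FEATURE_SUPPORTED_VALUE {
                return Err(format!("invalid value '{value}' for '{key}'"));
            }
            if !allowed_features.contains(&name) {
                return Err(format!("unknown or disallowed feature '{name}'"));
            }
            if !parsed.features.iter().any(|f| f == name) {
                parsed.features.push(name.to_string());
            }
        } else if key == "delta.minReaderVersion" {
            parsed.min_reader_version = Some(parse_version(key, value, 3)?);
        } else if key == "delta.minWriterVersion" {
            parsed.min_writer_version = Some(parse_version(key, value, 7)?);
        } else {
            // Not a signal flag: keep it for the metadata configuration.
            parsed.remaining.insert(key.to_string(), value.to_string());
        }
    }
    Ok(parsed)
}
```

In this sketch the signal flags never reach `remaining`, which mirrors the statement that they are consumed during build() rather than stored in the Metadata configuration.
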
Adds support for user and system domain metadata (system domains carry the
'delta.' prefix) during table creation. This enables features like clustering
to be configured at table creation time. Invariants are validated and feature
checks are performed.
Changes:
- Refactors generate_domain_metadata_actions to consume domain metadata
  actions passed down by the create table builder.
- Refactors validate_user_domain_operations() into
  validate_domain_metadata_operations(), which enforces a range of domain
  metadata invariants.
- Adds validate_system_domain_feature to make sure the relevant features are
  supported when row tracking and clustering domain metadata are pushed down.
Clustered table creation support will be added in a stacked PR. That PR will
push the domain metadata for clustered tables down into the log. Integration
tests will be added in kernel/tests/create_table.rs once clustered table
creation support is implemented, since validation testing requires feature
allow-listing.
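
As a rough illustration of the kind of checks described here, the sketch below validates a list of domain names against the enabled features. It is not the kernel's validate_domain_metadata_operations or validate_system_domain_feature: the function names are hypothetical, and the specific invariants (the domainMetadata feature must be enabled, no duplicate domains, unrecognized delta.* domains rejected) are assumptions made for the example. The delta.rowTracking domain name is taken from the Delta protocol rather than from this PR.

```rust
use std::collections::HashSet;

/// Maps a system domain to the table feature it depends on.
/// Returns Ok(None) for user domains, which need no extra feature here.
fn required_feature(domain: &str) -> Result<Option<&'static str>, String> {
    match domain {
        "delta.clustering" => Ok(Some("clustering")),
        "delta.rowTracking" => Ok(Some("rowTracking")),
        // Assumption: any other `delta.`-prefixed domain is rejected.
        d if d.starts_with("delta.") => Err(format!("unsupported system domain '{d}'")),
        _ => Ok(None),
    }
}

/// Hypothetical validation pass over the domains supplied at create time.
fn validate_domain_names(domains: &[&str], features: &[&str]) -> Result<(), String> {
    // Assumption: any domain metadata requires the domainMetadata feature.
    if !domains.is_empty() && !features.contains(&"domainMetadata") {
        return Err("domain metadata requires the 'domainMetadata' feature".to_string());
    }
    let mut seen = HashSet::new();
    for &domain in domains {
        // Assumption: each domain may appear at most once per commit.
        if !seen.insert(domain) {
            return Err(format!("duplicate domain '{domain}'"));
        }
        if let Some(feature) = required_feature(domain)? {
            if !features.contains(&feature) {
                return Err(format!("domain '{domain}' requires feature '{feature}'"));
            }
        }
    }
    Ok(())
}
```
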
This commit adds the ability to create partitioned or clustered tables.
Since clustering requires the ability to set the domain metadata, this
change also adds the ability to specify system and user domain metadata
when we create a table.
New modules:
- kernel/src/transaction/data_layout.rs: DataLayout enum with None,
  Partitioned, and Clustered variants for type-safe mutual exclusion
- kernel/src/clustering.rs: ClusteringMetadataDomain for delta.clustering
  domain metadata serialization
CreateTableTransactionBuilder changes:
- Replace partition_columns field with data_layout: DataLayout
- Add with_data_layout() method for setting partition or clustering columns
- Validate clustering columns exist in schema (supports nested paths)
- Enforce max 4 clustering columns per Delta spec
- Auto-add 'clustering' and 'domainMetadata' features for clustered tables
- Generate delta.clustering domain metadata on commit
Transaction changes:
- Allow system domain metadata (delta.*) in create-table transactions
- Extract create-table domain metadata handling to helper method
Protocol changes:
- Deduplicate features using IndexSet to prevent duplicates while
  preserving insertion order
- Enable ClusteredTable feature (change kernel_support to Supported)
Tests added for:
- DataLayout enum variants and accessors
- Clustered table creation with single and nested columns
- Clustering column validation (non-existent, too many columns)
- ClusteringMetadataDomain serialization/deserialization
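
The layout handling described above might look something like the following. This is a simplified sketch, not the code in data_layout.rs: the Field schema model, the column and validate_layout helpers, and the payload types of the enum variants are hypothetical. Only the three variant names, the limit of 4 clustering columns, nested-path support, and the two auto-added feature names come from the description.

```rust
const MAX_CLUSTERING_COLUMNS: usize = 4;

/// One enum means a table cannot be both partitioned and clustered.
#[derive(Debug, Clone, PartialEq)]
enum DataLayout {
    None,
    Partitioned(Vec<String>),
    /// Each clustering column is a path of field names (nested allowed).
    Clustered(Vec<Vec<String>>),
}

/// Hypothetical, minimal schema model used only for this sketch.
enum Field {
    Leaf,
    Struct(Vec<(String, Field)>),
}

/// Convenience helper to build a column path from string slices.
fn column(path: &[&str]) -> Vec<String> {
    path.iter().map(|p| p.to_string()).collect()
}

/// Walks the schema one path segment at a time.
fn path_exists(fields: &[(String, Field)], path: &[String]) -> bool {
    let Some((head, rest)) = path.split_first() else {
        return false;
    };
    match fields.iter().find(|(name, _)| name == head) {
        Some((_, Field::Struct(children))) if !rest.is_empty() => path_exists(children, rest),
        Some(_) => rest.is_empty(),
        None => false,
    }
}

/// Validates the layout and returns the features to add to the protocol.
fn validate_layout(
    layout: &DataLayout,
    schema: &[(String, Field)],
) -> Result<Vec<&'static str>, String> {
    match layout {
        // Partition column checks are omitted from this sketch.
        DataLayout::None | DataLayout::Partitioned(_) => Ok(Vec::new()),
        DataLayout::Clustered(columns) => {
            if columns.len() > MAX_CLUSTERING_COLUMNS {
                return Err(format!(
                    "at most {MAX_CLUSTERING_COLUMNS} clustering columns allowed, got {}",
                    columns.len()
                ));
            }
            for col in columns {
                if !path_exists(schema, col) {
                    return Err(format!("clustering column '{}' not in schema", col.join(".")));
                }
            }
            Ok(vec!["domainMetadata", "clustering"])
        }
    }
}
```

Returning the required features from the validation step is one way to keep the auto-added 'clustering' and 'domainMetadata' features next to the check that makes them necessary. The real builder may organize this differently.
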
This was referenced Jan 24, 2026
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1675      +/-   ##
==========================================
+ Coverage   84.64%   84.78%   +0.14%
==========================================
  Files         125      128       +3
  Lines       34721    35362     +641
  Branches    34721    35362     +641
==========================================
+ Hits        29388    29981     +593
- Misses       3983     4016      +33
- Partials     1350     1365      +15
☔ View full report in Codecov by Sentry.
🥞 Stacked PR
Use this link to review incremental changes.