@sanujbasu sanujbasu commented Jan 24, 2026

🥞 Stacked PR

Use this link to review incremental changes.


This commit adds the ability to create partitioned or clustered tables.
Since clustering requires the ability to set domain metadata, this
change also adds the ability to specify system and user domain metadata
when creating a table.

New modules:

  • kernel/src/transaction/data_layout.rs: DataLayout enum with None,
    Partitioned, and Clustered variants for type-safe mutual exclusion
  • kernel/src/clustering.rs: ClusteringMetadataDomain for delta.clustering
    domain metadata serialization
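
A minimal sketch of how an enum can make partitioning and clustering mutually exclusive. The variant names come from the description above; the payload types, the `ColumnPath` alias, and the accessor names are assumptions for illustration and may differ from the actual `data_layout.rs`.

```rust
/// Hypothetical stand-in for the kernel's column-name type:
/// a path of field names, so nested columns can be addressed.
pub type ColumnPath = Vec<String>;

/// A table has at most one layout, so "partitioned and clustered"
/// cannot be represented at all.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub enum DataLayout {
    /// No special layout (assumed to be the default).
    #[default]
    None,
    /// Partitioned by top-level column names.
    Partitioned(Vec<String>),
    /// Clustered by (possibly nested) column paths.
    Clustered(Vec<ColumnPath>),
}

impl DataLayout {
    /// Returns the partition columns, if this layout is partitioned.
    pub fn partition_columns(&self) -> Option<&[String]> {
        match self {
            DataLayout::Partitioned(cols) => Some(cols.as_slice()),
            _ => None,
        }
    }

    /// Returns the clustering columns, if this layout is clustered.
    pub fn clustering_columns(&self) -> Option<&[ColumnPath]> {
        match self {
            DataLayout::Clustered(cols) => Some(cols.as_slice()),
            _ => None,
        }
    }

    /// True only for the `Clustered` variant.
    pub fn is_clustered(&self) -> bool {
        matches!(self, DataLayout::Clustered(_))
    }
}
```

Because the layout is a single enum value rather than two optional fields, the builder needs no runtime check for "both partition and clustering columns were set".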

CreateTableTransactionBuilder changes:

  • Replace partition_columns field with data_layout: DataLayout
  • Add with_data_layout() method for setting partition or clustering columns
  • Validate clustering columns exist in schema (supports nested paths)
  • Enforce max 4 clustering columns per Delta spec
  • Auto-add 'clustering' and 'domainMetadata' features for clustered tables
  • Generate delta.clustering domain metadata on commit
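
The clustering-column checks listed above can be sketched roughly as follows. The `Field` schema model, the function names, and the error strings are hypothetical and stand in for the kernel's real schema types; only the rules themselves (columns must resolve in the schema, nested paths are allowed, at most 4 columns) come from the description.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified schema model: a field is either a leaf
/// or a struct with named children.
#[derive(Debug)]
pub enum Field {
    Leaf,
    Struct(BTreeMap<String, Field>),
}

/// Limit stated in the PR description.
pub const MAX_CLUSTERING_COLUMNS: usize = 4;

/// Checks the column count and that every path resolves in the schema.
pub fn validate_clustering_columns(
    schema: &BTreeMap<String, Field>,
    columns: &[Vec<String>],
) -> Result<(), String> {
    if columns.len() > MAX_CLUSTERING_COLUMNS {
        return Err(format!(
            "at most {} clustering columns are allowed, got {}",
            MAX_CLUSTERING_COLUMNS,
            columns.len()
        ));
    }
    for path in columns {
        if !resolves(schema, path) {
            return Err(format!(
                "clustering column '{}' not found in schema",
                path.join(".")
            ));
        }
    }
    Ok(())
}

/// Walks a nested path one segment at a time.
fn resolves(fields: &BTreeMap<String, Field>, path: &[String]) -> bool {
    match path.split_first() {
        None => false, // an empty path names nothing
        Some((head, rest)) => match fields.get(head) {
            None => false,
            Some(_) if rest.is_empty() => true,
            Some(Field::Struct(children)) => resolves(children, rest),
            Some(Field::Leaf) => false, // cannot descend into a leaf
        },
    }
}
```

This sketch only checks that a path resolves; any further restrictions the kernel applies (for example on column data types) are not modeled here.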

Transaction changes:

  • Allow system domain metadata (delta.*) in create-table transactions
  • Extract create-table domain metadata handling to helper method

Protocol changes:

  • Deduplicate features using IndexSet to prevent duplicates while
    preserving insertion order
  • Enable ClusteredTable feature (change kernel_support to Supported)
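
To illustrate the deduplication behavior, here is a standard-library-only sketch that yields the same result an insertion-ordered set would. The PR itself uses `IndexSet`; the `dedup_preserving_order` helper below is a hypothetical stand-in, not the kernel's code.

```rust
use std::collections::HashSet;

/// Removes repeated feature names while keeping the first occurrence
/// of each in its original position.
pub fn dedup_preserving_order(features: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for feature in features {
        // `insert` returns false when the value was already present.
        if seen.insert(*feature) {
            out.push((*feature).to_string());
        }
    }
    out
}
```

Preserving insertion order matters when a feature can arrive from more than one source, such as an explicit table property and the auto-added clustering features: the resulting list stays deterministic instead of depending on hash ordering.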

Tests added for:

  • DataLayout enum variants and accessors
  • Clustered table creation with single and nested columns
  • Clustering column validation (non-existent, too many columns)
  • ClusteringMetadataDomain serialization/deserialization

What changes are proposed in this pull request?

How was this change tested?

1. Adds the ability to explicitly enable table features via table properties
using the delta.feature.<featureName> = supported syntax, matching the
Java Kernel's behavior. Only features listed in ALLOWED_DELTA_FEATURES can be
set during create. Enabled features are added to the protocol's feature lists.

2. Allows the minimum reader/writer versions in the protocol to be set using
signal flags. Only protocol versions (3, 7) are supported.

Key Changes:
- Add SET_TABLE_FEATURE_SUPPORTED_PREFIX and SET_TABLE_FEATURE_SUPPORTED_VALUE
  constants to the table_features module. Move the feature/property allow/deny
  lists to the table property configuration module
- Add TableFeature::from_name() to parse feature names from strings
- Add TableFeature::is_reader_writer() to check feature type
- Add TableCreationConfig struct to encapsulate parsing and validation of
  user-provided table properties during CREATE TABLE operations.
- Extract delta.feature.* signal flags into reader/writer feature lists
- Extract delta.minReaderVersion/minWriterVersion into protocol hints
- Strip signal flags from properties, pass remaining to metadata
- Reject unknown features and invalid feature flag values

Usage:
create_table("/path/to/table", schema, "MyApp/1.0")
    .with_table_properties([
        ("delta.minReaderVersion", "3"),
        ("delta.minWriterVersion", "7"),
    ])
    .build(&engine, Box::new(FileSystemCommitter::new()))?
    .commit(&engine)?;

The delta.feature.* properties are consumed during build() and not stored
in the final Metadata configuration, matching Java Kernel behavior.
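
The signal-flag handling described above might look roughly like the sketch below. `ParsedProperties`, `parse_table_properties`, and the error messages are hypothetical; the real logic lives in `TableCreationConfig` and may differ. The sketch also assumes that any explicitly given version other than reader 3 or writer 7 is rejected.

```rust
use std::collections::HashMap;

const FEATURE_PREFIX: &str = "delta.feature.";
const FEATURE_SUPPORTED: &str = "supported";

/// Hypothetical result of splitting user properties into signal flags
/// and the properties that end up in the table metadata.
#[derive(Debug, Default, PartialEq)]
pub struct ParsedProperties {
    pub features: Vec<String>,
    pub min_reader_version: Option<i32>,
    pub min_writer_version: Option<i32>,
    pub remaining: HashMap<String, String>,
}

pub fn parse_table_properties(
    props: &[(&str, &str)],
    allowed_features: &[&str],
) -> Result<ParsedProperties, String> {
    let mut parsed = ParsedProperties::default();
    for (key, value) in props {
        if let Some(name) = key.strip_prefix(FEATURE_PREFIX) {
            // Signal flag: validate it, record the feature, do not store it.
            if *value != FEATURE_SUPPORTED {
                return Err(format!("invalid value '{value}' for '{key}'"));
            }
            if !allowed_features.contains(&name) {
                return Err(format!("unknown or disallowed feature '{name}'"));
            }
            parsed.features.push(name.to_string());
        } else if *key == "delta.minReaderVersion" {
            parsed.min_reader_version = Some(parse_version(key, value, 3)?);
        } else if *key == "delta.minWriterVersion" {
            parsed.min_writer_version = Some(parse_version(key, value, 7)?);
        } else {
            // Everything else passes through to the metadata configuration.
            parsed.remaining.insert(key.to_string(), value.to_string());
        }
    }
    Ok(parsed)
}

/// Accepts only the single supported version for the given key.
fn parse_version(key: &str, value: &str, expected: i32) -> Result<i32, String> {
    match value.parse::<i32>() {
        Ok(v) if v == expected => Ok(v),
        _ => Err(format!("{key} must be {expected}, got '{value}'")),
    }
}
```

Only the `remaining` map would be handed to the metadata action, which mirrors the statement that signal flags are consumed during build() and never stored.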

Adds support for user and system domain metadata (domains with the 'delta.'
prefix) during table creation. This enables features like clustering to be
configured at table creation time. Domain metadata invariants are validated
and the required table features are checked.

Changes:
- Refactors generate_domain_metadata_actions to consume domain metadata
  actions passed down by the create table builder.
- Refactors validate_user_domain_operations() to
  validate_domain_metadata_operations() which enforces a myriad of
  domain metadata invariants.
- Adds validate_system_domain_feature to make sure relevant features
  are supported when row tracking and clustering domain metadata are
  pushed down.
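
As a rough illustration of the system-domain feature check, the sketch below validates a list of domain names against the enabled features. The function name, the duplicate-domain rule, and the rejection of other `delta.*` domains are assumptions; the description only states that row tracking and clustering domains require their features to be supported.

```rust
use std::collections::HashSet;

/// Hypothetical validation of the domain metadata supplied to a
/// create-table transaction.
pub fn validate_domain_metadata(
    domains: &[&str],
    enabled_features: &[&str],
) -> Result<(), String> {
    let mut seen = HashSet::new();
    for domain in domains {
        // Assumed invariant: each domain appears at most once per commit.
        if !seen.insert(*domain) {
            return Err(format!("duplicate domain '{domain}'"));
        }
        // Map known system domains to the feature they depend on.
        let required = match *domain {
            "delta.clustering" => Some("clustering"),
            "delta.rowTracking" => Some("rowTracking"),
            d if d.starts_with("delta.") => {
                // Assumption: other system domains are not accepted here.
                return Err(format!("unsupported system domain '{d}'"));
            }
            _ => None, // user domains need no specific feature in this sketch
        };
        if let Some(feature) = required {
            if !enabled_features.contains(&feature) {
                return Err(format!(
                    "domain '{domain}' requires feature '{feature}'"
                ));
            }
        }
    }
    Ok(())
}
```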

Clustered table creation support will be added in a stacked PR. That
PR will push down the domain metadata for clustered tables into the log.

Integration tests will be added in kernel/tests/create_table.rs once
clustered table creation support is implemented, since validation testing
requires feature allow-listing.

codecov bot commented Jan 24, 2026

Codecov Report

❌ Patch coverage is 92.49330% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.78%. Comparing base (a58cbb1) to head (8ba5965).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kernel/src/transaction/mod.rs | 59.42% | 25 Missing and 3 partials ⚠️ |
| kernel/src/transaction/create_table.rs | 95.11% | 5 Missing and 6 partials ⚠️ |
| kernel/src/table_property_protocol_config.rs | 96.26% | 5 Missing and 4 partials ⚠️ |
| kernel/src/clustering.rs | 97.01% | 0 Missing and 2 partials ⚠️ |
| kernel/src/row_tracking.rs | 77.77% | 1 Missing and 1 partial ⚠️ |
| kernel/src/transaction/data_layout.rs | 97.64% | 0 Missing and 2 partials ⚠️ |
| kernel/src/table_features/mod.rs | 96.66% | 0 Missing and 1 partial ⚠️ |
| kernel/src/utils.rs | 91.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1675      +/-   ##
==========================================
+ Coverage   84.64%   84.78%   +0.14%     
==========================================
  Files         125      128       +3     
  Lines       34721    35362     +641     
  Branches    34721    35362     +641     
==========================================
+ Hits        29388    29981     +593     
- Misses       3983     4016      +33     
- Partials     1350     1365      +15     


Labels

breaking-change Change that require a major version bump