Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add the in-commit timestamp field to CommitInfo #581

Merged
merged 13 commits into from
Jan 15, 2025
6 changes: 5 additions & 1 deletion kernel/src/actions/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -331,8 +331,11 @@ where
struct CommitInfo {
/// The time this logical file was created, as milliseconds since the epoch.
/// Read: optional, write: required (that is, kernel always writes).
/// If in-commit timestamps are enabled, this is always required.
pub(crate) timestamp: Option<i64>,
/// The time this logical file was created, as milliseconds since the epoch. Unlike
/// `timestamp`, this field is guaranteed to be monotonically increase with each commit.
/// If in-commit timestamps are enabled, this is always required.
pub(crate) in_commit_timestamp: Option<i64>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also note that (when enabled) it requires that commitInfo be the first action in the commit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While that's technically what the protocol says, delta-spark doesn't check that this is true. It simply extracts the CommitInfo if present.

I think the error handling would be more code and let us capture fewer tables than delta spark, and commitInfo being the first action doesn't seem to impact us much in this scenario.

Copy link
Collaborator

@scovich scovich Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delta-spark doesn't check, but it also does stop looking after the first line of json. So if the commit info is not first it will appear to not exist at all. Sometimes that causes an error downstream, and sometimes it would cause a silent wrong behavior.

I would prefer to check strictly unless/until we see a need to do otherwise.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scovich Do you recommend that we error, or just ignore the ICT

/// An arbitrary string that identifies the operation associated with this commit. This is
/// specified by the engine. Read: optional, write: required (that is, kernel alwarys writes).
pub(crate) operation: Option<String>,
Expand Down Expand Up @@ -694,6 +697,7 @@ mod tests {
"commitInfo",
StructType::new(vec![
StructField::new("timestamp", DataType::LONG, true),
StructField::new("inCommitTimestamp", DataType::LONG, true),
StructField::new("operation", DataType::STRING, true),
StructField::new(
"operationParameters",
Expand Down
6 changes: 6 additions & 0 deletions kernel/src/transaction.rs
Original file line number Diff line number Diff line change
Expand Up @@ -315,6 +315,12 @@ fn generate_commit_info(
.get_mut("operationParameters")
.ok_or_else(|| Error::missing_column("operationParameters"))?
.data_type = hack_data_type;

// Since writing in-commit timestamps is not supported, we remove the field so it is not
// written to the log
commit_info_data_type
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zachschuermann just flagging this comment and approach to writing ICT.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine :) can we add an issue link/todo so we are tracking this?

.fields
.shift_remove("inCommitTimestamp");
commit_info_field.data_type = DataType::Struct(commit_info_data_type);

let commit_info_evaluator = engine.get_expression_handler().get_evaluator(
Expand Down
Loading