feat: Add high level api for timestamp conversion #900

Open: wants to merge 4 commits into base: main

@OussamaSaoudi (Collaborator) commented Apr 28, 2025

What changes are proposed in this pull request?

This is a stacked PR. Please look at the LAST commit of each PR for the files changed.

This PR introduces high-level APIs for timestamp-to-version conversion: latest_version_as_of, first_version_after, and timestamp_range_to_versions.
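
To make the intended behaviour of the three entry points concrete, here is a minimal sketch. It is not the code in this PR: the `CommitTimestamps` type, its fields, and all signatures are hypothetical, and it assumes commit timestamps are non-decreasing by version and that "after" means "at or after". Under those assumptions each lookup is a binary search.

```rust
/// Hypothetical in-memory view of a log segment: commit timestamps (ms),
/// indexed by offset from `first_version`. Assumed non-decreasing.
pub struct CommitTimestamps {
    first_version: u64,
    timestamps: Vec<i64>,
}

impl CommitTimestamps {
    pub fn new(first_version: u64, timestamps: Vec<i64>) -> Self {
        Self { first_version, timestamps }
    }

    /// Latest version whose commit timestamp is <= `ts`, if any.
    pub fn latest_version_as_of(&self, ts: i64) -> Option<u64> {
        // Count of commits with timestamp <= ts; the last of them is the answer.
        let n = self.timestamps.partition_point(|&t| t <= ts);
        n.checked_sub(1).map(|i| self.first_version + i as u64)
    }

    /// First version whose commit timestamp is >= `ts`, if any
    /// (inclusive "after" is an assumption of this sketch).
    pub fn first_version_after(&self, ts: i64) -> Option<u64> {
        // Count of commits strictly before ts; the next commit is the answer.
        let n = self.timestamps.partition_point(|&t| t < ts);
        (n < self.timestamps.len()).then(|| self.first_version + n as u64)
    }

    /// Inclusive (start, end) versions whose timestamps fall in [start_ts, end_ts].
    /// Returns None when no commit lies inside the range.
    pub fn timestamp_range_to_versions(&self, start_ts: i64, end_ts: i64) -> Option<(u64, u64)> {
        let start = self.first_version_after(start_ts)?;
        let end = self.latest_version_as_of(end_ts)?;
        (start <= end).then_some((start, end))
    }
}
```

The range query is just the composition of the two point queries, which is why an empty range shows up as `start > end` rather than as a separate case.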

How was this change tested?

The range API is tested on a table generated by Delta Kernel Java. Several timestamp-to-version queries are run against it to verify that timestamp range queries return the expected versions.

codecov bot commented Apr 28, 2025

Codecov Report

Attention: Patch coverage is 83.03887% with 96 lines in your changes missing coverage. Please review.

Project coverage is 84.66%. Comparing base (60d0944) to head (13f0369).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kernel/src/history_manager/mod.rs | 86.04% | 13 Missing and 54 partials ⚠️ |
| kernel/src/history_manager/error.rs | 55.55% | 15 Missing and 1 partial ⚠️ |
| kernel/src/log_segment.rs | 84.21% | 4 Missing and 2 partials ⚠️ |
| kernel/src/log_segment/tests.rs | 45.45% | 6 Missing ⚠️ |
| ffi/src/error.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #900      +/-   ##
==========================================
- Coverage   84.68%   84.66%   -0.02%     
==========================================
  Files          92       94       +2     
  Lines       23015    23581     +566     
  Branches    23015    23581     +566     
==========================================
+ Hits        19491    19966     +475     
- Misses       2563     2598      +35     
- Partials      961     1017      +56     

@OussamaSaoudi OussamaSaoudi force-pushed the history_5_history_high_level_and_integration branch 2 times, most recently from 896fa67 to c4e66d5 Compare April 28, 2025 07:20
@OussamaSaoudi OussamaSaoudi marked this pull request as ready for review April 28, 2025 07:21
@OussamaSaoudi OussamaSaoudi force-pushed the history_5_history_high_level_and_integration branch 3 times, most recently from 2ffb953 to 6327a89 Compare April 30, 2025 05:31
@OussamaSaoudi OussamaSaoudi force-pushed the history_5_history_high_level_and_integration branch 3 times, most recently from aaf4cf7 to ecc22d7 Compare May 12, 2025 17:06
log_segment: LogSegment,
snapshot: Arc<Snapshot>,
commit_to_timestamp_cache: RefCell<HashMap<Url, Timestamp>>,
Collaborator

Does a cache belong in kernel?

Collaborator Author

Yeah, I wondered about that as well. I decided to go with caching because the cache size is c * O(size of log segment), and the constant c is really small.
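
For readers following the thread, a rough sketch of the kind of cache being discussed: at most one small entry per commit file, filled lazily behind a shared reference. This is illustrative only. `TimestampCache` and `get_or_load` are hypothetical names, `String` stands in for the `Url` key in the field above, and `i64` stands in for `Timestamp`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Memoizes commit timestamps. Holds at most one (path, timestamp) pair per
/// commit file, so its size is bounded by the length of the log segment.
pub struct TimestampCache {
    entries: RefCell<HashMap<String, i64>>,
}

impl TimestampCache {
    pub fn new() -> Self {
        Self { entries: RefCell::new(HashMap::new()) }
    }

    /// Returns the cached timestamp for `commit`, calling `load` only on a miss.
    pub fn get_or_load(&self, commit: &str, load: impl FnOnce() -> i64) -> i64 {
        // Copy the value out so the shared borrow ends before `load` runs.
        let cached = self.entries.borrow().get(commit).copied();
        if let Some(ts) = cached {
            return ts;
        }
        let ts = load();
        self.entries.borrow_mut().insert(commit.to_owned(), ts);
        ts
    }

    /// Number of cached commits.
    pub fn len(&self) -> usize {
        self.entries.borrow().len()
    }
}
```

`RefCell` keeps the lookup methods on `&self`, at the cost of making the type `!Sync`; a shared, multi-threaded manager would need a lock or a concurrent map instead.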

/// - `limit`: Optional maximum number of versions to track. When specified, the earliest
///   queryable version will be `snapshot.version() - limit`. This allows trading
///   memory usage for historical reach.
pub fn history_manager_from_snapshot(
Collaborator

Considering we already have two constructors here, each with optional info, would it make more sense to just have a builder?

let hm = HistoryManagerBuilder::new(engine)
    .with_snapshot(snapshot)
    .with_limit(1_000)
    .build();

Aside: I wonder whether it's more idiomatic to pass engine to new or to build?

And should we include an end version?
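
To make the builder suggestion concrete, a self-contained sketch of what it could look like. Everything here is hypothetical (`Engine`, `Snapshot`, `HistoryManager`, and the builder itself are stand-ins, not types from this PR), and clamping the earliest version at 0 when `limit` exceeds the snapshot version is an assumption of the sketch.

```rust
/// Hypothetical stand-ins for the kernel's engine and snapshot types.
pub struct Engine;
pub struct Snapshot {
    pub version: u64,
}

impl Snapshot {
    pub fn version(&self) -> u64 {
        self.version
    }
}

pub struct HistoryManager {
    pub latest_version: u64,
    pub earliest_version: u64,
}

pub struct HistoryManagerBuilder<'a> {
    engine: &'a Engine,
    snapshot: Option<Snapshot>,
    limit: Option<u64>,
}

impl<'a> HistoryManagerBuilder<'a> {
    pub fn new(engine: &'a Engine) -> Self {
        Self { engine, snapshot: None, limit: None }
    }

    pub fn with_snapshot(mut self, snapshot: Snapshot) -> Self {
        self.snapshot = Some(snapshot);
        self
    }

    pub fn with_limit(mut self, limit: u64) -> Self {
        self.limit = Some(limit);
        self
    }

    pub fn build(self) -> Result<HistoryManager, String> {
        // A real build would use the engine to read the log; unused in this sketch.
        let _ = self.engine;
        let snapshot = self.snapshot.ok_or("a snapshot is required in this sketch")?;
        let latest = snapshot.version();
        // Earliest queryable version is `snapshot.version() - limit`, clamped at 0.
        let earliest = self.limit.map_or(0, |l| latest.saturating_sub(l));
        Ok(HistoryManager { latest_version: latest, earliest_version: earliest })
    }
}
```

Taking the engine in `new` versus `build` is exactly the open question above; this sketch puts it in `new` only to match the snippet.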

Collaborator Author

Now all functions are static, so we avoid this :)

@OussamaSaoudi OussamaSaoudi force-pushed the history_5_history_high_level_and_integration branch 2 times, most recently from 758166e to b8a55ae Compare May 26, 2025 05:59
@OussamaSaoudi OussamaSaoudi force-pushed the history_5_history_high_level_and_integration branch 3 times, most recently from 90933a6 to b32b5ec Compare June 2, 2025 07:20
@OussamaSaoudi OussamaSaoudi changed the title feat: Make LogHistoryManager public and provide high level apis feat: Add high level for timestamp conversion Jun 3, 2025
@OussamaSaoudi OussamaSaoudi force-pushed the history_5_history_high_level_and_integration branch from b32b5ec to dc718fe Compare June 3, 2025 00:07
@OussamaSaoudi OussamaSaoudi removed the breaking-change label ("Change that require a major version bump") Jun 3, 2025
@OussamaSaoudi OussamaSaoudi force-pushed the history_5_history_high_level_and_integration branch from dc718fe to 225da5f Compare June 3, 2025 00:10
@OussamaSaoudi OussamaSaoudi force-pushed the history_5_history_high_level_and_integration branch from 225da5f to 13f0369 Compare June 3, 2025 00:27
@OussamaSaoudi OussamaSaoudi changed the title feat: Add high level for timestamp conversion feat: Add high level api for timestamp conversion Jun 3, 2025
Labels
None yet
Projects
None yet
3 participants