Clean up and expand log path parsing utilities #347
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #347 +/- ##
==========================================
+ Coverage 74.03% 74.73% +0.70%
==========================================
Files 43 43
Lines 8137 8368 +231
Branches 8137 8368 +231
==========================================
+ Hits 6024 6254 +230
+ Misses 1733 1727 -6
- Partials 380 387 +7
☔ View full report in Codecov by Sentry.
nice thanks! just a couple small nits
kernel/src/path.rs
Outdated
get_filename(path)
    .and_then(|f| f.split_once('.'))
    .and_then(|(name, _)| get_version_opt(Some(name), VERSION_LEN))
impl CanTryIntoLogPath for FileMeta {
I think `CanTryIntoLogPath` should be called `AsUrl` or `AsLogPathUrl`. It's not 'trying' (the method should return a result/option if it was), and it's converting to a `Url`.
This is a marker trait for `ParsedLogPath::try_from`. I was hoping it would let me work around foreign trait restrictions, but it didn't. I changed it to `AsUrl` tho, simpler name is better.
(I also considered just using `AsRef<Url>`, but `AsRef` as a general mechanism for exposing struct fields seems weird and error prone, so I decided against it)
493ffbf to 1453f9a
lgtm! thanks
very awesome, love the new ergonomics! just a quick question and comment
kernel/src/path.rs
Outdated
let file_type = match split.len() {
    // Commit file: <n>.json
    1 if extension == "json" => LogPathFileType::Commit,

    // Single-part checkpoint: <n>.checkpoint.parquet
    2 if split[0] == "checkpoint" && extension == "parquet" => {
        LogPathFileType::SinglePartCheckpoint
    }
I think we could also do `split.as_slice()`? (just for another idea if that helps readability)
- let file_type = match split.len() {
-     // Commit file: <n>.json
-     1 if extension == "json" => LogPathFileType::Commit,
-     // Single-part checkpoint: <n>.checkpoint.parquet
-     2 if split[0] == "checkpoint" && extension == "parquet" => {
-         LogPathFileType::SinglePartCheckpoint
-     }
+ let file_type = match split.as_slice() {
+     // Commit file: <n>.json
+     [_] if extension == "json" => LogPathFileType::Commit,
+     // Single-part checkpoint: <n>.checkpoint.parquet
+     ["checkpoint", _] if extension == "parquet" => {
+         LogPathFileType::SinglePartCheckpoint
+     }
+     ...
also maybe doing `pop_back` for the extension would help with clarity so it isn't still in `split`? (would require `VecDeque` though..)
Ohh, that's nice!
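As a self-contained illustration of the slice-pattern idea from this thread, the sketch below classifies a file name by matching on `split.as_slice()`. It is a variation on the suggestion, not the merged code: here the extension is folded into the pattern itself so that no guard is needed, and the `classify` function and the `Unknown` variant are assumptions made for the example.

```rust
/// Reduced version of the file-type enum; `Unknown` is an assumption for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum LogPathFileType {
    Commit,
    SinglePartCheckpoint,
    Unknown,
}

/// Hypothetical helper: classify a log file name of the form `<n>.<parts...>`.
pub fn classify(filename: &str) -> LogPathFileType {
    // Skip the leading `<n>` (version) component and keep the remaining parts,
    // including the extension as the last element.
    let split: Vec<&str> = filename.split('.').skip(1).collect();

    match split.as_slice() {
        // Commit file: <n>.json
        ["json"] => LogPathFileType::Commit,
        // Single-part checkpoint: <n>.checkpoint.parquet
        ["checkpoint", "parquet"] => LogPathFileType::SinglePartCheckpoint,
        // Anything else is not recognized by this sketch.
        _ => LogPathFileType::Unknown,
    }
}
```

Matching on the slice checks both the number of parts and their contents in one pattern, which removes the separate `split.len()` and `split[0]` checks.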
…354) If we have a `_last_checkpoint` that is out of date, things can get confused. This code:
1. Cleans up the listing function a bit
2. Ensures we end up with the real latest checkpoint
3. Drops any commit files from the listing that are older than the last checkpoint
4. `warns!` if `_last_checkpoint` is out of date
5. Adds a test for this case

This code will conflict with #347, so maybe hold off merging until that merges and then I can rebase and clean this up more.

Co-authored-by: Nick Lanham
Co-authored-by: Zach Schuermann
Co-authored-by: Ryan Johnson
Co-authored-by: Stephen Carman
Today's log path parsing utilities have two weaknesses:
… `Option` because of the possibility that some path failed to parse. This complicates code that uses it.

To solve both problems, rewrite the parsing utilities to take ownership of (and embed) the `FileMeta` they are derived from, so they can be instantiated once, passed around as needed, and the metadata extracted easily at the end. Also amp up the unit tests to cover the functionality more completely.
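The ownership idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's implementation: `FileMeta` is a simplified stand-in with a `String` location instead of `url::Url`, the `try_from_meta` constructor and the `suffix` field are hypothetical, and the 20-digit version length is assumed.

```rust
/// Simplified stand-in for the kernel's `FileMeta` (assumption: `String` replaces `url::Url`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileMeta {
    pub location: String,
    pub last_modified: i64,
    pub size: usize,
}

/// A parsed log path that owns the metadata it was derived from.
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedLogPath {
    pub meta: FileMeta,
    pub version: u64,
    /// Everything after the first `.` in the file name, e.g. `json`.
    pub suffix: String,
}

/// Assumed length of the zero-padded version prefix.
const VERSION_LEN: usize = 20;

impl ParsedLogPath {
    /// Hypothetical constructor: consumes `meta` and parses it exactly once.
    /// Returns `None` when the file name is not a versioned log file.
    pub fn try_from_meta(meta: FileMeta) -> Option<Self> {
        let filename = meta.location.rsplit('/').next()?;
        let (name, suffix) = filename.split_once('.')?;
        if name.len() != VERSION_LEN || !name.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        let version = name.parse().ok()?;
        // Copy the suffix out so the borrow of `meta` ends before it is moved.
        let suffix = suffix.to_string();
        Some(Self { meta, version, suffix })
    }
}
```

Because parsing happens once at construction, callers no longer have to handle `Option` at every access, and the original metadata (size, modification time) stays available through `parsed.meta`.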