fix: get prefix from offset path #699
Signed-off-by: Robert Pack <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #699      +/-   ##
==========================================
+ Coverage   84.22%   84.23%   +0.01%
==========================================
  Files          77       78       +1
  Lines       17926    17988      +62
  Branches    17926    17988      +62
==========================================
+ Hits        15098    15153      +55
- Misses       2110     2114       +4
- Partials      718      721       +3
/// List the paths in the same directory that are lexicographically greater than
/// (UTF-8 sorting) the given `path`. The result should also be sorted by the file name.
///
/// If the path is directory-like (ends with '/'), the result should contain
/// all the files in the directory.
Not 100% sure this is the behavior we want to go for, but thought I'd put up the PR for discussion.
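For discussion, here is a minimal in-memory sketch of the behavior the doc comment above describes. It is not the kernel implementation: the free function `list_from`, its signature, and the plain string paths are assumptions made purely for illustration.

```rust
/// Hypothetical model of the documented `list_from` semantics over an
/// in-memory list of file paths (not the real `FileSystemClient` API).
fn list_from<'a>(files: &[&'a str], path: &str) -> Vec<&'a str> {
    // Directory part of `path`, including the trailing '/'.
    // For a directory-like path this is the whole path.
    let dir_end = path.rfind('/').map(|i| i + 1).unwrap_or(0);
    let dir = &path[..dir_end];

    let mut out: Vec<&'a str> = files
        .iter()
        .copied()
        .filter(|&f| {
            f.starts_with(dir)                      // under the directory
                && !f[dir.len()..].contains('/')    // directly in it, no subdirectories
                && f > path                         // strictly greater, byte-wise (UTF-8) order
        })
        .collect();

    // The result is expected to be sorted by file name.
    out.sort_unstable();
    out
}
```

With a directory-like `path` such as `t/_delta_log/`, every file directly in that directory compares greater than the path itself, so the whole directory is returned; with `t/_delta_log/00000000000000000001`, only entries sorting after that name are returned.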
let offset = Path::from_url_path(path.path())?;
let prefix = if url.path().ends_with('/') {
    offset.clone()
I'm having a bit of trouble following this, but I think it's doing a directory listing rather than a traditional lexicographical start-after listing? That doesn't seem correct given the documented behavior of this function.
Update: The `offset` is used for list-after; the `prefix` is used to restrict the listing to a specific directory. And `Path` provides no easy way to check whether a name is directory-like, because it strips trailing `/`, so we're reduced to this manual manipulation.
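The prefix/offset split described here can be sketched with the standard library alone. The helper name `split_prefix_offset` and the use of plain segment vectors in place of `object_store::path::Path` are assumptions for illustration; the real code works on `Path` and `Url`.

```rust
/// Hypothetical helper: derive `(prefix, offset)` segment lists from a URL path.
/// - `offset` is where the listing starts (list-after).
/// - `prefix` restricts the listing to a single directory.
fn split_prefix_offset(url_path: &str) -> Result<(Vec<String>, Vec<String>), String> {
    // Like `Path`, this drops empty segments, so a trailing '/' is lost here
    // and has to be checked on the raw string instead.
    let offset: Vec<String> = url_path
        .split('/')
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
        .collect();

    if url_path.ends_with('/') {
        // Directory-like: list the whole directory.
        return Ok((offset.clone(), offset));
    }

    // File-like: the prefix is the parent directory, i.e. all but the last segment.
    let mut prefix = offset.clone();
    if prefix.pop().is_none() {
        return Err(format!(
            "Offset path must not be a root directory. Got: '{}'",
            url_path
        ));
    }
    Ok((prefix, offset))
}
```

For example, `/table/_delta_log/00000000000000000001.json` yields the prefix `["table", "_delta_log"]` and the full segment list as offset, while `/table/_delta_log/` yields the same two segments for both.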
Some code comments explaining all this might be helpful
let parts = offset.parts().collect_vec();
if parts.is_empty() {
    return Err(Error::generic(format!(
        "Offset path must not be a root directory. Got: '{}'",
        url.as_str()
    )));
}
Path::from_iter(parts[..parts.len() - 1].iter().cloned())
I think this is just
- let parts = offset.parts().collect_vec();
- if parts.is_empty() {
-     return Err(Error::generic(format!(
-         "Offset path must not be a root directory. Got: '{}'",
-         url.as_str()
-     )));
- }
- Path::from_iter(parts[..parts.len() - 1].iter().cloned())
+ let mut parts = offset.parts().collect_vec();
+ if parts.pop().is_none() {
+     return Err(Error::generic(format!(
+         "Offset path must not be a root directory. Got: '{}'",
+         url.as_str()
+     )));
+ }
+ Path::from_iter(parts)
@@ -48,9 +45,19 @@ impl<E: TaskExecutor> FileSystemClient for ObjectStoreFileSystemClient<E> {
        path: &Url,
    ) -> DeltaResult<Box<dyn Iterator<Item = DeltaResult<FileMeta>>>> {
        let url = path.clone();
aside: I don't understand why we need this extra copy when `path` is never consumed? It makes the code harder to understand because the reader has to keep track of two of the same thing.
What changes are proposed in this pull request?
Our `list_from` implementation for the object_store based filesystem client is currently broken, since it does not behave as documented / required for that function. Specifically, we should list all files in the parent folder, using the path as the offset to list from.
In a follow-up PR we then need to lift the assumption that all URLs will always be under the same store to get proper URL handling.
This PR affects the following public APIs
`DefaultEngine::new` no longer requires a `table_root` parameter. I do expect some more changes in an immediate follow-up PR where we update object store handling to account for files stored in separate stores.
How was this change tested?
Current unit tests.