-
Notifications
You must be signed in to change notification settings - Fork 196
Fix failures when opening arrays or groups with directory placeholder files. #5558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix failures when opening arrays or groups with directory placeholder files. #5558
Conversation
50060c7
to
bda1060
Compare
5da8d87
to
0029231
Compare
0029231
to
d39b664
Compare
703acac
to
2e28d37
Compare
2e28d37
to
decb024
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -75,7 +76,8 @@ struct GroupCPPFx { | |||
|
|||
GroupCPPFx::GroupCPPFx() | |||
: ctx_c_(vfs_test_setup_.ctx_c) | |||
, ctx_(vfs_test_setup_.ctx()) { | |||
, ctx_(vfs_test_setup_.ctx()) | |||
, vfs_(ctx_, vfs_test_setup_.vfs_c, false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the granularity of this? Since you only need this for the new test (I presume) maybe this could be done in a subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make the C++ VFS local to the test, but I don't see any problem with having the fixture class hold it.
tiledb/sm/group/group_directory.cc
Outdated
} | ||
|
||
// Filter out empty files of the same name as the directory | ||
if (entry_uri.remove_trailing_slash() == uri.remove_trailing_slash() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uri.remove_trailing_slash()
makes a copy of the underlying string. Not an expensive operation, but also one that we would like to avoid repeating for potentially every member of a directory. Best to move it out of the loop.
Though, this is not an ls_recursive
, so in the typical case, we only expect to have __group
, __meta
, and __tiledb_group.tdb
as directory entries, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uri.remove_trailing_slash()
makes a copy of the underlying string. Not an expensive operation, but also one that we would like to avoid repeating for potentially every member of a directory. Best to move it out of the loop.
While this would be a good thing to do, it would probably be better still to avoid copying entirely by optimistically checking whether the paths match up to some reasonably-sized substring (perhaps all but a possible trailing slash), and only doing this remove_trailing_slash
approach if so.
std::vector<URI> ls(const VFS& vfs, const URI& uri) { | ||
auto dir_entries = vfs.ls_with_sizes(uri); | ||
auto& dirs = dir_names(); | ||
std::vector<URI> uris; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably reserve something similar to dir_entries.size()
.
Is the case where we filter objects out a typical one? I'm supposing not since the previous implementation just returned everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tiledb/sm/group/group_directory.cc
Outdated
if (iter == dirs.end() || entry.file_size() > 0) { | ||
uris.emplace_back(entry_uri); | ||
} else { | ||
// Handle MinIO-based s3 implementation limitation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be more detailed. It's not clear to me how the behavior difference noted in the issue comments would lead here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more comments, including link to minio issue.
} else { | ||
// Handle MinIO-based s3 implementation limitation | ||
throw GroupDirectoryException( | ||
"Cannot list given uri; File '" + entry_uri.to_string() + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the recovery? If the user removes the empty object, then that would fix it, right?
If the error says that then it the user probably has enough information to resolve this themselves, potentially avoiding a support case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworded message.
…ith-zero-length-objects-at
…ith-zero-length-objects-at
Validated again with the following program: #include <tiledb>
int main() {
Context ctx;
Group g(
ctx,
"s3://tiledb-theodore/ephemeral/core266/"
"072333E9-3D1E-4AFC-843B-B92D72CFB614",
TILEDB_READ);
std::cout << "Group metadata count: " << g.metadata_num() << std::endl;
return 0;
} |
/backport to release-2.28 |
Started backporting to release-2.28: https://github.com/TileDB-Inc/TileDB/actions/runs/16225162743 |
…th directory placeholder files. (#5558) (#5583) Backport of #5558 to release-2.28 --- TYPE: BUG DESC: Fixed opening groups on object storage, that contain directory placeholder files. --------- Co-authored-by: Theodore Tsirpanis <[email protected]>
Applies #4102 to groups.
TYPE: BUG
DESC: Fixed opening groups on object storage, that contain directory placeholder files.