Fix failures when opening arrays or groups with directory placeholder files. #5558


teo-tsirpanis
Member

@teo-tsirpanis teo-tsirpanis commented Jun 25, 2025

Applies #4102 to groups.


TYPE: BUG
DESC: Fixed opening groups on object storage, that contain directory placeholder files.

@teo-tsirpanis teo-tsirpanis changed the title Teo/core 266 failure to open array from s3 with zero length objects at Fixed failures when directory placeholder files exist in object storage. Jun 25, 2025
@teo-tsirpanis teo-tsirpanis force-pushed the teo/core-266-failure-to-open-array-from-s3-with-zero-length-objects-at branch from 50060c7 to bda1060 Compare June 25, 2025 21:40
@teo-tsirpanis teo-tsirpanis changed the title Fixed failures when directory placeholder files exist in object storage. Fix failures when directory placeholder files exist in object storage. Jun 25, 2025
@teo-tsirpanis teo-tsirpanis changed the title Fix failures when directory placeholder files exist in object storage. Fix failures when opening arrays or groups with directory placeholder files. Jun 25, 2025
@teo-tsirpanis teo-tsirpanis force-pushed the teo/core-266-failure-to-open-array-from-s3-with-zero-length-objects-at branch from 5da8d87 to 0029231 Compare June 26, 2025 20:02
@teo-tsirpanis teo-tsirpanis force-pushed the teo/core-266-failure-to-open-array-from-s3-with-zero-length-objects-at branch from 0029231 to d39b664 Compare June 26, 2025 20:05
@teo-tsirpanis teo-tsirpanis force-pushed the teo/core-266-failure-to-open-array-from-s3-with-zero-length-objects-at branch from 703acac to 2e28d37 Compare June 27, 2025 21:34
@teo-tsirpanis teo-tsirpanis force-pushed the teo/core-266-failure-to-open-array-from-s3-with-zero-length-objects-at branch from 2e28d37 to decb024 Compare June 27, 2025 22:29
@teo-tsirpanis teo-tsirpanis marked this pull request as ready for review June 28, 2025 19:19
@teo-tsirpanis teo-tsirpanis requested a review from ypatia June 28, 2025 19:19
Contributor

@bekadavis9 bekadavis9 left a comment

LGTM

@@ -75,7 +76,8 @@ struct GroupCPPFx {

 GroupCPPFx::GroupCPPFx()
     : ctx_c_(vfs_test_setup_.ctx_c)
-    , ctx_(vfs_test_setup_.ctx()) {
+    , ctx_(vfs_test_setup_.ctx())
+    , vfs_(ctx_, vfs_test_setup_.vfs_c, false) {
Member

What's the granularity of this? Since you only need this for the new test (I presume) maybe this could be done in a subclass.

Member Author

I could make the C++ VFS local to the test, but I don't see any problem with having the fixture class hold it.

}

// Filter out empty files of the same name as the directory
if (entry_uri.remove_trailing_slash() == uri.remove_trailing_slash() &&
Member

uri.remove_trailing_slash() makes a copy of the underlying string. Not an expensive operation, but also one that we would like to avoid repeating for potentially every member of a directory. Best to move it out of the loop.

Though, this is not an ls_recursive, so in the typical case, we only expect to have __group, __meta, and __tiledb_group.tdb as directory entries, right?

Member

> uri.remove_trailing_slash() makes a copy of the underlying string. Not an expensive operation, but also one that we would like to avoid repeating for potentially every member of a directory. Best to move it out of the loop.

While this would be a good thing to do, it would probably be better still to avoid copying entirely by optimistically checking whether the paths match up to some reasonably-sized substring (perhaps all but a possible trailing slash), and only doing this remove_trailing_slash approach if so.
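
The copy-free comparison suggested above could look roughly like the following. This is only a sketch under stated assumptions: `strip_trailing_slash` and `same_path_ignoring_slash` are hypothetical helpers that operate on plain string views, not on TileDB's `URI` class, and they are not what the PR implements.

```cpp
#include <string_view>

// Hypothetical helper (not part of TileDB): a view of `s` without one
// trailing '/'. Only the view is adjusted; the underlying string is untouched.
constexpr std::string_view strip_trailing_slash(std::string_view s) {
  if (!s.empty() && s.back() == '/') {
    s.remove_suffix(1);
  }
  return s;
}

// Returns true when `a` and `b` name the same path, ignoring a single
// trailing slash on either side. Nothing is copied or allocated.
constexpr bool same_path_ignoring_slash(
    std::string_view a, std::string_view b) {
  return strip_trailing_slash(a) == strip_trailing_slash(b);
}
```

Because both arguments are views, the directory side can still be normalized once outside the loop and the per-entry side is normalized without touching the heap.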

std::vector<URI> ls(const VFS& vfs, const URI& uri) {
auto dir_entries = vfs.ls_with_sizes(uri);
auto& dirs = dir_names();
std::vector<URI> uris;
Member

This should probably reserve something similar to dir_entries.size().
Is the case where we filter objects out a typical one? I'm supposing not since the previous implementation just returned everything?

Member Author

Done.
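
For readers following the thread, here is a minimal sketch of a listing filter that reserves up front and normalizes the directory name once. It assumes a hypothetical `Entry` struct and `filter_placeholders` function in place of TileDB's directory entry and `URI` types, so it illustrates the idea rather than the merged code.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for one entry returned by a directory listing.
struct Entry {
  std::string path;
  std::uint64_t size;
};

// Returns the paths in `entries`, dropping zero-length objects that have the
// same name as the listed directory `dir` (directory placeholder objects).
std::vector<std::string> filter_placeholders(
    const std::vector<Entry>& entries, std::string_view dir) {
  // Normalize the directory name once, outside the loop.
  if (!dir.empty() && dir.back() == '/') {
    dir.remove_suffix(1);
  }

  std::vector<std::string> kept;
  // Upper bound on the result size; filtering is assumed to be the rare case.
  kept.reserve(entries.size());

  for (const auto& entry : entries) {
    std::string_view path = entry.path;
    if (!path.empty() && path.back() == '/') {
      path.remove_suffix(1);
    }
    // Skip an empty object named like the directory itself.
    if (entry.size == 0 && path == dir) {
      continue;
    }
    kept.push_back(entry.path);
  }
  return kept;
}
```

Reserving `entries.size()` may over-allocate by the few entries that get filtered out, which is usually cheaper than growing the vector repeatedly.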

if (iter == dirs.end() || entry.file_size() > 0) {
uris.emplace_back(entry_uri);
} else {
// Handle MinIO-based s3 implementation limitation
Member

This should be more detailed. It's not clear to me how the behavior difference noted in the issue comments would lead here.

Member Author

Added more comments, including link to minio issue.

} else {
// Handle MinIO-based s3 implementation limitation
throw GroupDirectoryException(
"Cannot list given uri; File '" + entry_uri.to_string() +
Member

What's the recovery? If the user removes the empty object, then that would fix it, right?
If the error says that, then the user probably has enough information to resolve this themselves, potentially avoiding a support case.

Member Author

Reworded message.
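
As an illustration of the kind of actionable error discussed in this thread, the sketch below builds a message that names the offending object and states a recovery step. The function names and the wording are hypothetical and are not the text merged in this PR. It also assumes, as the reviewer suggests, that deleting the empty object resolves the failure, and it throws `std::runtime_error` only as a stand-in for `GroupDirectoryException`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical message builder; the wording is illustrative only.
std::string placeholder_error_message(const std::string& object_uri) {
  return "Cannot list given uri; file '" + object_uri +
         "' is an empty object that shares its name with a group "
         "directory. Remove the empty object and try again.";
}

// Throws with the message above. std::runtime_error stands in for the
// library's own exception type.
[[noreturn]] void throw_placeholder_error(const std::string& object_uri) {
  throw std::runtime_error(placeholder_error_message(object_uri));
}
```

Keeping the message construction separate from the throw makes the wording easy to check in a unit test.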

@teo-tsirpanis teo-tsirpanis requested a review from ypatia July 10, 2025 14:51
@teo-tsirpanis
Member Author

Validated again with the following program:

#include <iostream>
#include <tiledb/tiledb>

using namespace tiledb;

int main() {
  Context ctx;

  // Open the group in read mode.
  Group g(
      ctx,
      "s3://tiledb-theodore/ephemeral/core266/"
      "072333E9-3D1E-4AFC-843B-B92D72CFB614",
      TILEDB_READ);

  std::cout << "Group metadata count: " << g.metadata_num() << std::endl;

  return 0;
}

@teo-tsirpanis teo-tsirpanis merged commit 1288f9c into main Jul 11, 2025
56 checks passed
@teo-tsirpanis teo-tsirpanis deleted the teo/core-266-failure-to-open-array-from-s3-with-zero-length-objects-at branch July 11, 2025 16:43
@teo-tsirpanis
Member Author

/backport to release-2.28

Contributor

Started backporting to release-2.28: https://github.com/TileDB-Inc/TileDB/actions/runs/16225162743

ypatia pushed a commit that referenced this pull request Jul 18, 2025
…th directory placeholder files. (#5558) (#5583)

Backport of #5558 to release-2.28

---
TYPE: BUG
DESC: Fixed opening groups on object storage, that contain directory
placeholder files.

---------

Co-authored-by: Theodore Tsirpanis <[email protected]>