Skip to content

Class VFS inherits FilesystemBase. #5584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 15, 2025
Merged

Class VFS inherits FilesystemBase. #5584

merged 2 commits into from
Jul 15, 2025

Conversation

bekadavis9
Copy link
Contributor

Class VFS inherits base class FilesystemBase.

Resolves CORE-294.


TYPE: IMPROVEMENT
DESC: Class VFS inherits base class FilesystemBase.

@bekadavis9 bekadavis9 requested a review from teo-tsirpanis July 15, 2025 14:33
@bekadavis9 bekadavis9 force-pushed the rd/vfs_refactor-vfs branch from 4203cf0 to 27c1cd0 Compare July 15, 2025 16:35
@bekadavis9 bekadavis9 force-pushed the rd/vfs_refactor-vfs branch from 27c1cd0 to 8429142 Compare July 15, 2025 17:13
Copy link
Member

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for un-statusing VFS!

Left some small suggestions, as well as some larger-scoped items to consider in the future.

Comment on lines +326 to 329
vfs.remove_dir(uris[i]);
if (vfs.is_file(commit_uris_to_delete[i])) {
vfs.remove_file(commit_uris_to_delete[i]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the commit should be deleted first. remove_dir is not atomic, and if it fails in the middle of deleting the fragment directory, the array will be left in a corrupted state.

Comment on lines 218 to +253
// Create array directory
throw_if_not_ok(resources.vfs().create_dir(array_uri));
resources.vfs().create_dir(array_uri);

// Create array schema directory
URI array_schema_dir_uri =
array_uri.join_path(constants::array_schema_dir_name);
throw_if_not_ok(resources.vfs().create_dir(array_schema_dir_uri));
resources.vfs().create_dir(array_schema_dir_uri);

// Create the enumerations directory inside the array schema directory
URI array_enumerations_uri =
array_schema_dir_uri.join_path(constants::array_enumerations_dir_name);
throw_if_not_ok(resources.vfs().create_dir(array_enumerations_uri));
resources.vfs().create_dir(array_enumerations_uri);

// Create commit directory
URI array_commit_uri = array_uri.join_path(constants::array_commits_dir_name);
throw_if_not_ok(resources.vfs().create_dir(array_commit_uri));
resources.vfs().create_dir(array_commit_uri);

// Create fragments directory
URI array_fragments_uri =
array_uri.join_path(constants::array_fragments_dir_name);
throw_if_not_ok(resources.vfs().create_dir(array_fragments_uri));
resources.vfs().create_dir(array_fragments_uri);

// Create array metadata directory
URI array_metadata_uri =
array_uri.join_path(constants::array_metadata_dir_name);
throw_if_not_ok(resources.vfs().create_dir(array_metadata_uri));
resources.vfs().create_dir(array_metadata_uri);

// Create fragment metadata directory
URI array_fragment_metadata_uri =
array_uri.join_path(constants::array_fragment_meta_dir_name);
throw_if_not_ok(resources.vfs().create_dir(array_fragment_metadata_uri));
resources.vfs().create_dir(array_fragment_metadata_uri);

// Create dimension label directory
URI array_dimension_labels_uri =
array_uri.join_path(constants::array_dimension_labels_dir_name);
throw_if_not_ok(resources.vfs().create_dir(array_dimension_labels_uri));
resources.vfs().create_dir(array_dimension_labels_uri);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking that we should remove this. It's always been a no-op on object storage and on file storage it's not necessary since #5480 and just creates potentially unnecessary directories. Same with all calls to create_dir in library code.

Or not, memfs does not automatically create parent directories.

Comment on lines +531 to 536
if (vfs.is_file(commit_uri)) {
vfs.remove_file(commit_uri);
}

bool is_dir = false;
throw_if_not_ok(vfs.is_dir(fragment_uris_to_vacuum[i], &is_dir));
if (is_dir) {
throw_if_not_ok(vfs.remove_dir(fragment_uris_to_vacuum[i]));
if (vfs.is_dir(fragment_uris_to_vacuum[i])) {
vfs.remove_dir(fragment_uris_to_vacuum[i]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several instances of "if is file/dir, then remove file/dir" that add unnecessary I/O and latency. We should just do the removal and ignore errors if the file/directory did not exist1. .NET's File.Remove for example does not throw and returns a bool to communicate whether the file existed.

Will open a story about this.

Footnotes

  1. This is what will actually happen with remove_dir on object storage; it will make a list and remove everything, and if there's nothing to remove it won't fail. We should make sure the behavior in all backends is consistent.

Comment on lines +72 to +74
if (!vfs_->is_file(uri_)) {
vfs_->touch(uri_);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!vfs_->is_file(uri_)) {
vfs_->touch(uri_);
}
vfs_->touch(uri_);

VFS::touch itself makes the check for the file's existence. For the clouds where we do conditional requests, there might be a concern that we are always going to do a more expensive write operation, but I believe the reduced roundtrips are worth it, and this code path runs only in the user-facing VFS C API.

Comment on lines 810 to +813
// If new_uri exists, delete it or raise an error based on `force`
bool is_file;
RETURN_NOT_OK(this->is_file(new_uri, &is_file));
if (is_file)
RETURN_NOT_OK(remove_file(new_uri));
if (this->is_file(new_uri)) {
remove_file(new_uri);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to Posix? IIRC the object storage backends will overwrite the file themselves, and on Windows the destination file will be removed only for Win::copy_file to throw because it is not implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address in followup.

Comment on lines 684 to +687
// If new_uri exists, delete it or raise an error based on `force`
bool is_file;
RETURN_NOT_OK(this->is_file(new_uri, &is_file));
if (is_file)
RETURN_NOT_OK(remove_file(new_uri));
if (this->is_file(new_uri)) {
remove_file(new_uri);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with copy_file. BTW we should strive towards removing as much logic from the VFS class as possible and pushing it down to the backends, or shared helpers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address in followup.

Comment on lines +451 to +452
CHECK(!vfs.is_file(largefile));
CHECK(!vfs.is_file(smallfile));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catch2 discourages writing CHECK(!foo), providing CHECK_FALSE(foo) in its place.

There are too many instances in our tests that you don't have to do anything now; will open a story later.

@bekadavis9 bekadavis9 merged commit 639322f into main Jul 15, 2025
56 checks passed
@bekadavis9 bekadavis9 deleted the rd/vfs_refactor-vfs branch July 15, 2025 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants