Feat: Introduce ExtendedGcsFileSystem for Zonal Bucket gRPC Read Path #707
base: main
…e developed - experimental_zb_hns_support
Extend gcsfs to create new filesystem. async download
Merged GCSFileSystemAdapter and ZonalFile POC
Rename gcshnsfilesystem to GCSFileSystemAdapter
Feat: Implement gRPC Read Path for Zonal Bucket
…t being used. Updated zonal tests to work with both real and fake GCS. Added more tests for zonal path reads
made private: _is_zonal_bucket and _process_limits_to_offset_and_length; fix process method to handle negative length; add unit test for _process_limits_to_offset_and_length
Add test for failure scenarios in mrd
Feat: Zonal Read optimizations and test suite
Internal main to main
* Add mrd closing logic in ZonalFile and GCSFSAdapter. Add exception handling in GCSFile fetch_range
* Fix gcs_adapter tests to use zonal mocks. Add test to validate mrd stream is closed with GCSFile closing
* Fix: use patch for auth in gcs_adapter fixture for fake gcs only
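The `_process_limits_to_offset_and_length` commit above concerns turning read limits into the offset/length pair a ranged read expects. The helper below is a hypothetical sketch, not the PR's method: it assumes fsspec-style `(start, end)` limits where `None` means the object boundary and negative values count back from the end, and it clamps an inverted range to length 0 instead of returning a negative length.

```python
def limits_to_offset_and_length(start, end, size):
    """Hypothetical helper: fsspec-style (start, end) -> (offset, length).

    None means "from the beginning" / "to the end"; negative values count
    back from the end of the object, as in Python slicing.
    """
    if start is None:
        start = 0
    elif start < 0:
        start = max(size + start, 0)
    if end is None:
        end = size
    elif end < 0:
        end = size + end
    # Keep both limits inside [0, size].
    start = min(start, size)
    end = max(min(end, size), 0)
    # An inverted range yields an empty read, never a negative length.
    length = max(end - start, 0)
    return start, length
```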
martindurant left a comment
Preliminary comments, not yet having looked through the main code of "adapter".
```python
def zonal_mocks():
    """A factory fixture for mocking Zonal bucket functionality."""

    @contextlib.contextmanager
```
Why make a context in a fixture, which already acts like a context?
The inner function lets us pass different file_data to the mock for each call, so each call returns the right value for the assertion. Also, some tests include write operations. Since Zonal currently supports only reads, the context manager lets us apply the zonal mocks only during the read portion of the test. If the mocks were active for the entire test, the write operations would fail. This seemed the best way to meet both requirements without duplicating mock setup code across every test.
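The pattern described above, a fixture that returns a context-manager factory, can be sketched with the standard library alone. `Storage`, `zonal_mocks_factory`, and `_zonal_mocks` are hypothetical names; in a real pytest suite the outer function would carry `@pytest.fixture`, which is omitted here so the sketch runs without pytest.

```python
import contextlib
from unittest import mock


class Storage:
    """Hypothetical stand-in for the filesystem under test."""

    def read(self, path):
        # The real backend (e.g. the emulator) cannot serve zonal reads.
        raise RuntimeError("zonal reads not supported by the backend")

    def write(self, path, data):
        # Writes go through the real, unmocked code path.
        return len(data)


def zonal_mocks_factory():
    """Fixture body; in pytest this would be decorated with @pytest.fixture."""

    @contextlib.contextmanager
    def _zonal_mocks(file_data):
        # The patch is active only inside the caller's `with` block, and
        # returns this call's file_data so each test asserts on its own payload.
        with mock.patch.object(Storage, "read", return_value=file_data) as m:
            yield m

    return _zonal_mocks
```

Because the patch is scoped to the `with` block, a test can write through the real code path first and wrap only its read phase in `with zonal_mocks(expected) as read_mock:`.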
Hi, following up on this. The context manager lets us cleanly scope the mocks to the read phase of the test. Does this approach seem reasonable, or should we switch to adding separate mocks for the tests that perform writes?
gcsfs/tests/test_extended_gcsfs.py
```python
)

mock_create_mrd = mock.AsyncMock(return_value=mock_downloader)
with mock.patch(
```
I don't follow all this patching. The emulator doesn't do zonal?
Correct, it does not. The emulator supports only the older APIs, not the newer storage-type APIs.
We've raised a feature request for this: fsouza/fake-gcs-server#2069
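Since the emulator has no zonal-bucket APIs, the tests patch the call that creates the downloader and substitute an `AsyncMock`. A minimal sketch of that approach, assuming a hypothetical `ZonalReader` whose `_create_downloader` coroutine returns an object with async `read` and `close` methods (none of these names come from gcsfs):

```python
import asyncio
from unittest import mock


class ZonalReader:
    """Hypothetical reader; not the PR's ZonalFile."""

    async def _create_downloader(self, bucket, obj):
        # Against the emulator this would fail: it has no zonal-bucket APIs.
        raise ConnectionError("emulator does not support zonal buckets")

    async def read_range(self, bucket, obj, offset, length):
        downloader = await self._create_downloader(bucket, obj)
        try:
            return await downloader.read(offset, length)
        finally:
            await downloader.close()


def run_mocked_read():
    # Fake downloader with awaitable read/close methods.
    mock_downloader = mock.MagicMock()
    mock_downloader.read = mock.AsyncMock(return_value=b"bcd")
    mock_downloader.close = mock.AsyncMock()
    mock_create = mock.AsyncMock(return_value=mock_downloader)

    # A mock set as a class attribute is not bound, so no `self` is passed.
    with mock.patch.object(ZonalReader, "_create_downloader", mock_create):
        data = asyncio.run(ZonalReader().read_range("zb", "obj", 1, 3))

    mock_create.assert_awaited_once_with("zb", "obj")
    mock_downloader.read.assert_awaited_once_with(1, 3)
    mock_downloader.close.assert_awaited_once()
    return data
```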
…est fixtures and pytest treats this as an error on CI
What I am suggesting is that importing the adapter module should call … The environment variable would be used like:

```python
if os.getenv(..):
    import gcsfs.adapter
```

in … For the sake of tests, you could either make a fixture that ensures the right class is registered for each test function/module, or make a separate CI run with the environment variable set.
…Test Mocking (#8)
* Add mrd closing logic in ZonalFile and GCSFSAdapter. Add exception handling in GCSFile fetch_range
* Fix gcs_adapter tests to use zonal mocks. Add test to validate mrd stream is closed with GCSFile closing
* Fix: use patch for auth in gcs_adapter fixture for fake gcs only
* Change imports to absolute imports
* Rename GcsFileSystemAdapter to ExtendedGcsFileSystem
* Mock _sync_get_bucket_type call to return UNKNOWN while opening a file in write mode
* Updated ExtendedGcsFileSystem class description. Separated cleanup logic from extended_gcsfs fixture for better readability
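The "mrd closing logic" commits tie the downloader stream's lifetime to the file object, so closing the file also closes the stream. A hypothetical sketch of that idea: `ZonalFileSketch` and `FakeDownloader` are illustrative names, and `asyncio.run` stands in for however the real filesystem drives its own event loop (an assumption).

```python
import asyncio


class FakeDownloader:
    """Hypothetical async read stream with an awaitable close()."""

    def __init__(self):
        self.is_stream_open = True

    async def close(self):
        self.is_stream_open = False


class ZonalFileSketch:
    """Hypothetical file object that owns a downloader stream."""

    def __init__(self, downloader):
        self._downloader = downloader
        self.closed = False

    def close(self):
        if self.closed:
            return  # idempotent, like standard io objects
        try:
            # Stand-in for running the coroutine on the filesystem's loop.
            asyncio.run(self._downloader.close())
        finally:
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```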
* registers ExtendedGCSFileSystem when experimental variable is set instead of __new__ hack
* fixes registry race condition
* removes unnecessary registry
* adds separate ci tests
* adds test_init.py
* fixes pre-commit
removed redundant environment variable setting; removed try block for creating TEST_BUCKET to expose errors in test setup
This pull request introduces a base skeleton for supporting different storage types. It will stay behind an experimental flag until full support for the new storage types is added.
Key Changes:
* GCSFileSystemAdapter: Added `gcsfs/gcsfs_adapter.py` to subclass `GCSFileSystem`. This adapter dynamically handles different bucket types using the GCS Storage Layout API, directing operations to the appropriate file implementation.
* gRPC Integration: Integrated the asynchronous gRPC client within the adapter and `ZonalFile`.
* ZonalFile: Created `gcsfs/zonal_file.py` as a subclass of `GCSFile`. This class is specifically designed for interacting with GCS Zonal Buckets. The existing `GCSFile` from `gcsfs.core` continues to be used for all non-zonal bucket operations within the adapter, ensuring backward compatibility for standard bucket types.
* Experimental Flag: The new Zonal Bucket/HNS features are enabled only via an `experimental_zb_hns_support` flag passed to the `GCSFileSystem` constructor. The existing `gcsfs.core.GCSFileSystem` remains the foundation. When the `experimental_zb_hns_support` flag is not passed or is False, an instance of `GCSFileSystem` is created as usual, retaining all original functionality and read/write methods.
* Utilities: Added `gcsfs/zb_hns_utils.py` with helper functions for the adapter and the zonal file.
* Testing: Added unit tests for the newly added files.
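The adapter's dispatch by bucket type can be sketched as follows. Everything here is a simplified stub under stated assumptions: `BucketType` and `_get_bucket_type` are hypothetical, a dictionary replaces the Storage Layout API lookup, and only read modes on zonal buckets are routed to `ZonalFile`, matching the read-only zonal support discussed in this PR.

```python
import enum


class BucketType(enum.Enum):
    """Hypothetical bucket-type enum (the PR's real type may differ)."""
    UNKNOWN = "unknown"
    STANDARD = "standard"
    ZONAL = "zonal"


class GCSFile:
    """Stub for gcsfs.core.GCSFile."""

    def __init__(self, fs, path, mode):
        self.fs, self.path, self.mode = fs, path, mode


class ZonalFile(GCSFile):
    """Stub for the new zonal file class (gRPC read path)."""


class GCSFileSystem:
    """Stub base class: always opens a standard GCSFile."""

    def _open(self, path, mode="rb"):
        return GCSFile(self, path, mode)


class ExtendedGcsFileSystem(GCSFileSystem):
    def __init__(self, bucket_types):
        # Stand-in for Storage Layout API lookups: bucket name -> BucketType.
        self._bucket_types = dict(bucket_types)

    def _get_bucket_type(self, bucket):
        return self._bucket_types.get(bucket, BucketType.UNKNOWN)

    def _open(self, path, mode="rb"):
        bucket = path.split("/", 1)[0]
        # Only reads on zonal buckets go to ZonalFile; all else falls back.
        if "r" in mode and self._get_bucket_type(bucket) is BucketType.ZONAL:
            return ZonalFile(self, path, mode)
        return super()._open(path, mode)
```

Unknown and standard buckets fall back to the base class's `GCSFile`, which keeps non-zonal behavior unchanged.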