Conversation

@ankitaluthra1
Contributor

@ankitaluthra1 ankitaluthra1 commented Nov 2, 2025

This pull request introduces a base skeleton for adding different storage types. It will stay behind an experimental flag until full support for the new storage types is added.

Key Changes:

  • GCSFileSystemAdapter: Added gcsfs/gcsfs_adapter.py to subclass GCSFileSystem. This adapter dynamically handles different bucket types using the GCS storage layout API, directing operations to the appropriate file implementation.
  • gRPC Integration: Integrated the asynchronous gRPC client within the adapter and ZonalFile.
  • ZonalFile: Created gcsfs/zonal_file.py as a subclass of GCSFile. This class is specifically designed for interacting with GCS Zonal Buckets. The existing GCSFile from gcsfs.core continues to be used for all non-zonal bucket operations within the adapter, ensuring backward compatibility for standard bucket types.
  • Experimental Flag: The new Zonal Bucket/HNS features will only be enabled via an experimental_zb_hns_support flag passed to the GCSFileSystem constructor. The existing gcsfs.core.GCSFileSystem remains the foundation. When the experimental_zb_hns_support flag is not passed or is False, an instance of GCSFileSystem is created as usual, retaining all original functionality and read/write methods.
  • Utilities: Added gcsfs/zb_hns_utils.py with helper functions for gcsfs_adapter and the zonal file.
  • Testing: Added unit tests for the newly added files.
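
The dispatch described in the list above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `BucketType`, `StandardFile`, `ZonalFileSketch`, `AdapterSketch`, and the `layout_lookup` callable are hypothetical stand-ins, and caching the bucket type per bucket is an assumption made here for brevity.

```python
import enum


class BucketType(enum.Enum):
    # Hypothetical labels; the real storage layout API may report other values.
    ZONAL = "zonal"
    NON_ZONAL = "non_zonal"
    UNKNOWN = "unknown"


class StandardFile:
    """Stand-in for the existing file class used for non-zonal buckets."""

    def __init__(self, path):
        self.path = path


class ZonalFileSketch(StandardFile):
    """Stand-in for a zonal file class that subclasses the standard one."""


class AdapterSketch:
    """Chooses a file class per bucket, based on a layout lookup."""

    def __init__(self, layout_lookup):
        # layout_lookup is a hypothetical callable: bucket name -> BucketType
        self._layout_lookup = layout_lookup
        self._bucket_types = {}  # assumption: cache one lookup per bucket

    def _bucket_type(self, bucket):
        if bucket not in self._bucket_types:
            self._bucket_types[bucket] = self._layout_lookup(bucket)
        return self._bucket_types[bucket]

    def open(self, path):
        bucket = path.split("/", 1)[0]
        if self._bucket_type(bucket) is BucketType.ZONAL:
            return ZonalFileSketch(path)
        # Anything that is not zonal falls back to the standard file class.
        return StandardFile(path)


layouts = {"fast-bucket": BucketType.ZONAL}
fs = AdapterSketch(lambda bucket: layouts.get(bucket, BucketType.UNKNOWN))
zonal_handle = fs.open("fast-bucket/data.bin")
standard_handle = fs.open("other-bucket/data.bin")
```

Falling back to the standard class for any non-zonal result mirrors the backward-compatibility goal stated above: only buckets positively identified as zonal take the new path.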

ankitaluthra1 and others added 30 commits October 1, 2025 06:27
Extend gcsfs to create new filesystem. async download
Merged GCSFileSystemAdapter and ZonalFile POC
Rename gcshnsfilesystem to GCSFileSystemAdapter
Feat: Implement gRPC Read Path for Zonal Bucket
…t being used.

Updated zonal tests to work with both real and fake gcs
Added more tests for zonal path reads
ankitaluthra1 and others added 19 commits October 29, 2025 18:38
made private: _is_zonal_bucket and _process_limits_to_offset_and_length
fix process method to handle negative length
add unit test for _process_limits_to_offset_and_length
Feat: Zonal Read optimizations and test suite
* Add mrd closing logic in ZonalFile and GCSFSAdapter
Add exception handling in GCSFile fetch_range

* Fix gcs_adapter tests to use zonal mocks
add test to validate mrd stream is closed with GCSFile closing

* FIx: use patch for auth in gcs_adapter fixture for fake gcs only
Member

@martindurant martindurant left a comment


Preliminary comments, not yet having looked through the main code of "adapter"

def zonal_mocks():
    """A factory fixture for mocking Zonal bucket functionality."""

    @contextlib.contextmanager
Member


Why make a context in a fixture, which already acts like a context?


@suni72 suni72 Nov 10, 2025


The inner function lets us pass different file_data parameters to the mock for each call, so the mock returns the correct value for the assertion. Also, some tests include write operations. Since Zonal currently supports reads only, the context manager lets us apply the zonal mocks only during the read portion of the test. If the mocks were active for the entire test, the write operations would fail. This seemed the best way to meet both requirements without duplicating mock setup code across every test.


Hi, following up on this. The context manager lets us cleanly scope the mocks to the read phase of the test. Does this approach seem reasonable, or should we switch to adding separate mocks for the tests with write?
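
To make the pattern under discussion concrete, here is a minimal sketch of a factory that returns a context manager, so that a patch is active only inside the `with` block. It is an illustration under assumptions, not the PR's fixture: `Reader` and `make_zonal_mocks` are hypothetical names, and in a pytest suite the body of `make_zonal_mocks` would typically be the body of a fixture.

```python
import contextlib
from unittest import mock


class Reader:
    """Hypothetical stand-in for the code under test."""

    def read(self):
        return b"real"


def make_zonal_mocks():
    """Return a context-manager factory that patches Reader.read on demand."""

    @contextlib.contextmanager
    def _zonal_mocks(file_data):
        # The patch is applied on entering the with-block and undone on exit,
        # and each call can supply its own file_data.
        with mock.patch.object(Reader, "read", return_value=file_data) as patched:
            yield patched

    return _zonal_mocks


zonal_mocks = make_zonal_mocks()
reader = Reader()

before = reader.read()  # unpatched, e.g. the write phase of a test
with zonal_mocks(b"mocked") as patched:
    during = reader.read()  # patched, the read phase
    patched.assert_called_once()
after = reader.read()  # unpatched again once the block exits
```

The trade-off is the one raised in the thread: a plain fixture would keep the patch active for the whole test, while the factory form adds one level of nesting in exchange for per-call data and a narrower patch scope.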

)

mock_create_mrd = mock.AsyncMock(return_value=mock_downloader)
with mock.patch(
Member


I don't follow all this patching. The emulator doesn't do zonal?


Correct, it does not. It supports only the older APIs and does not support the new storage type APIs.


We've raised a feature request for this: fsouza/fake-gcs-server#2069

@martindurant
Member

What I am suggesting is that importing the adapter module should call register_implementation with the new class, replacing GCSFileSystem in the registry.

The environment variable would be used like:

if os.getenv(..):
    import gcsfs.adapter

in __init__.

For the sake of tests, you could either make a fixture that ensures the right class is registered for each test function/module, or make a separate CI run with the environment variable set.

…Test Mocking (#8)

* Add mrd closing logic in ZonalFile and GCSFSAdapter
Add exception handling in GCSFile fetch_range

* Fix gcs_adapter tests to use zonal mocks
add test to validate mrd stream is closed with GCSFile closing

* FIx: use patch for auth in gcs_adapter fixture for fake gcs only

* Change imports to absolute imports

* Rename GcsFileSystemAdapter to ExtendedGcsFileSystem

* Mock _sync_get_bucket_type call to return UNKNOWN while opening a file in write mode

* Updated ExtendedGcsFileSystem class description
Separated cleanup logic from extended_gcsfs fixture for better readability
@ankitaluthra1 ankitaluthra1 changed the title Feat: Introduce GCSFileSystemAdapter for Zonal Bucket gRPC Read Path Feat: Introduce ExtendedGcsFileSystem for Zonal Bucket gRPC Read Path Nov 12, 2025
* registers ExtendedGCSFileSystem when experimental variable is set instead of __new__ hack

* fixes registry race condition

* removes unnecessary registry

* adds separate ci tests

* adds test_init.py

* fixes pre-commit
ankitaluthra1 and others added 3 commits November 14, 2025 16:51
removed redundant environment variable setting
removed try block for creating TEST_BUCKET to expose errors in test setup
4 participants