add minio provider #14

Status: Open, 1 commit, base: main

jiuker commented Jul 7, 2025

Summary

add minio provider.

Details

Describe your changes. You can be more detailed and descriptive here.

Usage

You can potentially add a usage example below.

# Add a code snippet demonstrating how to use this.

Before your PR is "Ready for review"

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Did you add notable changes to the client (i.e. not related to tooling, CI/CD, etc.) from this PR to .release_notes/.unreleased.md?

Additional Information

  • Related to issue link (an issue is needed to notify us on Slack).

@jiuker jiuker marked this pull request as draft July 7, 2025 09:22
jiuker (Author) commented Jul 7, 2025

I will add a blog post to the min.io website. When it's done, this PR will be ready.

commiterate (Collaborator) commented Jul 7, 2025

#12 (comment)

The idea was less that we maintain the full list of S3-compatible stores and more documenting that we support the general class of S3-compatible stores.

If we have to create a separate storage provider that pre-configures the S3 storage provider to disable things like S3 integrity checks (boto/boto3#4392), those will be documented in the Sphinx docs.

For example, we already have these for SwiftStack (s8k) and GCS's S3 API (gcs_s3).
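
As a rough illustration of the kind of pre-configuration mentioned above, here is a minimal sketch of relaxing the default integrity checks on a plain boto3 client. It is not taken from the MSC source. `s3_compatible_config_kwargs` and `make_s3_compatible_client` are hypothetical helper names, and the sketch assumes a botocore version recent enough to accept the `request_checksum_calculation` and `response_checksum_validation` options.

```python
from typing import Any


def s3_compatible_config_kwargs() -> dict[str, Any]:
    """Return botocore Config options that only use checksums when an operation requires them.

    Hypothetical helper for illustration, not part of MSC.
    """
    return {
        "request_checksum_calculation": "when_required",
        "response_checksum_validation": "when_required",
    }


def make_s3_compatible_client(endpoint_url: str):
    """Build a boto3 S3 client for an S3-compatible endpoint (assumes boto3 is installed)."""
    # Imported lazily so the options helper above stays usable without boto3.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        config=Config(**s3_compatible_config_kwargs()),
    )
```

A dedicated provider could apply defaults like these once, so users of a given S3-compatible store would not have to set them by hand.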

@jiuker jiuker force-pushed the format-the-readme branch from 520d5a1 to e129223 Compare July 8, 2025 01:42
@jiuker jiuker changed the title from "format the readme" to "add minio provider" Jul 8, 2025
@jiuker jiuker marked this pull request as ready for review July 8, 2025 01:44
@jiuker jiuker force-pushed the format-the-readme branch from 81a72ab to ce30806 Compare July 8, 2025 02:18
jiuker (Author) commented Jul 9, 2025

@commiterate sorry, I had not run the formatter. Could you retry the pipeline?

commiterate (Collaborator) commented Jul 9, 2025

Looks like linting failed. Running `just build` in a Nix shell (see CONTRIBUTING.md) should check everything locally, just like in CI.

jiuker (Author) commented Jul 10, 2025

> Looks like linting failed. Running `just build` in a Nix shell (see CONTRIBUTING.md) should check everything locally, just like in CI.

Should be fine now.

---------- coverage: platform linux, python 3.12.3-final-0 -----------
Name                                                                       Stmts   Miss Branch BrPart  Cover
------------------------------------------------------------------------------------------------------------
src/multistorageclient/__init__.py                                            14      0      2      0   100%
src/multistorageclient/cache.py                                               19      1      6      1    92%
src/multistorageclient/caching/__init__.py                                     5      0      0      0   100%
src/multistorageclient/caching/cache_backend.py                              425    169     94      7    58%
src/multistorageclient/caching/cache_config.py                                50     11      2      1    77%
src/multistorageclient/caching/cache_item.py                                  25      6      2      0    70%
src/multistorageclient/caching/eviction_policy.py                             43      4      6      2    88%
src/multistorageclient/client.py                                             321     12    132     14    94%
src/multistorageclient/commands/__init__.py                                    0      0      0      0   100%
src/multistorageclient/commands/cli/__init__.py                                0      0      0      0   100%
src/multistorageclient/commands/cli/actions/__init__.py                        7      7      0      0     0%
src/multistorageclient/commands/cli/actions/action.py                         49     49      4      0     0%
src/multistorageclient/commands/cli/actions/glob.py                           41     41     14      0     0%
src/multistorageclient/commands/cli/actions/help.py                           25     25      4      0     0%
src/multistorageclient/commands/cli/actions/ls.py                             69     69     22      0     0%
src/multistorageclient/commands/cli/actions/rm.py                             45     45     12      0     0%
src/multistorageclient/commands/cli/actions/sync.py                           28     28      4      0     0%
src/multistorageclient/commands/cli/main.py                                   44     44      6      0     0%
src/multistorageclient/commands/msc_benchmark.py                             141    141     24      0     0%
src/multistorageclient/config.py                                             458     42    172     29    88%
src/multistorageclient/constants.py                                            1      0      0      0   100%
src/multistorageclient/contrib/__init__.py                                     0      0      0      0   100%
src/multistorageclient/contrib/async_fs.py                                    76      2      6      2    95%
src/multistorageclient/contrib/numpy.py                                       48      5     26      5    86%
src/multistorageclient/contrib/os/__init__.py                                  6      0      0      0   100%
src/multistorageclient/contrib/os/path.py                                     11      0      0      0   100%
src/multistorageclient/contrib/pickle.py                                      22      0      8      0   100%
src/multistorageclient/contrib/torch/__init__.py                               3      0      0      0   100%
src/multistorageclient/contrib/torch/core.py                                  21      3      8      2    83%
src/multistorageclient/contrib/torch/filesystem.py                            71      7      6      1    90%
src/multistorageclient/contrib/xarray.py                                      17      1      4      1    90%
src/multistorageclient/contrib/zarr.py                                        63     18     12      2    68%
src/multistorageclient/file.py                                               440     68    118     22    82%
src/multistorageclient/generators/__init__.py                                  2      0      0      0   100%
src/multistorageclient/generators/manifest_metadata.py                        32     12     12      1    57%
src/multistorageclient/instrumentation/__init__.py                           110     49     26      6    51%
src/multistorageclient/instrumentation/auth.py                                62     14     12      2    76%
src/multistorageclient/instrumentation/error_aware_processor.py               36      4     18      4    85%
src/multistorageclient/instrumentation/utils.py                              243     19     60      9    91%
src/multistorageclient/pathlib.py                                            306     66    100     16    75%
src/multistorageclient/progress_bar.py                                        34     11     14      4    56%
src/multistorageclient/providers/__init__.py                                  26      7      4      1    73%
src/multistorageclient/providers/ais.py                                      159    112     36      1    25%
src/multistorageclient/providers/azure.py                                    221     18     60      9    90%
src/multistorageclient/providers/base.py                                     181     11     32      2    94%
src/multistorageclient/providers/gcs.py                                      298     72     90     15    74%
src/multistorageclient/providers/gcs_s3.py                                     8      0      0      0   100%
src/multistorageclient/providers/manifest_metadata.py                        192     14     64     13    89%
src/multistorageclient/providers/minio.py                                      8      0      0      0   100%
src/multistorageclient/providers/oci.py                                      223    176     70      0    16%
src/multistorageclient/providers/posix_file.py                               192     20     62      6    88%
src/multistorageclient/providers/s3.py                                       344     47    124     17    84%
src/multistorageclient/providers/s8k.py                                        9      0      0      0   100%
src/multistorageclient/rclone.py                                             113     40     40     11    63%
src/multistorageclient/retry.py                                               28      0      6      1    97%
src/multistorageclient/rust_utils.py                                          13      2      2      1    80%
src/multistorageclient/schema.py                                              14      0      0      0   100%
src/multistorageclient/shortcuts.py                                          111      4     30      4    94%
src/multistorageclient/telemetry/__init__.py                                 196     22     38     10    86%
src/multistorageclient/telemetry/attributes/__init__.py                        0      0      0      0   100%
src/multistorageclient/telemetry/attributes/base.py                           13      1      4      1    88%
src/multistorageclient/telemetry/attributes/environment_variables.py          10      0      0      0   100%
src/multistorageclient/telemetry/attributes/host.py                           16      0      2      1    94%
src/multistorageclient/telemetry/attributes/msc_config.py                     22      0      0      0   100%
src/multistorageclient/telemetry/attributes/process.py                        14      0      0      0   100%
src/multistorageclient/telemetry/attributes/static.py                          9      0      0      0   100%
src/multistorageclient/telemetry/attributes/thread.py                         15      0      0      0   100%
src/multistorageclient/telemetry/metrics/__init__.py                           0      0      0      0   100%
src/multistorageclient/telemetry/metrics/exporters/__init__.py                 0      0      0      0   100%
src/multistorageclient/telemetry/metrics/exporters/otlp_msal.py               31     31      4      0     0%
src/multistorageclient/telemetry/metrics/readers/__init__.py                   0      0      0      0   100%
src/multistorageclient/telemetry/metrics/readers/diperiodic_exporting.py     118     15     24      6    85%
src/multistorageclient/types.py                                              153     28      8      3    81%
src/multistorageclient/utils.py                                              164     12     72      6    92%
src/multistorageclient_rust/__init__.py                                        2      0      0      0   100%
------------------------------------------------------------------------------------------------------------
TOTAL                                                                       6316   1605   1708    239    73%
Coverage HTML written to dir .reports/unit/coverage
Coverage XML written to file .reports/unit/coverage.xml

================================================================================================================= slowest durations ==================================================================================================================
121.45s call     tests/test_multistorageclient/unit/test_config.py::test_oci_storage_provider_passthrough_options
41.45s call     tests/test_multistorageclient/unit/test_sync.py::test_sync_function[TemporaryAWSS3Bucket-sync_kwargs2]
36.58s call     tests/test_multistorageclient/unit/test_sync.py::test_sync_function[TemporaryAWSS3Bucket-sync_kwargs0]
32.08s call     tests/test_multistorageclient/unit/test_cli.py::test_rm_command
28.81s call     tests/test_multistorageclient/unit/test_sync.py::test_sync_function[TemporaryAWSS3Bucket-sync_kwargs1]
20.82s call     tests/test_multistorageclient/unit/test_cli.py::test_ls_command_without_attribute_filter_expression
14.60s call     tests/test_multistorageclient/unit/providers/test_storage_providers.py::test_storage_providers[TemporaryGoogleCloudStorageS3Bucket-False]
14.00s call     tests/test_multistorageclient/unit/test_cli.py::test_ls_command_with_attribute_filter_expression
12.61s call     tests/test_multistorageclient/unit/test_cli.py::test_glob_command_without_attribute_filter_expression
12.58s call     tests/test_multistorageclient/unit/test_cli.py::test_attribute_filter_expression_parsing_errors
11.52s call     tests/test_multistorageclient/unit/test_shortcuts.py::test_msc_shortcuts_with_empty_base_path[TemporaryAWSS3Bucket]
10.88s call     tests/test_multistorageclient/unit/test_telemetry.py::test_telemetry_proxy_objects[fork]
10.41s call     tests/test_multistorageclient/unit/contrib/test_fsspec.py::test_fsspec_implementation[TemporaryAWSS3Bucket]
10.34s call     tests/test_multistorageclient/unit/test_cli.py::test_glob_command_with_attribute_filter_expression

(789 durations < 10s hidden.  Use -vv to show these durations.)
============================================================================================= 267 passed, 1 skipped, 2563 warnings in 210.67s (0:03:30) ==============================================================================================
# Stop storage systems.
#
# Azurite's process commands are `node` instead of `azurite`. Find by port instead.
for PID in $(lsof -i :10000-10002 -c fake-gcs-server -c minio -t); do kill -s KILL $PID; done
# Remove sandbox directories.
rm -rf .{azurite,fake-gcs-server,minio}/sandbox
bash: line 1: 76433 Killed                  TZ="UTC" fake-gcs-server -backend memory -log-level error -scheme http
bash: line 1: 76415 Killed                  azurite --inMemoryPersistence --silent --skipApiVersionCheck
bash: line 1: 76434 Killed                  minio server --config ../minio.yaml --quiet
# Remove package archives.
rm -rf dist
# Create a source distribution.
uv build --sdist
warning: The `requires-python` specifier (`~=3.9`) in `multi-storage-client` uses the tilde specifier (`~=`) without a patch version. This will be interpreted as `>=3.9, <4`. Did you mean `~=3.9.0` to constrain the version as `>=3.9.0, <3.10`? We recommend only using the tilde specifier with a patch version to avoid ambiguity.
Building source distribution...
Running `maturin pep517 write-sdist --sdist-directory /mnt/d/workspace/go/src/multi-storage-client/dist`
warning: both `/root/.cargo/config` and `/root/.cargo/config.toml` exist. Using `/root/.cargo/config`
📦 Including license file `LICENSE`
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings with abi3 support
📡 Using build options features from pyproject.toml
From `cargo package --list --allow-dirty --manifest-path /mnt/d/workspace/go/src/multi-storage-client/rust/Cargo.toml`:
warning: both `/root/.cargo/config` and `/root/.cargo/config.toml` exist. Using `/root/.cargo/config`
warning: manifest has no description, license, license-file, documentation, homepage or repository.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
📦 Built source distribution to /mnt/d/workspace/go/src/multi-storage-client/dist/multi_storage_client-0.24.0.tar.gz
multi_storage_client-0.24.0.tar.gz
Successfully built dist/multi_storage_client-0.24.0.tar.gz
# Create platform-specific wheels.
# Link Apple SDKs for cross-compilation, setup environment variables to correct wheel names.
# https://github.com/rust-cross/cargo-zigbuild#caveats
# https://github.com/PyO3/maturin/discussions/2586#discussioncomment-13095109
# https://github.com/PyO3/maturin/blob/d95faa64f2c9971820314d228da9a7e71d2e4b87/src/build_context.rs#L1160
for TARGET in aarch64-apple-darwin aarch64-unknown-linux-gnu x86_64-apple-darwin x86_64-unknown-linux-gnu; do if [ "$TARGET" = "aarch64-apple-darwin" ]; then MACOSX_DEPLOYMENT_TARGET=$APPLE_SDK_VERSION_AARCH64; SDKROOT=$APPLE_SDK_AARCH64; elif [ "$TARGET" = "x86_64-apple-darwin" ]; then MACOSX_DEPLOYMENT_TARGET=$APPLE_SDK_VERSION_X86_64; SDKROOT=$APPLE_SDK_X86_64; else MACOSX_DEPLOYMENT_TARGET=""; SDKROOT=""; fi; env --unset _PYTHON_HOST_PLATFORM MACOSX_DEPLOYMENT_TARGET=$MACOSX_DEPLOYMENT_TARGET SDKROOT=$SDKROOT uv run maturin build --out dist --release --target $TARGET --zig; done
warning: The `requires-python` specifier (`~=3.9`) in `multi-storage-client` uses the tilde specifier (`~=`) without a patch version. This will be interpreted as `>=3.9, <4`. Did you mean `~=3.9.0` to constrain the version as `>=3.9.0, <3.10`? We recommend only using the tilde specifier with a patch version to avoid ambiguity.
warning: both `/root/.cargo/config` and `/root/.cargo/config.toml` exist. Using `/root/.cargo/config`
📦 Including license file "/mnt/d/workspace/go/src/multi-storage-client/LICENSE"
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings with abi3 support for Python ≥ 3.9
🐍 Not using a specific python interpreter
📡 Using build options features from pyproject.toml
🛠️ Using zig for cross-compiling to aarch64-apple-darwin
warning: both `/root/.cargo/config` and `/root/.cargo/config.toml` exist. Using `/root/.cargo/config`
warning: unused manifest key: lib.publish
   Compiling cfg-if v1.0.0
   Compiling smallvec v1.14.0
   Compiling pin-project-lite v0.2.16
 ...

jiuker (Author) commented Jul 10, 2025

@commiterate Looks like it was something with my Python setup. Could you retry?

@jiuker jiuker requested review from jeking3 and commiterate July 11, 2025 01:29
add minio provider
@jiuker jiuker force-pushed the format-the-readme branch from 378fbe1 to 675411b Compare July 11, 2025 01:37
@commiterate commiterate removed the request for review from jeking3 July 11, 2025 15:51
jiuker (Author) commented Jul 15, 2025

@commiterate @jeking3
Looks like this failure is not related to this PR.

=================================== FAILURES ===================================
________________________________ test_path_glob ________________________________

file_storage_config = '/tmp/nix-shell.ne6Eei/msc_config-754f871b6e3f4c6ea16434b618f980d9.yaml'

    def test_path_glob(file_storage_config):
        with tempfile.TemporaryDirectory() as temp_dir:
            create_file(msc.Path(f"{temp_dir}/dir1/testfile.txt"))
            create_file(msc.Path(f"{temp_dir}/dir1/dir2/testfile.txt"))
            create_file(msc.Path(f"{temp_dir}/dir1/dir3/testfile.txt"))
    
            path = msc.Path(f"{temp_dir}/")
            assert list(path.glob("*")) == [msc.Path(f"{temp_dir}/dir1")]
>           assert list(path.glob("dir1/*")) == [
                msc.Path(f"{temp_dir}/dir1/dir2"),
                msc.Path(f"{temp_dir}/dir1/dir3"),
                msc.Path(f"{temp_dir}/dir1/testfile.txt"),
            ]
E           AssertionError: assert [MultiStorage...estfile.txt')] == [MultiStorage...estfile.txt')]
E             
E             At index 0 diff: MultiStoragePath('/tmp/nix-shell.ne6Eei/tmp17r1aou_/dir1/dir3') != MultiStoragePath('/tmp/nix-shell.ne6Eei/tmp17r1aou_/dir1/dir2')
E             Use -v to get more diff

tests/test_multistorageclient/unit/test_pathlib.py:148: AssertionError
=============================== warnings summary ===============================
tests/test_multistorageclient/unit/test_cache.py:638
  /home/runner/work/multi-storage-client/multi-storage-client/tests/test_multistorageclient/unit/test_cache.py:638: PytestRemovedIn9Warning: Marks applied to fixtures have no effect
  See docs: https://docs.pytest.org/en/stable/deprecations.html#applying-a-mark-to-a-fixture-function
    def no_eviction_cache_config(tmpdir):

tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporaryPOSIXDirectory]
tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporaryAWSS3Bucket]
tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporaryAzureBlobStorageContainer]
tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporaryGoogleCloudStorageBucket]
tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporarySwiftStackBucket]
  /home/runner/work/multi-storage-client/multi-storage-client/.venv/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py:167: UserWarning: torch.distributed is disabled, unavailable or uninitialized, assuming the intent is to save in a single process.
    warnings.warn(

tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporaryPOSIXDirectory]
  /home/runner/work/multi-storage-client/multi-storage-client/.venv/lib/python3.10/site-packages/torch/distributed/checkpoint/filesystem.py:111: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
    if tensor.storage().size() != tensor.numel():

tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporaryPOSIXDirectory]
tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporaryAWSS3Bucket]
tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporaryAzureBlobStorageContainer]
tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporaryGoogleCloudStorageBucket]
tests/test_multistorageclient/unit/contrib/test_torch.py::test_filesystem_reader_writer[TemporarySwiftStackBucket]
  /home/runner/work/multi-storage-client/multi-storage-client/.venv/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_loader.py:153: UserWarning: torch.distributed is disabled, unavailable or uninitialized, assuming the intent is to load in a single process.
    warnings.warn(

tests/test_multistorageclient/unit/contrib/test_torch.py: 2 warnings
tests/test_multistorageclient/unit/generators/test_manifest_metadata.py: 6 warnings
tests/test_multistorageclient/unit/providers/test_storage_providers.py: 2 warnings
tests/test_multistorageclient/unit/test_file.py: 1 warning
  /home/runner/work/multi-storage-client/multi-storage-client/.venv/lib/python3.10/site-packages/azure/storage/blob/_container_client.py:1182: DeprecationWarning: The use of a 'BlobProperties' instance for param blob is deprecated. Please use 'BlobProperties.name' or any other str input type instead.
    warnings.warn(

tests/test_multistorageclient/unit/contrib/test_torch.py: 2 warnings
tests/test_multistorageclient/unit/generators/test_manifest_metadata.py: 6 warnings
tests/test_multistorageclient/unit/providers/test_storage_providers.py: 2 warnings
tests/test_multistorageclient/unit/test_file.py: 1 warning
  /home/runner/work/multi-storage-client/multi-storage-client/.venv/lib/python3.10/site-packages/azure/storage/blob/_container_client.py:1602: DeprecationWarning: The use of a 'BlobProperties' instance for param blob is deprecated. Please use 'BlobProperties.name' or any other str input type instead.
    warnings.warn(

tests/test_multistorageclient/unit/test_config.py::test_ais_storage_provider_passthrough_options
  /home/runner/work/multi-storage-client/multi-storage-client/src/multistorageclient/providers/ais.py:156: DeprecationWarning: 'retry' is deprecated and will be removed in a future release. Use 'retry_config' instead.
    self.client = Client(endpoint=endpoint, retry=client_retry)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
- generated xml file: /home/runner/work/multi-storage-client/multi-storage-client/.reports/unit/pytest.xml -

@jiuker jiuker requested a review from jeking3 July 15, 2025 00:55
@commiterate commiterate removed the request for review from jeking3 July 15, 2025 05:32
commiterate (Collaborator) commented

Looks like some flakiness in the order in which glob returns results on the GitHub Actions runners.
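
For reference, one common way to make such an assertion independent of listing order is to sort both sides before comparing. The sketch below uses the standard library's `pathlib.Path` rather than `msc.Path`. Whether `msc.Path` supports ordering is not confirmed here, in which case `sorted(..., key=str)` would be a fallback. `glob_sorted` and `demo` are hypothetical names.

```python
import tempfile
from pathlib import Path


def glob_sorted(root: Path, pattern: str) -> list[Path]:
    """Return glob matches in a deterministic (sorted) order."""
    return sorted(root.glob(pattern))


def demo() -> list[str]:
    """Recreate the directory layout from the failing test and list dir1/* in sorted order."""
    with tempfile.TemporaryDirectory() as temp_dir:
        root = Path(temp_dir)
        for relative in (
            "dir1/testfile.txt",
            "dir1/dir2/testfile.txt",
            "dir1/dir3/testfile.txt",
        ):
            target = root / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("test")
        # Paths are reported relative to the temporary root so the result is stable.
        return [p.relative_to(root).as_posix() for p in glob_sorted(root, "dir1/*")]
```

With sorting applied, the comparison no longer depends on the order in which the underlying filesystem lists directory entries.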


Discussing with the team right now to see if a separate minio storage provider should exist or if a documentation change stating the existing s3 storage provider should work with S3-compatible APIs is sufficient.
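
To make the documentation-only option concrete, here is a hedged sketch of what pointing the existing s3 storage provider at a MinIO endpoint might look like, written as a Python dict that could be serialized to a config file. The key names (`profiles`, `storage_provider`, `type`, `options`, `base_path`, `endpoint_url`, `region_name`) are assumptions that should be checked against the Sphinx docs. The profile name, bucket, and endpoint are placeholders, and credentials are omitted.

```python
import json
from typing import Any


def s3_compatible_profile(
    profile_name: str,
    bucket: str,
    endpoint_url: str,
    region_name: str = "us-east-1",
) -> dict[str, Any]:
    """Build a config dict that reuses the generic s3 provider for an S3-compatible store.

    Hypothetical helper; the schema keys are assumptions, not confirmed MSC API.
    """
    return {
        "profiles": {
            profile_name: {
                "storage_provider": {
                    "type": "s3",
                    "options": {
                        "base_path": bucket,
                        "endpoint_url": endpoint_url,
                        "region_name": region_name,
                    },
                }
            }
        }
    }


# Placeholder values for a local MinIO server.
config = s3_compatible_profile("minio-example", "my-bucket", "http://localhost:9000")
print(json.dumps(config, indent=2))
```

If a setup along these lines works without extra flags, a documentation note may be enough. A dedicated provider would mainly be useful for baking in store-specific defaults.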

jiuker (Author) commented Jul 18, 2025

any update? @commiterate

pradeep-mj (Collaborator) commented

> any update? @commiterate

Hi @jiuker, I would like to discuss this with you and someone who can represent MinIO in this regard. Please send me an email at [email protected]
