storage: implement dataflux fast listing #10731
Labels
api: storage
Issues related to the Cloud Storage API.
type: feature request
‘Nice-to-have’ improvement, new feature or different behavior or design.
Listing a large dataset in a GCS bucket sequentially takes a long time. Listing the objects in parallel would complete much faster.
Dataflux fast-listing will list objects in a bucket in parallel using a work-stealing algorithm. It supports storage.Query to filter objects in a bucket and returns objects in batches. Users can provide the bucket, a storage.Query, the number of parallel workers, and the batch size.
Several implementations of the work-stealing algorithm were benchmarked, and the Dataflux implementation came out fastest.
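To make the work-stealing idea concrete, here is a minimal, self-contained sketch. It is not the Dataflux implementation and does not call the Cloud Storage API: an in-memory slice of names stands in for the bucket, index ranges stand in for object-name ranges, and `span` and `listParallel` are hypothetical names chosen for illustration. Each worker claims one page (batch) at a time from its own range, and a worker whose range is empty steals the upper half of the largest remaining range.

```go
package main

import (
	"sort"
	"sync"
)

// span is a half-open index range [lo, hi) into the slice of object names.
// Assumption: against a real bucket the bounds would be object-name
// strings rather than slice indices.
type span struct{ lo, hi int }

// listParallel (hypothetical helper) lists names using `workers` goroutines,
// pageSize items per simulated list call. Idle workers steal half of the
// largest remaining span from another worker.
func listParallel(names []string, workers, pageSize int) []string {
	if workers < 1 {
		workers = 1
	}
	if pageSize < 1 {
		pageSize = 1
	}

	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		result []string
		spans  = make([]span, workers) // unclaimed work per worker
	)
	spans[0] = span{0, len(names)} // worker 0 starts with the whole range

	// next claims the next page for worker id. If the worker has no work
	// left, it first steals the upper half of the largest remaining span.
	// It returns false once no unclaimed work exists anywhere.
	next := func(id int) (span, bool) {
		mu.Lock()
		defer mu.Unlock()
		if spans[id].lo >= spans[id].hi {
			victim, best := -1, 0
			for i, s := range spans {
				if n := s.hi - s.lo; n > best {
					victim, best = i, n
				}
			}
			if victim < 0 {
				return span{}, false
			}
			mid := spans[victim].lo + best/2
			spans[id] = span{mid, spans[victim].hi}
			spans[victim].hi = mid
		}
		lo := spans[id].lo
		hi := lo + pageSize
		if hi > spans[id].hi {
			hi = spans[id].hi
		}
		spans[id].lo = hi // the claimed page is no longer stealable
		return span{lo, hi}, true
	}

	for id := 0; id < workers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for {
				page, ok := next(id)
				if !ok {
					return
				}
				batch := names[page.lo:page.hi] // stands in for one list call
				mu.Lock()
				result = append(result, batch...)
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()

	sort.Strings(result) // workers finish in arbitrary order
	return result
}
```

For example, `listParallel(names, 4, 3)` would return every name exactly once, in sorted order, regardless of how the ranges were split among the four workers. Pages are claimed under the lock, so unclaimed work only ever shrinks and a worker can exit safely as soon as it finds nothing left to steal.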