Closed
Description
Listing a large dataset in a GCS bucket sequentially takes a long time. Listing the objects in parallel would complete much faster.
Dataflux fast-listing will list the objects in a bucket in parallel using a work-stealing algorithm. It supports storage.Query for filtering the objects in a bucket and returns the objects in batches. The user provides the bucket, a storage.Query, the number of parallel workers, and the batch size.
Several implementations of the work-stealing algorithm were benchmarked, and the Dataflux implementation came out fastest.