storage: implement dataflux fast listing

Listing a large dataset in a GCS bucket sequentially takes a long time. Listing objects in parallel would complete much faster.

Dataflux fast listing will list the objects in a bucket in parallel using a work-stealing algorithm. It supports storage.Query for filtering objects and returns objects in batches. Users can provide the bucket, a storage.Query, the number of parallel workers, and the batch size.
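
A minimal sketch of the interface described above, assuming an in-memory list of object names in place of a real bucket. The caller supplies a bucket name, a filter (a name prefix here, standing in for a storage.Query), a worker count, and a batch size, then pulls results one batch at a time. `listerInput`, `newLister`, and `nextBatch` are hypothetical names, not the Dataflux API, and this sketch lists sequentially; the parallel part is the work-stealing step.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listerInput mirrors the parameters named in the proposal: bucket, query,
// number of parallel workers, and batch size. The type itself is hypothetical.
type listerInput struct {
	BucketName  string
	Prefix      string // stands in for a storage.Query prefix filter
	Parallelism int    // worker count (not used by this sequential sketch)
	BatchSize   int
}

// lister hands out filtered object names in batches of at most BatchSize.
type lister struct {
	in      listerInput
	pending []string
}

// newLister filters an in-memory object list (a stand-in for the bucket) by prefix.
func newLister(in listerInput, objects []string) *lister {
	var matched []string
	for _, name := range objects {
		if strings.HasPrefix(name, in.Prefix) {
			matched = append(matched, name)
		}
	}
	sort.Strings(matched)
	return &lister{in: in, pending: matched}
}

// nextBatch returns the next batch and reports whether listing has finished.
func (l *lister) nextBatch() ([]string, bool) {
	n := l.in.BatchSize
	if n <= 0 || n > len(l.pending) {
		n = len(l.pending)
	}
	batch := l.pending[:n]
	l.pending = l.pending[n:]
	return batch, len(l.pending) == 0
}

func main() {
	objects := []string{"logs/a", "data/3", "data/1", "data/2", "data/4", "data/5"}
	in := listerInput{BucketName: "example-bucket", Prefix: "data/", Parallelism: 4, BatchSize: 2}
	l := newLister(in, objects)
	for {
		batch, done := l.nextBatch()
		fmt.Println(batch)
		if done {
			break
		}
	}
}
```

Returning a "done" flag alongside each batch lets the caller process results as they arrive instead of waiting for the whole listing to finish.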

Several implementations of the work-stealing algorithm were built and benchmarked, and the Dataflux implementation came out fastest.

storage: implement dataflux fast listing #10731