storage: implement dataflux fast listing #10731

Closed
@akansha1812

Description

Listing a large dataset in a GCS bucket sequentially takes a long time. Listing the objects in parallel would complete much faster.

Dataflux fast-listing will list the objects in a bucket in parallel using a work-stealing algorithm. It supports storage.Query to filter the objects in a bucket and returns objects in batches. The user provides the bucket, a storage.Query, the number of parallel workers, and the batch size.
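
To illustrate the work-stealing idea described above, here is a minimal sketch. It is not the Dataflux implementation, and it does not call the Cloud Storage API: a sorted in-memory slice stands in for the bucket, index ranges stand in for object-name ranges, and `fastList`, `span`, and `claim` are hypothetical names chosen for this example. Each worker lists its own range one page at a time; a worker that runs out of work takes the upper half of the largest remaining range.

```go
package main

import (
	"sort"
	"sync"
)

// span is a half-open index range [next, end) that one worker still has to list.
// Assumption: index ranges stand in for lexicographic object-name ranges.
type span struct{ next, end int }

// fastList returns every entry of names (assumed sorted), listed by `workers`
// goroutines that each fetch pages of at most pageSize entries.
func fastList(names []string, workers, pageSize int) []string {
	if workers < 1 {
		workers = 1
	}
	if pageSize < 1 {
		pageSize = 1
	}
	var (
		mu     sync.Mutex // guards spans and result
		wg     sync.WaitGroup
		result []string
	)
	spans := make([]span, workers)
	spans[0] = span{0, len(names)} // worker 0 starts with the whole range

	// claim reserves the next page for worker id. If the worker's own span is
	// empty, it first steals the upper half of the largest remaining span.
	claim := func(id int) (lo, hi int, ok bool) {
		mu.Lock()
		defer mu.Unlock()
		s := &spans[id]
		if s.next >= s.end {
			victim, best := -1, 1 // only spans with at least 2 entries are split
			for i := range spans {
				if rem := spans[i].end - spans[i].next; rem > best {
					victim, best = i, rem
				}
			}
			if victim < 0 {
				return 0, 0, false // nothing left to steal: this worker is done
			}
			v := &spans[victim]
			mid := v.next + best/2
			*s = span{mid, v.end} // thief takes the upper half
			v.end = mid           // victim keeps the lower half
		}
		lo = s.next
		hi = lo + pageSize
		if hi > s.end {
			hi = s.end
		}
		s.next = hi
		return lo, hi, true
	}

	for id := 0; id < workers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for {
				lo, hi, ok := claim(id)
				if !ok {
					return
				}
				page := names[lo:hi] // stands in for one paginated list request
				mu.Lock()
				result = append(result, page...)
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	sort.Strings(result) // pages arrive in arbitrary order across workers
	return result
}
```

Calling `fastList(names, 4, 10)` returns the same names as a sequential pass, in sorted order. In this sketch the result is collected into a single slice for brevity; a real listing would instead hand each page to the caller as a batch and would split ranges by object-name prefix rather than by index.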

Several implementations of the work-stealing algorithm were benchmarked, and the Dataflux implementation came out faster than the others.

Metadata

Labels

api: storage (Issues related to the Cloud Storage API.)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)
