Creating a HF Dataset from lakeFS with S3 storage takes too much time!

Hi,

I’m new to HF Datasets and I’m trying to create datasets from data versioned in **lakeFS** _(with a **MinIO** S3 bucket as the storage backend)_.

Here I’m using about 30,000 PIL images from the MNIST data, but it takes around 12 minutes to execute, which is a lot!

From what I understand, it first loads the images into the cache and then builds the dataset.
The execution screenshot is attached below.

Is there a way to optimize this or am I doing something wrong?
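
One way to narrow this down might be to time the two phases separately (fetching the images into the cache, then building the dataset) to see which one dominates the 12 minutes. The following is only a minimal sketch using the standard library: `fetch_images` and `build_records` are hypothetical stand-ins for the real lakeFS/MinIO download and the real dataset construction, not functions from lakeFS or `datasets`, and the object keys are made up for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def fetch_images(keys):
    # Hypothetical stand-in: in the real pipeline this would download each
    # object from the lakeFS repository (MinIO S3 backend).
    return {key: b"\x00" * 784 for key in keys}


def build_records(blobs):
    # Hypothetical stand-in: in the real pipeline this would build the
    # HF dataset from the fetched images.
    return [{"path": key, "num_bytes": len(data)} for key, data in blobs.items()]


def profile_pipeline(keys):
    """Run both phases and return the records plus per-phase timings."""
    timings = {}
    with timed("fetch", timings):
        blobs = fetch_images(keys)
    with timed("build", timings):
        records = build_records(blobs)
    return records, timings


if __name__ == "__main__":
    # Made-up object keys, for illustration only.
    keys = [f"mnist/train/{i}.png" for i in range(1000)]
    records, timings = profile_pipeline(keys)
    for phase, seconds in timings.items():
        print(f"{phase}: {seconds:.4f}s")
```

If the fetch phase turns out to dominate, the time is probably going into per-object requests to the storage backend rather than into the dataset construction itself.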

Thanks!

![Image](https://github.com/user-attachments/assets/c79257c8-f023-42a9-9e6f-0898b3ea93fe)

Creating a HF Dataset from lakeFS with S3 storage takes too much time! #7627
