Performance considerations for Dataset.iter #7511
Unanswered · wittenator asked this question in Q&A
Other frameworks such as PyTorch let you specify the number of workers for a DataLoader in order to preload batches and keep GPU utilization high. Does anyone have experience with how this works with Hugging Face Datasets and the `Dataset.iter()` method? I am currently using Hugging Face Datasets with JAX output and I see alternating GPU utilization, with a lot of time spent accessing memory even when I load the dataset completely into memory. I suspect data loading may be one issue at play here. There is a similar issue from two years ago, #6341, but I am curious whether anything has changed since then.
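One common workaround, independent of any library-specific support, is to wrap the batch iterator in a background-thread prefetcher so that the next batch is being prepared while the accelerator works on the current one. A minimal sketch (the `prefetch` helper and the dummy batch generator are illustrative, not part of the Datasets API; in practice the wrapped iterator would be the one returned by `dataset.iter(batch_size=...)`):

```python
import threading
import queue

def prefetch(iterator, buffer_size=2):
    """Consume `iterator` in a background thread and buffer up to
    `buffer_size` items, overlapping data loading with compute
    on the main thread."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks end of iteration

    def producer():
        for item in iterator:
            q.put(item)  # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

# Stand-in for dataset.iter(batch_size=...); any iterable of batches works.
batches = ({"x": [i] * 4} for i in range(3))
out = list(prefetch(batches))
```

Because of the GIL this only helps when batch preparation releases it (I/O, Arrow/NumPy decoding), which is typically the case here; it does not replace true multi-worker loading like PyTorch's `num_workers`.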