Cycle option for StreamingDataLoader #524
Comments
You could check PyTorch Lightning Cycle Loaders: https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.utilities.combined_loader.html. Alternatively, create your own wrapper that iterates for a given number of steps.
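A minimal sketch of such a wrapper is below. It assumes only that the loader is a re-iterable object (each call to iter() starts a fresh pass); the helper name iterate_for_steps is hypothetical and is not part of litdata or Lightning. It avoids itertools.cycle on purpose, because cycle caches the items of the first pass and would replay them instead of letting the loader reshuffle.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def iterate_for_steps(loader: Iterable[T], num_steps: int) -> Iterator[T]:
    """Yield exactly num_steps batches, restarting the loader when it is exhausted.

    Hypothetical helper: `loader` can be any re-iterable object, for example a
    StreamingDataLoader or a plain list.
    """
    steps = 0
    while steps < num_steps:
        produced = False
        for batch in loader:  # a new pass starts here each time
            produced = True
            yield batch
            steps += 1
            if steps >= num_steps:
                return
        if not produced:
            # Guard against looping forever on an empty loader.
            raise ValueError("loader yielded no batches")


# Usage with a stand-in loader of three batches.
batches = list(iterate_for_steps([0, 1, 2], num_steps=7))
print(batches)
```

Note that every restart begins a new pass over the loader, so any per-epoch bookkeeping the loader does internally will still advance.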
Hi @Aceticia, ParallelStreamingDataset #576 has just been merged into
Hi @philgzl, are you suggesting something like sd = ld.StreamingDataset("..", cycle=True), so that when the iterator raises StopIteration, we don't increase the epoch count and just restart the iterator?
Mmh I was thinking of a similar solution to what was implemented in
Iterating over the dataset once then yields 100 samples. If the dataset has fewer than 100 samples, we cycle and shuffle internally. If we iterate over the dataset a second time, we resume from where we left off without re-shuffling, and yield 100 samples again. This way we can disentangle the epoch length (as in the number of items yielded by
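The semantics described in this comment (a fixed number of samples per iteration, with the position carried over to the next iteration) could be sketched roughly as follows. This is an illustration only, not litdata's implementation: the class name ResumableCycle and its attributes are hypothetical, and the sketch does no shuffling itself. It assumes the wrapped dataset reshuffles, if at all, whenever a new pass over it starts.

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class ResumableCycle:
    """Yield `length` items per iteration, resuming where the last one stopped.

    Hypothetical wrapper: `dataset` is any re-iterable object.
    """

    def __init__(self, dataset: Iterable[T], length: int) -> None:
        self.dataset = dataset
        self.length = length
        self.passes = 0  # completed passes over the underlying dataset
        self._it: Optional[Iterator[T]] = None  # kept between iterations

    def __iter__(self) -> Iterator[T]:
        yielded = 0
        while yielded < self.length:
            fresh = self._it is None
            if fresh:
                self._it = iter(self.dataset)  # start a new pass
            try:
                item = next(self._it)
            except StopIteration:
                if fresh:
                    # A brand-new pass produced nothing: avoid an endless loop.
                    raise ValueError("dataset is empty") from None
                self._it = None  # pass finished; cycle on the next loop turn
                self.passes += 1
                continue
            yield item
            yielded += 1


# A stand-in dataset of 5 samples with an epoch length of 3.
cycler = ResumableCycle([0, 1, 2, 3, 4], length=3)
first = list(cycler)   # first three samples
second = list(cycler)  # resumes at sample 3, then wraps around
print(first, second, cycler.passes)
```

Here the epoch length seen by the training loop is `length`, while `passes` counts how often the underlying data was actually exhausted.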
I realize now maybe what you meant with
Thanks for the clarification. Similar to ParallelStreamingDataset, pass an int or "inf". Sounds good to me.
Yes, and then I guess this feature should be removed from
🚀 Feature
A function or an argument in StreamingDataLoader to cycle the passed-in StreamingDataset.
Motivation
Many training scenarios in CV involve training models for multiple epochs while controlling the exact number of training steps, independent of the underlying dataset size. E.g., given a CombinedStreamingDataset of some length, restart its iteration when it is exhausted.
Pitch
I'm not quite sure how this should be done. Maybe in the __iter__ method of StreamingDataLoader, we can catch the final iteration and restart it?