Feature request
It would be useful to have a straightforward way to repeat iterable datasets indefinitely, giving the user complete control over when iteration starts and ends.
An IterableDataset.repeat(n) function could do this automatically.
Motivation
This feature was discussed in issue #7147. It would remove the need for the current hack of interleaving datasets with probability 0, and would give a simple way to achieve this functionality.
An additional benefit might be the simplification of the use of iterable datasets in a distributed setting:
If the user can assume that datasets repeat indefinitely, then issues caused by different numbers of samples appearing on different devices (e.g. #6437, #6594, #6623, #6719) could potentially be resolved by simply doing:
ids.repeat(None).take(n_samples_per_epoch)
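To illustrate the intended semantics, here is a minimal pure-Python sketch that does not use the datasets library. The helpers repeat and take, the shard lists, and the two-rank setup are all hypothetical stand-ins for the proposed behaviour, not an existing API.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def repeat(make_iter: Callable[[], Iterable[T]], n: Optional[int]) -> Iterator[T]:
    """Yield the items of a re-creatable iterable n times, or forever if n is None."""
    passes = 0
    while n is None or passes < n:
        saw_item = False
        for item in make_iter():
            saw_item = True
            yield item
        if not saw_item:
            return  # empty source: stop instead of looping forever
        passes += 1


def take(items: Iterable[T], k: int) -> Iterator[T]:
    """Yield at most k items."""
    return islice(items, k)


# Hypothetical shards of unequal size, one per device.
shard_rank0 = [0, 1, 2, 3, 4]
shard_rank1 = [10, 11, 12]
n_samples_per_epoch = 4

# Each rank repeats its own shard indefinitely, then cuts the epoch at a fixed length.
rank0_epoch = list(take(repeat(lambda: shard_rank0, None), n_samples_per_epoch))
rank1_epoch = list(take(repeat(lambda: shard_rank1, None), n_samples_per_epoch))
```

In this sketch both ranks yield exactly n_samples_per_epoch samples even though their shards differ in size. The smaller shard simply wraps around to its first item.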
Your contribution
I'm not familiar enough with the codebase to assess how straightforward this would be to implement.
If it might be very straightforward, I could possibly have a go.
concatenate_datasets does the job when the number of repetitions is finite, but repeating forever with .repeat() needs new logic in iterable_dataset.py.
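The distinction can be sketched with plain Python iterators. This is only an analogy under stated assumptions: make_source, repeat_finite and repeat_forever are hypothetical names, and none of this is the actual code in iterable_dataset.py. Finite repetition amounts to concatenating n copies, while infinite repetition needs a loop that re-creates the source each time it is exhausted.

```python
from itertools import chain, islice


def make_source():
    """Hypothetical stand-in for a streaming dataset: a fresh iterator per call."""
    return iter(["a", "b", "c"])


def repeat_finite(make_iter, n):
    # Finite case: equivalent to concatenating n fresh copies of the source.
    return chain.from_iterable(make_iter() for _ in range(n))


def repeat_forever(make_iter):
    # Infinite case: cannot be built as a finite concatenation, so restart
    # the source whenever it runs out. Assumes the source is non-empty.
    while True:
        yield from make_iter()


finite_twice = list(repeat_finite(make_source, 2))
first_seven = list(islice(repeat_forever(make_source), 7))
```

The infinite variant is only usable when the caller bounds it, here with islice, which mirrors the role of .take() in the request above.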