Conversation

@gabrielilharco
Collaborator

No description provided.

@gabrielilharco
Collaborator Author

@rom1504 here's the PR we were discussing, in case we want to stop trying to guess the length of wds datasets

@usuyama

usuyama commented Jan 6, 2023

I like this change! (In some experiments I wanted to override train_num_samples to shorten the epoch; this change will solve that issue too.)

With webdataset, when train_num_samples is set smaller than the whole dataset, do we get a different set of samples per epoch?

@gabrielilharco
Collaborator Author

@usuyama with the current code, you can set --train-num-samples to a smaller number, and if you use --dataset-resampled, you'll always get random samples from the entire pool on each "epoch".
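The behavior described above (a fixed per-"epoch" sample budget drawn at random from the full pool) can be sketched without webdataset itself. This is a minimal stdlib illustration, not open_clip's actual data pipeline; the names `resampled_epoch` and `pool` are hypothetical, and real `--dataset-resampled` resamples at the shard level rather than per individual sample.

```python
import random

def resampled_epoch(pool, num_samples, rng):
    # Hypothetical illustration: with resampling, an "epoch" is just a
    # budget of num_samples random draws (with replacement) from the
    # entire pool, so successive epochs see different random subsets.
    return [rng.choice(pool) for _ in range(num_samples)]

pool = list(range(1000))   # stand-in for the full wds dataset
rng = random.Random(0)

# --train-num-samples set smaller than the pool (100 < 1000)
epoch1 = resampled_epoch(pool, 100, rng)
epoch2 = resampled_epoch(pool, 100, rng)

print(len(epoch1), len(epoch2))   # both epochs use the same budget
print(epoch1 != epoch2)           # but draw different random samples
```

Because draws are with replacement, a single epoch may also repeat a sample, which matches resampled semantics where there is no guarantee of covering the whole dataset.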
