_get_files doesn't return files in a deterministic order across OSes #239

Open
dcato98 opened this issue Oct 5, 2019 · 3 comments

dcato98 commented Oct 5, 2019

_get_files in local.data.transforms.py doesn't return files in a deterministic order across OSes.

This is an issue when getting files, then splitting using a fixed seed. For example, in 08_pets_tutorial.ipynb (I added the seed parameter):

items = get_image_files(source)
split_idx = RandomSplitter(seed=42)(items)

In this case, 2 users on different OSes would have the same split_idx, but different train/validation sets.

It would be straightforward for a user to correct this by sorting items before passing this list into the splitter, but I wouldn't expect that many people would know to do this.
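To make the failure mode concrete, here is a minimal sketch using only the standard library. `seeded_split` is a hypothetical stand-in for a seeded random splitter, not the library's `RandomSplitter`, and the file names are made up. It shows the same indices selecting different files when the listing order differs, and the same files once both listings are sorted.

```python
import random

def seeded_split(n_items, valid_pct=0.2, seed=42):
    """Hypothetical seeded splitter: returns (train_idx, valid_idx)."""
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]

# The same six files, listed in two different orders,
# as two machines might return them.
files_a = ["cat_1.jpg", "dog_2.jpg", "cat_3.jpg",
           "dog_4.jpg", "cat_5.jpg", "dog_6.jpg"]
files_b = list(reversed(files_a))

# Identical indices on both machines because the seed is fixed.
train_idx, valid_idx = seeded_split(len(files_a))

# Unsorted: the same indices point at different files.
valid_a = {files_a[i] for i in valid_idx}
valid_b = {files_b[i] for i in valid_idx}

# Sorted first: the same indices point at the same files.
valid_a_sorted = {sorted(files_a)[i] for i in valid_idx}
valid_b_sorted = {sorted(files_b)[i] for i in valid_idx}

print(valid_a == valid_b)                # False
print(valid_a_sorted == valid_b_sorted)  # True
```

In the pets example this would amount to calling `sorted(items)` before passing the list to the splitter, assuming the items are comparable paths.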


rmkn85 commented Oct 14, 2019

I had an issue matching a sorted CSV file of labels to the files in a folder, only to find that get_image_files was returning them in arbitrary, unsorted order.
I also think it's good practice to sort by default so the order is deterministic (and then shuffle when needed).
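Until the order is fixed, one workaround for the CSV case is to join labels to files by name rather than by position. A minimal sketch, assuming a CSV with hypothetical `name` and `label` columns and made-up file paths:

```python
import csv
import io
from pathlib import Path

# Hypothetical label file; in practice this would be read from disk.
csv_text = "name,label\na.jpg,cat\nb.jpg,dog\nc.jpg,cat\n"
labels_by_name = {row["name"]: row["label"]
                  for row in csv.DictReader(io.StringIO(csv_text))}

# Files in whatever order the file listing happened to return them.
files = [Path("imgs/c.jpg"), Path("imgs/a.jpg"), Path("imgs/b.jpg")]

# Look each label up by file name, so the listing order no longer matters.
labels = [labels_by_name[f.name] for f in files]
print(labels)  # ['cat', 'cat', 'dog']
```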


tacchinotacchi commented Nov 8, 2019

It may be possible to introduce some sorting criteria, but I'm wondering whether sorting everything alphabetically could be a problem. Maybe they can be sorted based on a hash function on the filename?

EDIT: it would also be possible to sort, then shuffle randomly. The order would be deterministic if the random seed is set.
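Both ideas can be sketched with the standard library. The helper names below are hypothetical, and the hash sketch uses `hashlib.md5` instead of Python's built-in `hash()`, since string hashing with `hash()` is randomized per process by default and would not give a stable order.

```python
import hashlib
import random

def hash_ordered(names):
    """Order names by a stable digest of the name (ties broken by the name)."""
    def key(name):
        return (hashlib.md5(name.encode("utf-8")).hexdigest(), name)
    return sorted(names, key=key)

def sorted_then_shuffled(names, seed=42):
    """Sort to get a canonical order, then shuffle with a private seeded RNG."""
    items = sorted(names)
    random.Random(seed).shuffle(items)
    return items

listing_a = ["dog_2.jpg", "cat_1.jpg", "cat_3.jpg", "dog_4.jpg"]
listing_b = list(reversed(listing_a))  # same files, different listing order

print(hash_ordered(listing_a) == hash_ordered(listing_b))                  # True
print(sorted_then_shuffled(listing_a) == sorted_then_shuffled(listing_b))  # True
```

Either approach avoids a purely alphabetical order while staying independent of the listing order. The shuffled order is only expected to match between machines running Python versions that share the same `random.shuffle` implementation.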


dcato98 commented Nov 25, 2019

I originally thought this was only an issue across different OSes, but I have now noticed that it returns a different order on 2 different Ubuntu systems.
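This is consistent with directory listings being returned in arbitrary order: Python documents the result of `os.listdir` as being in arbitrary order, so the order can depend on the filesystem rather than the OS. A minimal sketch of a deterministic file walk follows; `get_files_sorted` is a hypothetical helper, not the library's `_get_files`.

```python
import os
from pathlib import Path

def get_files_sorted(root, extensions=None):
    """Collect files under root and return them in a stable, sorted order."""
    root = Path(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # extensions is an optional set of lowercase suffixes, e.g. {".jpg"}
            if extensions is None or Path(name).suffix.lower() in extensions:
                found.append(Path(dirpath) / name)
    # Sort on the POSIX-style relative path so that separators compare
    # the same way on every platform.
    return sorted(found, key=lambda p: p.relative_to(root).as_posix())
```

The sort key is the path relative to `root`, so two machines with the same directory contents would be expected to get the same order regardless of where the dataset is stored.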
