I had an issue matching a sorted CSV file of labels to files from a folder, only to find that `get_image_files` was returning them in an arbitrary, unsorted order.

I also think it's good practice to sort by default so the order is deterministic (and then shuffle when needed).
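As a minimal sketch of "sort by default": Python's `os.walk` and `os.listdir` yield directory entries in whatever order the filesystem returns them, so the helper below sorts both the subdirectories (in place, which controls the order `os.walk` descends) and the filenames at each level. `get_files_sorted` is a hypothetical name for illustration, not the fastai function.

```python
import os
from pathlib import Path

def get_files_sorted(path, extensions=None):
    """Hypothetical helper: list files under `path` in a deterministic order.

    Not fastai's get_files/get_image_files; just an illustration of
    sorting the directory traversal instead of relying on filesystem order.
    """
    exts = {e.lower() for e in extensions} if extensions else None
    result = []
    for root, dirs, files in os.walk(path):
        dirs.sort()  # sorting in place makes os.walk descend in sorted order
        for name in sorted(files):  # sort files within each directory
            if exts is None or os.path.splitext(name)[1].lower() in exts:
                result.append(Path(root) / name)
    return result
```

For example, `get_files_sorted("images", extensions=[".jpg", ".png"])` returns the same list on any machine holding the same directory tree, so row order can be matched against a sorted CSV.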
It may be possible to introduce some sorting criterion, but I'm wondering whether sorting everything alphabetically could be a problem. Maybe the files could be sorted by a hash of the filename?
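A sketch of the hash-based ordering, assuming the file name alone is hashed. One caveat: Python's built-in `hash()` on strings is salted per process (see `PYTHONHASHSEED`), so it would not give a stable order across runs or machines; a digest from `hashlib` does. `hash_sorted` and `hash_key` are hypothetical names.

```python
import hashlib
from pathlib import Path

def hash_key(path):
    """Sort key: MD5 digest of the file name, with the POSIX path as tie-breaker.

    hashlib digests are identical on every machine, unlike built-in hash().
    The tie-breaker keeps files with the same name in different folders in a
    fixed order instead of inheriting the (nondeterministic) input order.
    """
    p = Path(path)
    return (hashlib.md5(p.name.encode("utf-8")).hexdigest(), p.as_posix())

def hash_sorted(paths):
    # Pseudo-random looking, but fully deterministic, ordering
    return sorted(paths, key=hash_key)
```

This scrambles alphabetical runs (e.g. all files of one class being adjacent) while still giving every user the same order.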
EDIT: it would also be possible to sort, then shuffle randomly. The order would be deterministic if the random seed is set.
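A minimal sketch of sort-then-shuffle, assuming a private `random.Random` instance is used so the shuffle doesn't depend on (or disturb) the global random state. `deterministic_shuffle` is a hypothetical name; the resulting order is reproducible for a given seed on a given Python version.

```python
import random

def deterministic_shuffle(items, seed):
    """Sort first so the starting order no longer depends on the filesystem,
    then shuffle with a dedicated, seeded RNG."""
    items = sorted(items)
    rng = random.Random(seed)  # local RNG: independent of random.seed()
    rng.shuffle(items)
    return items
```

Sorting before shuffling is the key step: a seeded shuffle of an unsorted list is still only as deterministic as the list it started from.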
I originally thought this was only an issue across different OSes, but I've now noticed that it returns a different order on two different Ubuntu systems.
`_get_files` in `local.data.transforms.py` doesn't return files in a deterministic order across OSes. This is an issue when getting files, then splitting using a fixed seed. For example, in `08_pets_tutorial.ipynb` (I added the `seed` parameter):
In this case, 2 users on different OSes would have the same `split_idx` but different train/validation sets.

It would be straightforward for a user to correct this by sorting `items` before passing the list into the splitter, but I wouldn't expect many people to know to do this.
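To illustrate the problem and the user-side fix, here is a sketch with a hypothetical seeded splitter, `random_split_idx`, in the spirit of the one used in the notebook (not fastai's actual implementation). The indices depend only on the number of items and the seed, so the same indices select different files when the input order differs; sorting first makes both users' sets identical.

```python
import random

def random_split_idx(n, valid_pct=0.2, seed=None):
    """Hypothetical splitter: seeded permutation of range(n), split into
    (train_idx, valid_idx). Depends only on n and seed, not on the items."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(valid_pct * n)
    return idx[cut:], idx[:cut]

def split_items(items, valid_pct=0.2, seed=None):
    train_idx, valid_idx = random_split_idx(len(items), valid_pct, seed)
    return [items[i] for i in train_idx], [items[i] for i in valid_idx]

# The same files, listed in two different orders (e.g. on two machines)
files_a = [f"img_{i}.jpg" for i in range(10)]
files_b = list(reversed(files_a))

# Unsorted: identical split indices, but they can select different files
_, valid_a = split_items(files_a, seed=42)
_, valid_b = split_items(files_b, seed=42)

# Sorted first: both users get the same validation set
_, valid_a_sorted = split_items(sorted(files_a), seed=42)
_, valid_b_sorted = split_items(sorted(files_b), seed=42)
print(valid_a, valid_b)
print(valid_a_sorted, valid_b_sorted)
```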