FiftyOne Imagenet manually fix issues, return original class mapping #380
Purpose
Unfortunately, ImageNet has well-known issues with duplicate class names (https://gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57). This PR introduces a manual fix for the duplicate class names, similar to the way the COCO parser fixes duplicate images.
It is also currently not possible to retrieve the original class mapping: after parsing the dataset we are left with only the current (re-ordered) mapping, so I added a way of retrieving the original one.
Some users may want to work with the datasets despite the underlying issues, such as duplicate class names. For that case I added the --no-clean flag, which parses the original datasets without changes.
Specification
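As a rough illustration of the behaviour described under Purpose, here is a minimal sketch rather than the code in this PR. It assumes a hypothetical helper named dedupe_class_names with a clean parameter that mirrors the --no-clean flag, and it disambiguates duplicates with a numeric suffix, whereas the PR applies manual fixes. The sample names are illustrative only.

```python
from collections import Counter


def dedupe_class_names(names, clean=True):
    """Hypothetical helper: return (labels, original_mapping).

    original_mapping keeps the untouched index -> name pairs so the
    original class mapping can still be retrieved after parsing.
    With clean=False (the --no-clean case) names are left unchanged.
    """
    original_mapping = dict(enumerate(names))
    if not clean:
        return list(names), original_mapping

    totals = Counter(names)  # how often each name occurs overall
    seen = Counter()         # how many times we have met it so far
    labels = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            # Assumption: a numeric suffix is enough to disambiguate and
            # does not collide with an existing class name.
            labels.append(f"{name}_{seen[name]}")
        else:
            labels.append(name)
    return labels, original_mapping


# Illustrative sample: "crane" and "maillot" each appear twice.
sample = ["crane", "maillot", "crane", "tench", "maillot"]
labels, original_mapping = dedupe_class_names(sample)
raw_labels, _ = dedupe_class_names(sample, clean=False)
```

The original mapping is returned in both modes, so downstream code can translate the cleaned labels back to the dataset's own names regardless of which option was used.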
Dependencies & Potential Impact
Deployment Plan
Testing & Validation
Tested locally by both me and @ptoupas, whom I have also tagged in this PR.