First WebDataset-compatible release
[feat] Webdataset support (#111)
* Better error messages on the HTTP path
- async tarball pull, though the behavior is still clunky
- the general architecture could be simpler and make more use of tokio
- handle jpg/png/jpeg/cls/txt/json sample types
- basic shuffling support
- still missing: unit tests and better behavior; the code currently relies on pauses
- better documentation
Big rewrite: nicer and smaller code, I believe (#117)
Co-authored-by: Benjamin Lefaudeux <[email protected]>
Async tarball pull and dispatch
Random_sampling in the config, at least for now. Thanks for the review, Roman!
* Code review (#120)
Some items are still missing (for instance, it would be good to propagate the archive name), but most fixes should be there
* Second round, hopefully good to go. Performance could probably be improved (competing sample pulls)
* Handle multi-image samples (#121)
Bugfix for the previous PR; ideally we should add more unit tests
* Final update round
* Second review: not perfect, but it feels like we can land this and carry on