
First Webdataset-compatible release

blefaudeux released this on 02 Jun 12:02
9352623
[feat] Webdataset support (#111)

* Better error messages on the HTTP path

- async tarball pull, but behavior is clunky
- general architecture could be simpler and use tokio more
- handling jpg/png/jpeg/cls/txt/json types
- some shuffling handling
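
For context, WebDataset groups tar members into samples by a shared key. The key is the member path up to the first dot of the file name, and the rest is the extension (e.g. `a/000.jpg` and `a/000.cls` form one sample). Below is a minimal Rust sketch of that grouping, restricted to the jpg/png/jpeg/cls/txt/json types listed above. `split_key`, `group_samples` and `KNOWN_EXTENSIONS` are hypothetical names for illustration, not datago's actual code.

```rust
use std::collections::BTreeMap;

/// Hypothetical list of extensions treated as sample fields (mirrors the notes above).
const KNOWN_EXTENSIONS: [&str; 6] = ["jpg", "png", "jpeg", "cls", "txt", "json"];

/// Splits a tar member path such as "a/000.jpg" into ("a/000", "jpg"):
/// the key stops at the first dot of the file name (directories may contain dots).
fn split_key(path: &str) -> Option<(&str, &str)> {
    let file_start = path.rfind('/').map_or(0, |i| i + 1);
    let dot = path[file_start..].find('.')? + file_start;
    Some((&path[..dot], &path[dot + 1..]))
}

/// Groups member names into samples keyed by their shared prefix,
/// silently skipping members whose extension is not a known type.
fn group_samples<'a>(members: &[&'a str]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut samples: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &member in members {
        if let Some((key, ext)) = split_key(member) {
            if KNOWN_EXTENSIONS.contains(&ext.to_ascii_lowercase().as_str()) {
                samples.entry(key).or_default().push(ext);
            }
        }
    }
    samples
}
```

A `BTreeMap` keeps the keys sorted, which makes the output deterministic. A streaming reader would instead emit a sample as soon as the key changes, because tar members of one sample are expected to be contiguous.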

still missing: unit tests and better behavior (it pauses at the moment)

better documentation

big rewrite: nicer and smaller code, I believe (#117)

Co-authored-by: Benjamin Lefaudeux <[email protected]>

Async tarball pull and dispatch
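
A pull-and-dispatch pipeline can be sketched as a pool of fetchers feeding one bounded channel. The bound provides back-pressure, so fetchers block when the consumer falls behind. The sketch below uses std threads and `std::sync::mpsc::sync_channel` in place of tokio tasks so it needs no external crate. `dispatch` and the `fetch` callback (standing in for "download a tarball and decode its samples") are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that take shard names from a shared queue, call the
/// hypothetical `fetch` on each shard, and push every resulting sample into a
/// channel bounded to `capacity` items. Returns the consumer's end.
fn dispatch<F>(shards: Vec<String>, workers: usize, capacity: usize, fetch: F) -> Receiver<String>
where
    F: Fn(&str) -> Vec<String> + Send + Sync + 'static,
{
    let queue = Arc::new(Mutex::new(shards.into_iter()));
    let fetch = Arc::new(fetch);
    let (tx, rx) = sync_channel(capacity);
    for _ in 0..workers.max(1) {
        let (queue, fetch, tx) = (Arc::clone(&queue), Arc::clone(&fetch), tx.clone());
        thread::spawn(move || loop {
            // The lock guard is a temporary, released at the end of this statement.
            let next = queue.lock().unwrap().next();
            let Some(shard) = next else { break };
            for sample in fetch(&shard) {
                if tx.send(sample).is_err() {
                    return; // the consumer hung up, stop working
                }
            }
        });
    }
    // Drop the original sender so the channel closes once every worker is done.
    drop(tx);
    rx
}
```

Iterating over the returned receiver ends once all shards are consumed. With several workers, samples from different shards interleave in arbitrary order.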

Random_sampling in the config, at least for now. Thanks for the review, Roman!
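
For streamed tarballs, one common way to implement random sampling is a shuffle buffer, as in the Python webdataset library's `shuffle(bufsize)`. Whether datago's `Random_sampling` works this way is not stated here. The following is a minimal sketch under that assumption. `shuffle_buffer` and the small xorshift generator are hypothetical helpers, used only so the example needs no external crate.

```rust
/// Minimal xorshift64 PRNG (not cryptographic), to keep the sketch dependency-free.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns an index in 0..n (slightly biased, acceptable for a sketch).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Buffered shuffle: holds up to `buffer_size` items; each incoming item replaces a
/// randomly chosen buffered one, which is emitted. Bigger buffers mix more but use more memory.
fn shuffle_buffer<T>(items: impl IntoIterator<Item = T>, buffer_size: usize, seed: u64) -> Vec<T> {
    let mut rng = XorShift64(seed.max(1)); // xorshift must not start from 0
    let mut buffer: Vec<T> = Vec::with_capacity(buffer_size);
    let mut out = Vec::new();
    for item in items {
        if buffer.len() < buffer_size.max(1) {
            buffer.push(item);
            continue;
        }
        let i = rng.below(buffer.len());
        out.push(std::mem::replace(&mut buffer[i], item));
    }
    // Drain what is left in random order.
    while !buffer.is_empty() {
        let i = rng.below(buffer.len());
        out.push(buffer.swap_remove(i));
    }
    out
}
```

The buffer only mixes samples that are close together in the stream, so it is usually combined with shuffling the order of the shards themselves.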

* Code review (#120)

Some items are still missing (it would be good to propagate the archive name, for instance), but most fixes should be there

* second round, hopefully good to go. Perf could probably be improved: sample pulls are competing with each other

* handling multi image samples (#121)

bug fixes for the previous PR; ideally we should add more unit tests
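
Multi-image samples follow from the key rule. Since only the first dot ends the key, members like `0001.left.jpg` and `0001.right.png` share the key `0001` and carry extensions `left.jpg` and `right.png`. A decoder that stores one image per sample would keep only one of them. This is a minimal sketch of collecting all of them. The `Sample` struct, `is_image` and `build_sample` are hypothetical, not datago's types.

```rust
use std::collections::BTreeMap;

/// Hypothetical decoded sample: every image-typed member goes to `images`,
/// everything else to `fields`, keyed by its extension.
#[derive(Debug, Default, PartialEq)]
struct Sample {
    images: Vec<(String, Vec<u8>)>,
    fields: BTreeMap<String, Vec<u8>>,
}

/// Only the last extension component decides the type, e.g. "left.jpg" -> "jpg".
fn is_image(ext: &str) -> bool {
    let last = ext.rsplit('.').next().unwrap_or(ext);
    matches!(last.to_ascii_lowercase().as_str(), "jpg" | "jpeg" | "png")
}

/// Builds one sample from the (extension, payload) pairs that share a key.
fn build_sample(members: Vec<(String, Vec<u8>)>) -> Sample {
    let mut sample = Sample::default();
    for (ext, data) in members {
        if is_image(&ext) {
            sample.images.push((ext, data));
        } else {
            sample.fields.insert(ext, data);
        }
    }
    // Sort by extension so image order does not depend on tar member order.
    sample.images.sort_by(|a, b| a.0.cmp(&b.0));
    sample
}
```

Sorting the images by extension name is one simple way to keep their order stable across shards. A real loader might instead follow a naming convention agreed with the dataset producer.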

* final update round

* second review: not perfect, but it feels like we can land this and carry on
