Releases: Lightning-AI/litdata
v0.2.25
What's Changed
- fix(ci): prune duplicated tests/checks by @Borda in #333
- fix(lint): prune invalid configurations by @Borda in #334
- ci: enable testing py3.10 & prune unused workflows by @Borda in #335
- bump: use the latest/fixed version of RequirementCache by @Borda in #336
- Fix: Ensure Compression Algorithm is Installed Before Reading Compressed Data by @bhimrazy in #342
- Bump: release version 0.2.25 by @bhimrazy in #343
Full Changelog: v0.2.24...v0.2.25
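The compression fix in #342 is about failing early when the library needed to decode a compressed dataset is missing. The following is a minimal sketch of that idea using only the standard library. It is not LitData's implementation: the helper name `ensure_compression_available` and the name-to-module mapping are hypothetical and used purely for illustration.

```python
import importlib.util

# Hypothetical mapping from a compression name to the Python module that
# provides it. The real mapping inside LitData may differ.
COMPRESSION_MODULES = {"gzip": "gzip", "zstd": "zstd"}


def ensure_compression_available(name: str) -> None:
    """Raise a clear error before reading if the compression module is missing."""
    module = COMPRESSION_MODULES.get(name)
    if module is None:
        raise ValueError(f"Unknown compression algorithm: {name!r}")
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module) is None:
        raise ModuleNotFoundError(
            f"Compression {name!r} requires the {module!r} package, "
            "which is not installed."
        )


# gzip ships with Python, so this check passes silently.
ensure_compression_available("gzip")
```

Checking up front turns an obscure decoding failure in the middle of reading into an actionable message about the missing package.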
v0.2.24
What's Changed
- Update README.md by @tchaton in #319
- Revert "Feat: Add support for reading LitData dataset published to HF" by @bhimrazy in #320
- Expose max download param by @animan42 in #323
- Dummy unit test max download by @animan42 in #325
- Nitpick: random state best practice by @deependujha in #326
- Ref/minor fixes by @bhimrazy in #329
- Bugfix: inconsistent streaming dataloader state (specific to StreamingDataset) by @bhimrazy in #318
- Bump: release version 0.2.24 by @bhimrazy in #332
Full Changelog: v0.2.23...v0.2.24
v0.2.23
What's Changed
- Update README.md by @tchaton in #303
- Fix StreamingDataset.get_len(num_workers=0) by @senarvi in #311
- Feat: Add support for storing and reading dataset from HF by @bhimrazy in #304
- Speed up the search for chunks to skip deletion for by @awaelchli in #312
- Feat: Clear cache if optimized dataset changes by @deependujha in #308
- Added a test for the bug with data loader length with num_workers=0 by @senarvi in #314
- Bump: release version 0.2.23 by @deependujha in #315
Full Changelog: v0.2.22...v0.2.23
Release 0.2.22
What's Changed
- Add support for passing the start_method to optimize by @tchaton in #298
- Add compression example in the readme by @tchaton in #300
- Enforce passing item_loader when customizing underlying storage format by @tchaton in #296
- Optimization when there is no data to download by @tchaton in #301
- Pre release 0.2.22 by @tchaton in #302
Full Changelog: v0.2.21...v0.2.22
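For readers unfamiliar with what a `start_method` controls (#298), here is a small standard-library sketch of how a start method name can be validated and turned into a multiprocessing context. The helper `resolve_context` is hypothetical, and this does not show how LitData's `optimize` handles the argument internally.

```python
import multiprocessing as mp
from typing import Optional


def resolve_context(start_method: Optional[str] = None):
    """Return a multiprocessing context for the requested start method.

    Passing None falls back to the platform default.
    """
    available = mp.get_all_start_methods()
    if start_method is not None and start_method not in available:
        raise ValueError(
            f"Unsupported start method {start_method!r}; choose from {available}"
        )
    return mp.get_context(start_method)


# "spawn" is available on every platform supported by CPython.
ctx = resolve_context("spawn")
print(ctx.get_start_method())
```

The choice matters because "fork" copies the parent process state, including held locks, while "spawn" starts workers from a fresh interpreter.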
Weekly release 0.2.21
What's Changed
- Remove s5cmd by @robmarkcole in #293
New Contributors
- @robmarkcole made their first contribution in #293
Full Changelog: v0.2.20...v0.2.21
Weekly release 0.2.20
What's Changed
- Fix: Prevent xarray hanging due to lock and wrong multiprocessing default by @tchaton in #284
- Bump mosaicml-streaming from 0.7.6 to 0.8.0 by @dependabot in #289
- Update pytest requirement from ==8.2.* to ==8.3.* by @dependabot in #287
- Reduce uploaders by @tchaton in #290
Full Changelog: v0.2.19...v0.2.20
0.2.19
What's Changed
- Fix: failing tests due to future warning related to torch.load(weights_only=True) by @deependujha in #272
- support downloading from azure blob storage by @jaehwana2z in #262
- Bump Lightning-AI/utilities from 0.11.5 to 0.11.6 by @dependabot in #274
- resolved downloading data from azure blob storage by @mohanreddypmr in #275
- Fix filename in merging compressed datasets by @bhimrazy in #277
- Bad overriding of thread._delete by @tchaton in #278
- Bump version 0.2.19 by @tchaton in #279
New Contributors
- @jaehwana2z made their first contribution in #262
- @mohanreddypmr made their first contribution in #275
Full Changelog: v0.2.18...v0.2.19
0.2.18
What's Changed
- Always send the rank when broadcasting by @awaelchli in #257
- fix: Handle missing 'encryption' field in legacy dataset by @csy1204 in #259
- Update map() and optimize() documentation by @senarvi in #264
- Correct README.md for CombinedStreamingDataset with proportions by @hiyyg in #266
- Update README.md by @tchaton in #267
- Update README.md by @tchaton in #268
- Update README.md by @tchaton in #269
- Bump version 0.2.18 by @tchaton in #270
Full Changelog: v0.2.17...v0.2.18
v0.2.17
This release contains new features and fixes for distributed training.
Important: This release fixes hangs in distributed training by ensuring that every rank returns the same number of batches (#237). However, this and other fixes change how samples are assigned to ranks and are therefore breaking changes. If you use the stateful data loader feature, resuming from checkpoints created with an older version of LitData will not be valid.
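To illustrate why uneven batch counts cause hangs, the sketch below computes how many batches each rank would produce under a naive split, then evens them out by truncating to the smallest count. This is an assumption-laden toy model: the function names are hypothetical, and the actual assignment strategy in #237 may differ.

```python
from typing import List


def naive_batches_per_rank(num_samples: int, world_size: int, batch_size: int) -> List[int]:
    """Split samples as evenly as possible, then count batches per rank."""
    base, extra = divmod(num_samples, world_size)
    # The first `extra` ranks receive one additional sample.
    sizes = [base + (1 if rank < extra else 0) for rank in range(world_size)]
    # Ceiling division: a partial final batch still counts as a batch.
    return [-(-size // batch_size) for size in sizes]


def even_batches_per_rank(num_samples: int, world_size: int, batch_size: int) -> List[int]:
    """Give every rank the smallest batch count so no rank waits on another."""
    counts = naive_batches_per_rank(num_samples, world_size, batch_size)
    return [min(counts)] * world_size


print(naive_batches_per_rank(10, 4, 2))  # [2, 2, 1, 1]
print(even_batches_per_rank(10, 4, 2))   # [1, 1, 1, 1]
```

In the naive case, ranks 0 and 1 enter a second step while ranks 2 and 3 have already finished, so any collective operation in that step waits forever. Any scheme that equalizes the counts necessarily changes which samples each rank sees, which is why older checkpoints no longer line up.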
What's Changed
- Feat: Updates readme and a few nitpicks by @deependujha in #223
- docs: add Specify cache directory by @csy1204 in #229
- Enable compatibility with Numpy 2.0 by @weiji14 in #230
- Fix typo in resolver.py by @lud-ds in #239
- Feature: Add support for encryption and decryption of data at chunk/sample level by @bhimrazy in #219
- Fix uneven batches in distributed dataloading by @awaelchli in #237
- feat: add a custom storage options param by @csy1204 in #246
- Fix index errors on world size > 0 by @awaelchli in #252
New Contributors
- @csy1204 made their first contribution in #229
- @weiji14 made their first contribution in #230
- @lud-ds made their first contribution in #239
Full Changelog: v0.2.16...v0.2.17
v0.2.16
What's Changed
- Feat: adds support for reading mosaic mds written dataset by @bhimrazy in #210
- Fix: local path issue in distributed optimize method by @deependujha in #214
- Fix resuming dataset state by @awaelchli in #217
Full Changelog: v0.2.15...v0.2.16