
Explore Optimizing and Running Tests in Parallel for Faster CI in litdata #612

Open
5 tasks
bhimrazy opened this issue Jun 4, 2025 · 1 comment
Labels
ci / tests, help wanted

Comments

Collaborator

bhimrazy commented Jun 4, 2025

Title: Explore Running Tests in Parallel for Faster CI in litdata

Description:

We should investigate running the test suite in parallel to speed up both CI and local test runs. As the test suite grows, serial execution becomes a bottleneck, especially for people who contribute frequently or iterate on PRs.

🧪 An initial test was conducted in #608.

Tasks:

  • Review the current test setup and determine if there are any global state dependencies or side effects.
  • Look into optimizing individual test cases.
  • Explore using [pytest-xdist](https://pypi.org/project/pytest-xdist/) or an alternative for parallelization.
  • Identify and resolve any tests that may fail due to parallel execution.
  • Benchmark before/after test runtimes.
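
For the global-state task, one common source of cross-test interference under pytest-xdist (typically run as `pytest -n auto`) is a shared scratch or cache directory. The sketch below is only an illustration, not code from litdata: it derives a per-worker directory from the `PYTEST_XDIST_WORKER` environment variable, which pytest-xdist sets in each worker process (`gw0`, `gw1`, ...). The helper name `worker_scratch_dir` and the `litdata-tests-` prefix are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def worker_scratch_dir(base=None):
    """Return a scratch directory unique to the current pytest-xdist worker.

    Hypothetical helper: the name and directory prefix are assumptions.
    """
    # pytest-xdist sets PYTEST_XDIST_WORKER to "gw0", "gw1", ... in workers.
    # In a plain serial run the variable is unset, so fall back to "master".
    worker = os.environ.get("PYTEST_XDIST_WORKER", "master")
    root = Path(base) if base is not None else Path(tempfile.gettempdir())
    path = root / f"litdata-tests-{worker}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

A fixture could call this helper and point any shared cache location at the returned path, so that two workers never write to the same directory. Whether litdata's tests actually share such a directory is something the review task would need to confirm.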

Goal:

Improve test speed while maintaining test accuracy and reproducibility.

bhimrazy changed the title from "Explore Running Tests in Parallel for Faster CI in litdata" to "Explore Optimizing and Running Tests in Parallel for Faster CI in litdata" on Jun 4, 2025
Collaborator Author

bhimrazy commented Jun 8, 2025

Slowest test runs:

pytester (ubuntu-22.04, 3.11)

============================ slowest 100 durations =============================
195.33s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_with_large_num_chunks
60.00s teardown tests/streaming/test_dataloader.py::test_custom_collate_multiworker
43.44s call     tests/processing/test_functions.py::test_optimize_append_overwrite
40.00s teardown tests/streaming/test_dataset.py::test_resumable_dataset_two_workers_2_epochs
40.00s teardown tests/streaming/test_dataloader.py::test_dataloader_states_with_persistent_workers
36.46s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even_multi_nodes[zstd-True]
36.45s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even_multi_nodes[zstd-False]
36.14s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even[zstd-True]
36.14s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even[zstd-False]
33.47s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_odd[zstd-False]
33.47s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_odd[zstd-True]
28.51s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-None-5]
28.43s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-None-75]
28.43s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-None-1200]
25.24s call     tests/processing/test_functions.py::test_optimize_with_fernet_encryption
22.45s call     tests/streaming/test_cache.py::test_cache_for_image_dataset_distributed[2]
20.00s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-None-5]
20.00s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-None-1200]
20.00s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-None-75]
20.00s teardown tests/processing/test_functions.py::test_map_with_text_files[False]
20.00s teardown tests/processing/test_functions.py::test_optimize_with_text_files[False]
20.00s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-64MB-None]
19.79s call     tests/streaming/test_dataset.py::test_dataset_resume_on_future_chunks[True]
19.22s call     tests/streaming/test_dataset.py::test_dataset_resume_on_future_chunks[False]
18.67s call     tests/streaming/test_dataloader.py::test_resume_parallel_dataset[simple_transform-2-None]
17.47s call     tests/processing/test_functions.py::test_optimize_with_rsa_encryption
17.34s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_distributed_num_workers_end_to_end
16.90s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[4-15]
14.74s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-None-75]
14.74s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-64MB-None]
14.74s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-None-1200]
14.73s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-None-5]
14.70s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-64MB-None]
13.03s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_2_epochs_int_length
12.97s call     tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_partial_iterations[None-3-2]
12.96s call     tests/streaming/test_dataset.py::test_dataset_with_mosaic_mds_data
12.71s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_2_epochs_none_length
12.68s call     tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_partial_iterations[48-3-2]
12.38s call     tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_partial_iterations[20-3-2]
12.18s call     tests/processing/test_functions.py::test_map_with_path
12.14s call     tests/streaming/test_combined.py::test_combined_dataset_with_dataloader_2_epochs
11.86s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[4-10]
11.53s call     tests/processing/test_data_processor.py::test_data_processsor_nlp
11.33s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[False-random-2-None]
11.33s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[False-torch-2-7]
11.26s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[False-torch-2-None]
11.11s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[True-random-2-None]
11.08s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[True-numpy-2-None]
11.07s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[False-numpy-2-7]
11.01s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[False-random-2-7]
11.00s call     tests/streaming/test_dataset.py::test_subsample_streaming_dataset_with_token_loader
10.99s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[True-numpy-2-7]
10.96s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[True-torch-2-7]
10.96s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[False-numpy-2-None]
10.94s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[True-random-2-7]
10.94s call     tests/streaming/test_parallel.py::test_parallel_dataset_rng[True-torch-2-None]
10.41s call     tests/streaming/test_dataloader.py::test_dataloader_with_loading_states
8.72s call     tests/processing/test_data_processor.py::test_data_processsor[10-True]
8.69s call     tests/streaming/test_dataloader.py::test_resume_parallel_dataset[rng_transform-2-None]
8.64s call     tests/streaming/test_dataloader.py::test_resume_parallel_dataset[rng_transform-2-7]
8.61s call     tests/streaming/test_dataloader.py::test_resume_parallel_dataset[simple_transform-2-7]
8.55s call     tests/streaming/test_dataloader.py::test_resume_parallel_dataset[None-2-7]
8.52s call     tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_complete_iterations[24-2]
8.50s call     tests/streaming/test_dataloader.py::test_resume_parallel_dataset[None-2-None]
8.47s call     tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_complete_iterations[None-2]
8.40s call     tests/processing/test_functions.py::test_optimize_with_queues_as_input[2]
8.34s call     tests/streaming/test_dataset.py::test_resumable_dataset_two_workers_2_epochs
8.08s call     tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_partial_iterations[20-7-2]
7.93s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_complete_iterations[4]
7.86s call     tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_partial_iterations[48-7-2]
7.85s call     tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_partial_iterations[None-7-2]
7.64s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_with_large_block_size_multiple_workers
7.50s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[2-15]
7.47s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[2-10]
7.23s call     tests/utilities/test_train_test_split.py::test_train_test_split_with_streaming_dataloader[zstd]
7.16s call     tests/processing/test_functions.py::test_optimize_race_condition
6.84s call     tests/streaming/test_dataset.py::test_dataset_reshuffling_every_epoch
6.73s call     tests/processing/test_readers.py::test_reader
6.72s call     tests/processing/test_data_processor.py::test_data_processing_optimize
6.70s call     tests/processing/test_data_processor.py::test_map_is_last[2-expected1]
6.68s call     tests/processing/test_functions.py::test_optimize_with_text_files[False]
6.68s call     tests/processing/test_data_processor.py::test_data_processing_optimize_class_yield
6.66s call     tests/processing/test_data_processor.py::test_data_processing_map
6.65s call     tests/processing/test_data_processor.py::test_data_process_transform
6.65s call     tests/processing/test_data_processor.py::test_data_processing_optimize_class
6.65s call     tests/processing/test_functions.py::test_map_with_text_files[True]
6.63s call     tests/processing/test_functions.py::test_optimize_with_text_files[True]
6.62s call     tests/processing/test_functions.py::test_map_with_text_files[False]
6.28s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_no_shuffle[zstd-False]
6.07s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_no_shuffle[zstd-True]
6.00s call     tests/processing/test_functions.py::test_optimize_with_queues_as_input[1]
5.95s call     tests/streaming/test_dataloader.py::test_resume_dataloader_with_new_dataset
5.60s call     tests/processing/test_functions.py::test_optimize_with_jpeg_array
5.55s call     tests/streaming/test_dataloader.py::test_resume_dataloader_after_some_workers_are_done
5.47s call     tests/streaming/test_reader.py::test_reader_chunk_removal
5.47s call     tests/streaming/test_reader.py::test_reader_chunk_removal_compressed
5.16s call     tests/processing/test_data_processor.py::test_map_batch_size
5.13s call     tests/processing/test_data_processor.py::test_empty_optimize[inputs0]
5.12s call     tests/processing/test_data_processor.py::test_data_processing_map_without_input_dir_and_folder
5.12s call     tests/processing/test_data_processor.py::test_map_is_last[1-expected0]
========== 389 passed, 10 skipped, 19 warnings in 1826.10s (0:30:26) ===========
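
To set expectations before benchmarking, it can help to estimate how far parallelism could shrink the wall time given these durations. The sketch below is a rough model only: it assigns tests longest-first to the least-loaded worker, which is not how pytest-xdist schedules work, and it ignores setup and teardown time and per-worker startup cost. The function name `estimate_makespan` is hypothetical; the sample values are copied from the ubuntu-22.04 log above.

```python
import heapq


def estimate_makespan(durations, workers):
    """Greedy longest-first assignment; returns the busiest worker's total."""
    loads = [0.0] * workers
    heapq.heapify(loads)
    for duration in sorted(durations, reverse=True):
        # Give the next-longest test to the currently least-loaded worker.
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)


# Six call durations (seconds) taken from the ubuntu-22.04 log above.
sample = [195.33, 43.44, 36.46, 36.45, 36.14, 36.14]
for n in (1, 2, 4):
    print(n, "workers:", round(estimate_makespan(sample, n), 2), "s")
```

Under this model, the single longest test bounds the result from below no matter how many workers are added, which is why shortening `test_dataset_for_text_tokens_with_large_num_chunks` is likely to matter as much as parallelizing.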

pytester (macos-14, 3.11)

============================ slowest 100 durations =============================
345.54s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_with_large_num_chunks
69.88s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[4-10]
68.63s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[4-15]
60.26s teardown tests/streaming/test_dataloader.py::test_custom_collate_multiworker
57.23s call     tests/streaming/test_combined.py::test_combined_dataset_with_dataloader_2_epochs
56.89s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even_multi_nodes[zstd-False]
56.40s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even_multi_nodes[zstd-True]
49.28s call     tests/streaming/test_dataloader.py::test_dataloader_with_loading_states
46.51s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_odd[zstd-False]
46.47s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_odd[zstd-True]
45.77s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_complete_iterations[4]
44.36s call     tests/streaming/test_cache.py::test_cache_for_image_dataset_distributed[2]
40.27s teardown tests/streaming/test_dataset.py::test_resumable_dataset_two_workers_2_epochs
40.12s teardown tests/streaming/test_dataloader.py::test_dataloader_states_with_persistent_workers
38.81s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_distributed_num_workers_end_to_end
37.23s call     tests/processing/test_functions.py::test_optimize_append_overwrite
35.67s call     tests/streaming/test_dataset.py::test_dataset_resume_on_future_chunks[True]
35.52s call     tests/streaming/test_dataset.py::test_dataset_resume_on_future_chunks[False]
34.97s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[2-15]
34.91s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[2-10]
30.92s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-None-1200]
30.25s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-None-75]
29.69s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-64MB-None]
29.04s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-None-5]
28.96s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_with_large_block_size_multiple_workers
26.34s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-64MB-None]
25.90s call     tests/streaming/test_dataloader.py::test_resume_dataloader_with_new_dataset
24.53s call     tests/processing/test_functions.py::test_optimize_with_fernet_encryption
24.29s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_complete_iterations[2]
21.50s call     tests/streaming/test_dataset.py::test_dataset_reshuffling_every_epoch
20.11s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-64MB-None]
20.11s teardown tests/processing/test_functions.py::test_optimize_with_text_files[False]
20.10s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-None-1200]
20.09s teardown tests/processing/test_functions.py::test_map_with_text_files[False]
20.09s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-None-75]
20.01s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-None-5]
19.05s call     tests/streaming/test_dataloader.py::test_resume_dataloader_after_some_workers_are_done
18.82s call     tests/streaming/test_dataset.py::test_dataset_valid_state_override
17.89s call     tests/processing/test_functions.py::test_optimize_with_rsa_encryption
16.34s call     tests/streaming/test_dataset.py::test_resumable_dataset_two_workers_2_epochs
15.59s call     tests/streaming/test_dataloader.py::test_dataloader_no_workers
14.74s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_multiple_workers
13.98s call     tests/streaming/test_dataloader.py::test_dataloader_states_with_persistent_workers
13.61s call     tests/streaming/test_combined.py::test_combined_dataset_with_per_stream_batching[2-4]
13.29s call     tests/streaming/test_combined.py::test_combined_dataset_with_per_stream_batching[2-2]
12.74s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-None-5]
12.42s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-None-1200]
12.23s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-None-75]
11.19s call     tests/utilities/test_train_test_split.py::test_train_test_split_with_streaming_dataloader[zstd]
10.85s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_no_shuffle[zstd-True]
10.12s call     tests/processing/test_data_processor.py::test_data_processsor_distributed[False-False]
10.04s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_no_shuffle[zstd-False]
9.84s call     tests/streaming/test_reader.py::test_reader_chunk_removal
9.43s call     tests/processing/test_functions.py::test_map_with_path
9.34s call     tests/streaming/test_dataset.py::test_dataset_valid_state
8.92s call     tests/processing/test_data_processor.py::test_data_processsor_nlp
8.76s call     tests/streaming/test_dataset.py::test_subsample_streaming_dataset_with_token_loader
8.46s call     tests/streaming/test_reader.py::test_reader_chunk_removal_compressed
8.14s call     tests/streaming/test_combined.py::test_combined_dataset_with_per_stream_batching[1-2]
8.02s call     tests/processing/test_functions.py::test_optimize_with_queues_as_input[2]
8.00s call     tests/processing/test_readers.py::test_parquet_reader
7.99s call     tests/streaming/test_combined.py::test_combined_dataset_with_per_stream_batching[1-4]
7.76s call     tests/streaming/test_dataset.py::test_streaming_dataset_deepcopy
7.60s call     tests/processing/test_data_processor.py::test_data_processing_optimize_class
7.54s call     tests/processing/test_data_processor.py::test_data_processsor[10-True]
7.46s call     tests/processing/test_functions.py::test_optimize_with_queues_as_input[1]
7.30s call     tests/processing/test_functions.py::test_map_with_text_files[False]
6.92s call     tests/processing/test_functions.py::test_optimize_race_condition
6.89s call     tests/streaming/test_combined.py::test_combined_dataset
6.88s call     tests/streaming/test_combined.py::test_combined_dataset_with_dataloader_and_one_worker[2]
6.83s call     tests/streaming/test_combined.py::test_combined_dataset_with_dataloader_and_one_worker[1]
6.67s call     tests/processing/test_data_processor.py::test_map_is_last[2-expected1]
6.59s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_and_one_worker[2-None-expected1-num_samples_yielded1-num_cycles1]
6.49s call     tests/processing/test_functions.py::test_optimize_with_text_files[True]
6.44s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_and_one_worker[1-None-expected0-num_samples_yielded0-num_cycles0]
6.36s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_and_one_worker[2-13-expected3-num_samples_yielded3-num_cycles3]
6.35s call     tests/processing/test_data_processor.py::test_data_processing_optimize_class_yield
6.29s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_and_one_worker[1-13-expected2-num_samples_yielded2-num_cycles2]
5.84s call     tests/processing/test_readers.py::test_reader
5.78s call     tests/processing/test_data_processor.py::test_data_processing_map
5.66s call     tests/processing/test_data_processor.py::test_data_processing_optimize
5.48s call     tests/processing/test_functions.py::test_optimize_with_text_files[False]
5.40s call     tests/streaming/test_dataset.py::test_streaming_dataset_max_cache_dir
5.27s call     tests/processing/test_data_processor.py::test_data_processing_map_without_input_dir_remote
5.12s call     tests/processing/test_data_processor.py::test_data_processing_map_without_input_dir_local
5.11s call     tests/streaming/test_dataset.py::test_streaming_dataset[zstd]
5.01s call     tests/processing/test_data_processor.py::test_map_is_last[1-expected0]
5.00s call     tests/processing/test_data_processor.py::test_data_processing_map_non_absolute_path
4.85s call     tests/processing/test_data_processor.py::test_data_processing_optimize_yield
4.80s call     tests/processing/test_functions.py::test_optimize_with_jpeg_array
4.77s call     tests/streaming/test_parquet.py::test_stream_hf_parquet_dataset[True-False]
4.69s call     tests/processing/test_data_processor.py::test_empty_optimize[inputs2]
4.66s call     tests/processing/test_data_processor.py::test_data_process_transform
4.64s call     tests/processing/test_functions.py::test_map_with_text_files[True]
4.59s call     tests/processing/test_data_processor.py::test_data_processing_map_without_input_dir_and_folder
4.42s call     tests/processing/test_data_processor.py::test_empty_optimize[inputs0]
4.33s call     tests/processing/test_data_processor.py::test_map_batch_size
4.21s call     tests/streaming/test_dataset.py::test_dataset_with_mosaic_mds_data
4.20s call     tests/processing/test_data_processor.py::test_empty_optimize[inputs1]
4.17s call     tests/streaming/test_parquet.py::test_stream_hf_parquet_dataset[True-True]
========== 333 passed, 66 skipped, 23 warnings in 2295.58s (0:38:15) ===========

pytester (windows-2022, 3.11)

============================ slowest 100 durations ============================
60.03s teardown tests/streaming/test_dataloader.py::test_custom_collate_multiworker
40.02s teardown tests/streaming/test_dataloader.py::test_dataloader_states_with_persistent_workers
38.20s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even_multi_nodes[zstd-False]
38.14s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even_multi_nodes[zstd-True]
37.37s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even[zstd-False]
37.06s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even[zstd-True]
33.90s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_odd[zstd-True]
33.88s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_odd[zstd-False]
30.37s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-None-5]
30.02s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-None-1200]
29.85s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-None-75]
23.28s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_distributed_num_workers_end_to_end
20.01s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-None-75]
20.01s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-None-5]
20.01s teardown tests/processing/test_functions.py::test_map_with_text_files[False]
20.00s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-None-1200]
20.00s teardown tests/streaming/test_dataset.py::test_optimize_dataset[False-64MB-None]
20.00s teardown tests/processing/test_functions.py::test_optimize_with_text_files[False]
19.50s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-None-5]
19.23s call     tests/streaming/test_dataset.py::test_optimize_dataset[False-64MB-None]
19.20s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-64MB-None]
18.81s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-None-1200]
18.66s call     tests/streaming/test_dataset.py::test_optimize_dataset[True-None-75]
16.59s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[4-15]
16.51s call     tests/processing/test_functions.py::test_map_with_path
16.28s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[4-10]
14.60s call     tests/streaming/test_dataloader.py::test_dataloader_with_loading_states
14.53s call     tests/streaming/test_dataset.py::test_subsample_streaming_dataset_with_token_loader
11.03s call     tests/processing/test_functions.py::test_optimize_with_queues_as_input[2]
10.99s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_complete_iterations[4]
10.72s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[2-10]
10.66s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_partial_iterations[2-15]
10.41s call     tests/processing/test_readers.py::test_parquet_reader
10.00s setup    tests/streaming/test_dataloader.py::test_resume_parallel_dataset[rng_transform-2-None]
9.80s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_with_large_block_size_multiple_workers
9.52s call     tests/processing/test_readers.py::test_reader
9.20s call     tests/processing/test_functions.py::test_map_with_text_files[True]
8.98s call     tests/processing/test_functions.py::test_map_with_text_files[False]
8.60s call     tests/processing/test_functions.py::test_optimize_with_text_files[False]
8.50s call     tests/processing/test_functions.py::test_optimize_with_text_files[True]
8.50s call     tests/processing/test_functions.py::test_optimize_with_queues_as_input[1]
8.42s call     tests/streaming/test_dataloader.py::test_resume_dataloader_with_new_dataset
7.68s call     tests/streaming/test_dataloader.py::test_resume_dataloader_after_some_workers_are_done
7.62s call     tests/processing/test_functions.py::test_optimize_with_jpeg_array
7.60s call     tests/utilities/test_train_test_split.py::test_train_test_split_with_streaming_dataloader[zstd]
7.47s call     tests/processing/test_data_processor.py::test_map_batch_size
7.45s call     tests/processing/test_data_processor.py::test_empty_optimize[inputs0]
7.32s call     tests/processing/test_data_processor.py::test_empty_optimize[inputs2]
7.24s call     tests/processing/test_data_processor.py::test_empty_optimize[inputs1]
7.20s call     tests/streaming/test_dataloader.py::test_dataloader_states_with_persistent_workers
7.16s call     tests/streaming/test_combined.py::test_combined_dataset_dataloader_states_complete_iterations[2]
6.57s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_no_shuffle[zstd-False]
6.38s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_no_shuffle[zstd-True]
5.72s call     tests/streaming/test_reader.py::test_reader_chunk_removal
5.59s call     tests/streaming/test_reader.py::test_reader_chunk_removal_compressed
5.00s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_multiple_workers
4.89s call     tests/streaming/test_combined.py::test_combined_dataset_with_per_stream_batching[2-2]
4.82s call     tests/streaming/test_combined.py::test_combined_dataset_with_per_stream_batching[2-4]
4.55s call     tests/streaming/test_dataset.py::test_streaming_dataset_max_cache_dir
4.36s call     tests/streaming/test_combined.py::test_combined_dataset_with_per_stream_batching[1-2]
4.31s call     tests/streaming/test_combined.py::test_combined_dataset_with_per_stream_batching[1-4]
3.93s call     tests/streaming/test_dataloader.py::test_custom_collate_multiworker
3.85s call     tests/streaming/test_dataset.py::test_streaming_dataset_deepcopy
3.66s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_and_one_worker[1-None-expected0-num_samples_yielded0-num_cycles0]
3.66s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_and_one_worker[1-13-expected2-num_samples_yielded2-num_cycles2]
3.64s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_and_one_worker[2-None-expected1-num_samples_yielded1-num_cycles1]
3.63s call     tests/streaming/test_dataset.py::test_streaming_dataset[zstd]
3.49s call     tests/streaming/test_parallel.py::test_parallel_dataset_with_dataloader_and_one_worker[2-13-expected3-num_samples_yielded3-num_cycles3]
3.40s call     tests/streaming/test_combined.py::test_combined_dataset
3.26s call     tests/streaming/test_combined.py::test_combined_dataset_with_dataloader_and_one_worker[1]
3.23s call     tests/streaming/test_combined.py::test_combined_dataset_with_dataloader_and_one_worker[2]
2.95s call     tests/streaming/test_parquet.py::test_stream_hf_parquet_dataset[False-False]
2.94s call     tests/streaming/test_parquet.py::test_stream_hf_parquet_dataset[False-True]
2.93s call     tests/streaming/test_parquet.py::test_stream_hf_parquet_dataset[True-True]
2.92s call     tests/streaming/test_parquet.py::test_stream_hf_parquet_dataset[True-False]
2.20s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even_multi_nodes[None-True]
2.20s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even_multi_nodes[None-False]
1.92s call     tests/streaming/test_sampler.py::test_batch_sampler_imagenet
1.71s call     tests/streaming/test_dataset.py::test_dataset_as_iterator_and_non_iterator[True-True]
1.70s call     tests/streaming/test_dataset.py::test_dataset_as_iterator_and_non_iterator[False-True]
1.66s call     tests/streaming/test_parquet.py::test_cache_dir_option[False]
1.66s call     tests/streaming/test_parquet.py::test_cache_dir_option[True]
1.57s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens_distributed_num_workers
1.55s call     tests/streaming/test_dataset.py::test_streaming_dataset[None]
1.53s call     tests/streaming/test_dataset.py::test_dataset_cache_recreation
1.51s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even[None-False]
1.51s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_odd[None-False]
1.50s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even[None-True]
1.45s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_odd[None-True]
1.35s call     tests/streaming/test_dataset.py::test_dataset_for_text_tokens
1.30s call     tests/utilities/test_train_test_split.py::test_split_a_subsampled_dataset[None]
1.30s call     tests/utilities/test_train_test_split.py::test_split_a_subsampled_dataset[zstd]
1.19s call     tests/streaming/test_dataloader.py::test_dataloader_no_workers
1.15s call     tests/utilities/test_encryption.py::test_fernet_encryption
1.14s call     tests/streaming/test_reader.py::test_prepare_chunks_thread_eviction
1.09s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_no_shuffle[None-False]
1.09s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_no_shuffle[None-True]
1.09s call     tests/streaming/test_parallel.py::test_dataloader_shuffle[True]
1.08s setup    tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_without_any_iterations[None]
1.08s setup    tests/streaming/test_parallel.py::test_parallel_dataset_dataloader_states_without_any_iterations[3]
========= 264 passed, 135 skipped, 20 warnings in 1112.93s (0:18:32) ==========
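
In all three logs the teardown entries cluster near 20, 40, and 60 seconds, which may point to a fixed wait or timeout during teardown rather than real work; that is a guess from the numbers, not something confirmed here. A small parser like the sketch below can total the reported time per phase (`setup`, `call`, `teardown`) from a pasted durations report. The function name `total_by_phase` is hypothetical, and the sample lines are copied from the windows-2022 log above.

```python
import re
from collections import defaultdict

# Matches lines such as "60.03s teardown tests/...::test_name[param]".
LINE = re.compile(r"^(\d+\.\d+)s\s+(setup|call|teardown)\s+(\S+)$")


def total_by_phase(report):
    """Sum the reported seconds for each phase in a pytest durations report."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = LINE.match(line.strip())
        if match:  # header and summary lines do not match and are skipped
            totals[match.group(2)] += float(match.group(1))
    return dict(totals)


sample = """\
============================ slowest 100 durations ============================
60.03s teardown tests/streaming/test_dataloader.py::test_custom_collate_multiworker
40.02s teardown tests/streaming/test_dataloader.py::test_dataloader_states_with_persistent_workers
38.20s call     tests/streaming/test_dataset.py::test_streaming_dataset_distributed_full_shuffle_even_multi_nodes[zstd-False]
10.00s setup    tests/streaming/test_dataloader.py::test_resume_parallel_dataset[rng_transform-2-None]
"""
print(total_by_phase(sample))
```

Running it over each full log would show how much of the listed time falls in teardown versus call, which helps decide whether to look at fixture cleanup first.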

bhimrazy added the "help wanted" label on Jun 8, 2025