
feat(dataloaders): Custom dataloader registry support #2932


Merged
165 commits merged into main on May 13, 2025

Conversation

ori-kron-wis
Collaborator

No description provided.

@ori-kron-wis ori-kron-wis added this to the scvi-tools 1.2 milestone Aug 7, 2024
@ori-kron-wis ori-kron-wis self-assigned this Aug 7, 2024
@ori-kron-wis ori-kron-wis linked an issue Aug 7, 2024 that may be closed by this pull request

codecov bot commented Aug 11, 2024

Codecov Report

Attention: Patch coverage is 81.67614% with 129 lines in your changes missing coverage. Please review.

Project coverage is 80.16%. Comparing base (ced87df) to head (67caa96).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
src/scvi/model/base/_base_model.py 49.63% 69 Missing ⚠️
src/scvi/dataloaders/_custom_dataloders.py 91.59% 29 Missing ⚠️
src/scvi/model/base/_archesmixin.py 82.22% 8 Missing ⚠️
src/scvi/model/base/_training_mixin.py 76.47% 8 Missing ⚠️
src/scvi/model/_scanvi.py 86.66% 4 Missing ⚠️
src/scvi/model/base/_rnamixin.py 93.33% 4 Missing ⚠️
src/scvi/model/base/_vaemixin.py 77.77% 4 Missing ⚠️
src/scvi/data/_utils.py 57.14% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2932      +/-   ##
==========================================
+ Coverage   80.12%   80.16%   +0.04%     
==========================================
  Files         196      197       +1     
  Lines       17570    18156     +586     
==========================================
+ Hits        14078    14555     +477     
- Misses       3492     3601     +109     
Files with missing lines Coverage Δ
src/scvi/dataloaders/__init__.py 100.00% <100.00%> (ø)
src/scvi/dataloaders/_data_splitting.py 95.47% <ø> (ø)
src/scvi/model/_scvi.py 96.42% <100.00%> (+0.51%) ⬆️
src/scvi/model/base/_save_load.py 83.49% <100.00%> (+1.38%) ⬆️
src/scvi/train/_trainingplans.py 85.73% <100.00%> (+0.41%) ⬆️
src/scvi/data/_utils.py 85.00% <57.14%> (-1.13%) ⬇️
src/scvi/model/_scanvi.py 91.17% <86.66%> (-1.85%) ⬇️
src/scvi/model/base/_rnamixin.py 94.17% <93.33%> (-0.36%) ⬇️
src/scvi/model/base/_vaemixin.py 89.13% <77.77%> (+1.17%) ⬆️
src/scvi/model/base/_archesmixin.py 78.20% <82.22%> (+1.31%) ⬆️
... and 3 more

@marianogabitto
Contributor

Ori, this is not working for me. When I invoke in the notebook:

training_dataloader = (
    datamodule.on_before_batch_transfer(batch, None) for batch in datamodule.train_dataloader()
)

I get:

switching torch multiprocessing start method from "fork" to "spawn"

and then it errors out.

@marianogabitto
Contributor

marianogabitto commented Apr 25, 2025

Ori, all the examples I list below were run with the ".on_before_batch_transfer()" call removed, the way I posted before.

  1. When num_workers=0, I can train, but at the low speeds described below. When I set the number of workers, e.g. num_workers=4, 12, or 24, the trainer takes forever to initialize and is then even slower.

  2. Can you monitor your GPU usage with nvitop or nvtop? Here are my head-to-head comparisons.

  • TileDB from Cell Census. I believe this reads from S3, so it never actually copies data to disk.
    Training takes 120 sec/it. GPU activity is almost zero the whole time, except for moments when it spikes to 100%.

  • TileDB from an AnnData created from the query. This reads from a local disk directory.
    Training takes 11 sec/it. GPU activity is again almost zero, except for moments when it spikes to 100%.

  • The regular way of loading the AnnData into memory. Training takes 1.2 sec/it. GPU activity is at 40% the whole time.

This leads me to believe that we are not loading data into GPU memory fast enough (a minimal timing sketch follows this comment).

  3. I forgot to tell you: the TileDB representative sent me this as a reference. It differs from the way we run because they launch the processes themselves:
    https://github.com/single-cell-data/TileDB-SOMA-ML/tree/rw/cli/src/tiledbsoma_ml/cli#example-invocation
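
For reference, a minimal timing sketch (not part of the PR) of how the sec/it numbers above could be measured; the datamodule name and the batch count are illustrative:

import time

def seconds_per_iteration(dataloader, max_batches=20):
    """Average wall-clock seconds per batch over up to max_batches batches."""
    durations = []
    start = time.perf_counter()
    for i, _batch in enumerate(dataloader):
        now = time.perf_counter()
        durations.append(now - start)
        start = now
        if i + 1 >= max_batches:
            break
    return sum(durations) / len(durations) if durations else float("nan")

# Example (assumes a datamodule exposing train_dataloader(), as in the snippets above):
# print(seconds_per_iteration(datamodule.train_dataloader()))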

@ori-kron-wis
Collaborator Author

ori-kron-wis commented Apr 29, 2025

Hi @marianogabitto ,
Thanks.

  1. I made several changes, and I moved on_before_batch_transfer into the class; it is no longer part of the analysis code. So if you pulled the branch and reinstalled, running the same code as before will give errors.
    I have updated the tutorials (see there), sorry about this.

But I'm not following your code; can you share exactly what you are running so we can compare?

  2. Regarding the GPU behavior: I see the same thing.
    The data you use matters (the speed enhancement shows up with larger data, not smaller).

I don't think the GPU is underutilized; it's just that data loading is much slower with TileDB, as you said roughly 100 times slower in that sense. So while with AnnData the data loading takes about 1 s and we see almost continuous GPU use, with TileDB over S3 there is a roughly 100 s gap between the same bursts of GPU usage, so it mistakenly looks underused.

  3. num_workers controls multiprocessing loading and is a parameter of the torch DataLoader (see the generic sketch after this list). We know its benefit also depends on data size and that we do not always get what we expect from it; specifically, there is overhead in initializing and closing the workers.
    How do you use it? I will also try to check its speed in the custom dataloader context. In any case, we should run with whatever number benefits us most; it is not a magic setting that helps every time.

  4. I think the common practice is a single GPU running in a notebook. We need to make sure this works, and other scenarios will follow.
    Having said that, running with DDP from scripts versus from notebooks can be very different, and we need to test all possibilities. We might find that we need to run it as a script, like the reference you gave. Will check.

  5. I added SCANVI to the tutorials as well; some issues still exist in the prediction part for TileDB.
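
As a generic reference for the num_workers point above (item 3), a minimal torch DataLoader sketch; the toy dataset and the chosen numbers are illustrative, and this is not the scvi custom dataloader itself:

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Toy stand-in dataset; real workloads would use the custom dataloaders discussed here.
    dataset = TensorDataset(torch.randn(10_000, 200))
    loader = DataLoader(
        dataset,
        batch_size=128,
        num_workers=4,            # >0 spawns worker processes; startup/teardown adds overhead
        persistent_workers=True,  # keep workers alive between epochs (requires num_workers > 0)
        pin_memory=True,          # page-locked host memory for faster host-to-GPU copies
    )
    for (x,) in loader:
        pass  # a training step would consume the batch here

if __name__ == "__main__":  # guard needed when workers use the "spawn" start method
    main()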

@marianogabitto
Contributor

Ori,
I will be testing the updates in 12 hours. Sorry for the delay.
One more thing in the meantime: it would be great to expose the scvi AnnData DataLoader as an example of what is going on internally. The code below does not work because the BatchDistributedSampler does not output samples with the correct dimensions (in DDP), but if you can help me solve it, that would be great.

Code

import scvi
from scvi.dataloaders import DataSplitter

scvi.model.SCVI.setup_anndata(adata, batch_key="batch", categorical_covariate_keys=["cell_type", "donor"])
ad_manager = scvi.model.SCVI._get_most_recent_anndata_manager(adata, required=True)

model = scvi.model.SCVI(
    registry=ad_manager._registry,
    gene_likelihood="nb",
    encode_covariates=False,
)

ad_manager.adata = adata
dl = DataSplitter(
    ad_manager, train_size=0.9, pin_memory=True, num_workers=2, persistent_workers=True
)  # , prefetch_factor=2
dl.setup()

model.train(
    datamodule=dl,  # the DataSplitter serves as the datamodule here
    max_epochs=10,
    batch_size=128,
    train_size=0.9,
    early_stopping=False,
    accelerator="gpu",
    devices=-1,
    strategy="ddp_find_unused_parameters_true",
)

@ori-kron-wis
Collaborator Author

Hi @marianogabitto ,
Your code above should work in multi-GPU settings; just add distributed_sampler=True to the DataSplitter call.
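
For clarity, a sketch of the earlier DataSplitter call with that flag added; the other arguments are carried over from the snippet above:

dl = DataSplitter(
    ad_manager,
    train_size=0.9,
    distributed_sampler=True,  # use a distributed sampler so each DDP rank gets correctly shaped samples
    pin_memory=True,
    num_workers=2,
    persistent_workers=True,
)
dl.setup()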

Besides that, I made several other updates for this PR; the census/lamin custom dataloaders should now be working for scvi/scanvi/scarches/load/save/multi-GPU/covariates integration.

@ori-kron-wis ori-kron-wis merged commit c4cab3b into main May 13, 2025
17 of 18 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/scvi-tools that referenced this pull request May 13, 2025
ori-kron-wis added a commit that referenced this pull request May 13, 2025
Backport PR #2932: Custom dataloader registry support (#3318)

Co-authored-by: Ori Kronfeld <[email protected]>
@ori-kron-wis ori-kron-wis changed the title Custom dataloader registry support feat(dataloaders): Custom dataloader registry support May 20, 2025
Labels
custom_dataloader PR 2932, on-merge: backport to 1.3.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix custom dataloader registry
3 participants