Pros:
- Optimized for ML Workflows: If your dataset is structured as tables (rows and columns), LaminDB’s format aligns well with SCVI's expectations, potentially reducing the need for complex transformations.
LaminDB may not be as efficient or flexible as TileDB for handling complex multi-dimensional data.
2. The [CZI](https://chanzuckerberg.com/)-based [TileDB](https://tiledb.com/) custom dataloader is built on `TileDBDataModule` and can handle large multi-dimensional datasets stored in TileDB’s format.
TileDB is a general-purpose, multi-dimensional array storage engine designed for high-performance, scalable data access. It supports various data types, including dense and sparse arrays, and is optimized for handling large datasets efficiently. TileDB’s strength lies in its ability to store and query data across multiple dimensions and scale efficiently with large volumes of data.
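To make the "data larger than memory" idea concrete, here is a toy sketch of the batching pattern an out-of-core loader typically follows: read one contiguous chunk at a time, shuffle rows inside that chunk, and emit mini-batches from it. This is a generic NumPy illustration, not the implementation of `TileDBDataModule` or of TileDB itself. A NumPy array stands in for the on-disk array, and `chunked_minibatches` is a hypothetical helper.

```python
import numpy as np


def chunked_minibatches(X, chunk_size, batch_size, seed=0):
    """Yield shuffled mini-batches from X, touching one contiguous chunk at a time.

    X is a stand-in for an on-disk array (hypothetical; here any NumPy array).
    Only the rows of the current chunk are needed at any given step.
    """
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    # Shuffle the order in which chunks are visited.
    chunk_starts = np.arange(0, n_rows, chunk_size)
    rng.shuffle(chunk_starts)
    for start in chunk_starts:
        # Stand-in for one read from storage; with an in-memory array this is just a slice.
        chunk = np.asarray(X[start:start + chunk_size])
        # Shuffle rows within the chunk, then split it into mini-batches.
        perm = rng.permutation(chunk.shape[0])
        for b in range(0, chunk.shape[0], batch_size):
            yield chunk[perm[b:b + batch_size]]
```

A mini-batch never spans two chunks, so the last batch of each chunk may be smaller than `batch_size`. Larger chunks give better shuffling at the cost of more memory per read.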
```python
import cellxgene_census
import tiledbsoma as soma
from cellxgene_census.experimental.ml import experiment_dataloader
```
Key differences between the LaminDB and TileDB custom dataloaders:
Writing custom dataloaders requires a good understanding of PyTorch’s `DataLoader` class and of how to integrate it with SCVI, which may be difficult for beginners.
Custom dataloaders also require maintenance: if the data format or preprocessing needs change, you’ll have to modify and maintain the dataloader code. However, they can be a great addition to the model pipeline, both in runtime and in how much data the model can ingest.
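The PyTorch side of a custom dataloader can be sketched in a few lines. A map-style dataset only needs `__len__` and `__getitem__`; `torch.utils.data.DataLoader` calls `__getitem__` for each index and merges the results with a `collate_fn`. The sketch below uses NumPy only and replaces `DataLoader` with a hypothetical sequential `iterate` helper so it runs without PyTorch. `CountsDataset`, `collate`, and `iterate` are illustrative names, and the dictionary keys `"X"` and `"batch"` are an assumption meant to resemble the names in scvi-tools' `REGISTRY_KEYS`; check the scvi-tools documentation for what a given model actually expects.

```python
import numpy as np


class CountsDataset:
    """Map-style dataset: the __len__/__getitem__ protocol DataLoader consumes."""

    def __init__(self, X, batch_labels):
        self.X = X
        self.batch_labels = batch_labels

    def __len__(self):
        return self.X.shape[0]

    def __getitem__(self, idx):
        # One cell: its count vector and its batch (sample) index.
        # Keys assumed to mirror scvi-tools' REGISTRY_KEYS; verify for your model.
        return {"X": self.X[idx], "batch": self.batch_labels[idx]}


def collate(items):
    """Stack per-cell dicts into one mini-batch dict, as a collate_fn would."""
    return {
        "X": np.stack([it["X"] for it in items]).astype(np.float32),
        # Batch indices as a column vector of shape (n_cells, 1).
        "batch": np.array([[it["batch"]] for it in items], dtype=np.int64),
    }


def iterate(dataset, batch_size):
    """Hypothetical sequential stand-in for DataLoader(dataset, batch_size, collate_fn=collate)."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield collate([dataset[i] for i in range(start, stop)])
```

With PyTorch installed, the same class could be passed to `torch.utils.data.DataLoader(dataset, batch_size=2, collate_fn=collate)` and would yield the same NumPy dicts; dropping `collate_fn` lets PyTorch's default collation turn each key into a tensor instead.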
See the relevant tutorials on this subject for further examples.
:::{note}
As of scvi-tools v1.3.0, custom dataloaders are experimental and are only supported for AnnData (`adata`) and for the SCVI and SCANVI models.