Fix custom dataloader registry #2907

Closed
@canergen

Custom dataloaders currently don't support advanced capabilities such as scArches or cell-type prediction in scANVI. To support them, we need to create a registry without calling setup_anndata that contains the same elements as a registry created by setup_anndata (see below).
The first working example is https://github.com/chanzuckerberg/cellxgene-census/blob/222efddf2ce82f93f76329aa353962c1dc2400ac/api/python/notebooks/experimental/pytorch_loader_scvi.ipynb. It currently uses the following code to save the model:

user_attributes = model._get_user_attributes()
user_attributes = {a[0]: a[1] for a in user_attributes if a[0][-1] == "_"}

user_attributes.update(
    {
        "n_batch": datamodule.n_batch,
        "n_extra_categorical_covs": 0,
        "n_extra_continuous_covs": 0,
        "n_labels": 1,
        "n_vars": datamodule.n_vars,
    }
)
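To make the filter in the snippet above concrete, here is a minimal sketch of what it does. DummyModel and the datamodule object are hypothetical stand-ins, and inspect.getmembers is only an assumed approximation of the (name, value) pairs that _get_user_attributes returns; the issue does not show its internals.

```python
import inspect
from types import SimpleNamespace


class DummyModel:
    """Hypothetical stand-in for a trained model; not part of scvi-tools."""

    def __init__(self):
        self.is_trained_ = True          # fitted attribute (trailing underscore)
        self.history_ = {"elbo": [1.0]}  # fitted attribute (trailing underscore)
        self.n_latent = 10               # init argument, dropped by the filter


def trailing_underscore_attributes(obj):
    """Return non-callable attributes whose name ends with a single '_'."""
    # Assumption: (name, value) pairs similar to what _get_user_attributes yields.
    members = inspect.getmembers(obj, lambda a: not inspect.isroutine(a))
    # Drop dunder attributes such as __dict__, which also end with '_'.
    members = [m for m in members if not (m[0].startswith("__") and m[0].endswith("__"))]
    return {name: value for name, value in members if name[-1] == "_"}


# Hypothetical datamodule exposing the two counts used in the issue.
datamodule = SimpleNamespace(n_batch=3, n_vars=2000)

user_attributes = trailing_underscore_attributes(DummyModel())
user_attributes.update(
    {
        "n_batch": datamodule.n_batch,
        "n_extra_categorical_covs": 0,
        "n_extra_continuous_covs": 0,
        "n_labels": 1,
        "n_vars": datamodule.n_vars,
    }
)
```

The summary statistics are added by hand here precisely because no registry exists to supply them, which is the gap this issue describes.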

We want a new function that fills in the registry and passes it to the model when the model is constructed, i.e. at: model = scvi.model.SCVI(n_layers=n_layers, n_latent=n_latent, gene_likelihood="nb", encode_covariates=False). All necessary entries and their structure can be inspected with: scvi.adata_manager.get_state_registry(scvi.REGISTRY_KEYS.X_KEY).to_dict().
Once this is fixed, all uses of _module_init_on_train throughout the codebase should be removed, as they will no longer be necessary.
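One possible shape for such a function is sketched below in plain Python, without importing scvi-tools. The function names (build_minimal_registry, collect_summary_stats), the field names, and every dictionary key are assumptions chosen for illustration; they should be checked against a registry actually produced by setup_anndata before being relied on.

```python
def build_minimal_registry(n_batch, n_vars, batch_categories, model_name="SCVI"):
    """Assemble a registry-like dict without calling setup_anndata.

    All key names are illustrative assumptions, not the confirmed
    scvi-tools registry schema.
    """
    batch_categories = list(batch_categories)
    if len(batch_categories) != n_batch:
        raise ValueError("batch_categories must contain exactly n_batch entries")
    return {
        "model_name": model_name,
        "setup_args": {"batch_key": "batch", "labels_key": None, "layer": None},
        "field_registries": {
            "X": {
                "state_registry": {"n_vars": n_vars},
                "summary_stats": {"n_vars": n_vars},
            },
            "batch": {
                "state_registry": {"categorical_mapping": batch_categories},
                "summary_stats": {"n_batch": n_batch},
            },
            "labels": {
                # A single placeholder label, mirroring n_labels = 1 in the issue.
                "state_registry": {"categorical_mapping": ["unlabeled"]},
                "summary_stats": {"n_labels": 1},
            },
        },
    }


def collect_summary_stats(registry):
    """Flatten per-field summary stats into the dict the model needs."""
    # No extra covariates are registered in this sketch.
    stats = {"n_extra_categorical_covs": 0, "n_extra_continuous_covs": 0}
    for field in registry["field_registries"].values():
        stats.update(field["summary_stats"])
    return stats


registry = build_minimal_registry(
    n_batch=3, n_vars=2000, batch_categories=["b0", "b1", "b2"]
)
summary_stats = collect_summary_stats(registry)
```

With a registry like this available at construction time, the summary statistics would no longer need to be patched into the saved attributes by hand, and the module could be initialized up front rather than on the first training call.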
