I've successfully installed scvi==1.1.x (main branch) and confirmed that I can train the model on 1 GPU.
However, with multi-GPU training I run into the following problem.
Batch size = 512.
For 1 GPU:
x.shape = (512, 1178)
For 2 GPUs:
x.shape = (1, 512, 1178)
Because of this dimension mismatch, almost nothing in the code can run.
For example, the one_hot function and FCLayers both fail.
Is there a quick fix for this, or should I manually adjust the dimensions throughout the code (maybe x.squeeze(0) in the outermost nn.Module) to make them match?
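To make the x.squeeze(0) idea concrete, here is a minimal sketch in plain PyTorch. It is not scvi-tools code: `drop_leading_singleton` and `SqueezeInputs` are hypothetical names, and `nn.Linear` only stands in for whatever module actually receives `x`. It also only treats the symptom; it does not explain why the extra leading dimension appears with 2 GPUs.

```python
import torch
from torch import nn


def drop_leading_singleton(x: torch.Tensor) -> torch.Tensor:
    """Remove a leading size-1 dimension from a 3-D tensor; otherwise return x unchanged."""
    if x.dim() == 3 and x.shape[0] == 1:
        return x.squeeze(0)
    return x


class SqueezeInputs(nn.Module):
    """Hypothetical wrapper that normalizes the input shape before calling the inner module."""

    def __init__(self, inner: nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.inner(drop_leading_singleton(x))


# nn.Linear is only a placeholder for the real model component.
layer = SqueezeInputs(nn.Linear(1178, 10))
out_single = layer(torch.zeros(512, 1178))    # shape seen with 1 GPU
out_multi = layer(torch.zeros(1, 512, 1178))  # shape seen with 2 GPUs
```

Both calls reach the inner module with a (512, 1178) input. If the module receives a dictionary of tensors rather than a single tensor, the same helper would presumably have to be applied to every entry.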
Hi @martinkim0, scVI works with multiple samples (like n_samples_per_mc_run). Those have a structure similar to the multi-GPU one: (n_samples, n_batch, n_genes). Quite a few other functions, like scANVI, do not handle n_samples correctly (dimension errors), and adapting them is major work. I was confused by this today and instead made scANVI work with n_samples=1.
Hi @zhenxingjian, what model are you using for this? I'll note that we have only tested multi-GPU training on scVI.
I'm following the setup of multiVI.
If you've tested that scVI works with multi-GPU training, I can try to modify the code on my end, following the same setup as in scVI, to see whether it can support multi-GPU training.