questions about the GPU needed for training #72

Open
zhuojiuqingyun opened this issue Oct 8, 2024 · 0 comments

zhuojiuqingyun commented Oct 8, 2024

Dear authors,
Thank you very much for making this great tool.
I got the following error while trying to run train-saturn.py:
!python3 ../../train-saturn.py --in_data=data/frog_zebrafish_run.csv --in_label_col=cell_type --ref_label_col=cell_type --num_macrogenes=2000 --hv_genes=8000 --centroids_init_path=saturn_results/fz_centroids.pkl --score_adata --ct_map_path=data/frog_zebrafish_cell_type_map.csv --work_dir=. --device_num=0

Global seed set to 0
Using Device 0
Set seed to 0
After loading the anndata frog View of AnnData object with n_obs × n_vars = 96935 × 9538
    obs: 'library', 'clusters', 'dev_stage', 'parent_clusters', 'cell_type', 'n_genes', 'species', 'species_type_label', 'truth_labels', 'ref_labels'
    var: 'n_cells'
After loading the anndata zebrafish View of AnnData object with n_obs × n_vars = 63371 × 16980
    obs: 'n_counts', 'unique_cell_id', 'cell_names', 'library_id', 'batch', 'ClusterID', 'ClusterName', 'TissueID', 'TissueName', 'TimeID', 'cluster', 'cell_type', 'n_genes', 'species', 'species_type_label', 'truth_labels', 'ref_labels'
    var: 'n_cells'
Loaded centroids
Pretraining...
  0%|                                                   | 0/200 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/home/users/Ruijie/tools/SATURN/Vignettes/frog_zebrafish_embryogenesis/../../train-saturn.py", line 1064, in <module>
    trainer(args)
  File "/home/users/Ruijie/tools/SATURN/Vignettes/frog_zebrafish_embryogenesis/../../train-saturn.py", line 654, in trainer
    pretrain_model = pretrain_saturn(pretrain_model, pretrain_loader, optim_pretrain,
  File "/home/users/Ruijie/tools/SATURN/Vignettes/frog_zebrafish_embryogenesis/../../train-saturn.py", line 215, in pretrain_saturn
    encoder_input, encoded, mus, log_vars, px_rates, px_rs, px_drops = model(data, species)
  File "/home/users/Ruijie/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/users/Ruijie/tools/SATURN/model/saturn_model.py", line 107, in forward
    expr[:, filler_idx[0]:filler_idx[1]] = inp
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Then I checked my GPU and got the following result:

device = torch.device('cuda')
print('E', torch.cuda.get_device_properties(device))
/home/users/Ruijie/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/cuda/__init__.py:120: UserWarning: 
    Found GPU%d %s which is of cuda capability %d.%d.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability supported by this library is %d.%d.
    
  warnings.warn(old_gpu_warn.format(d, name, major, minor, min_arch // 10, min_arch % 10))
E _CudaDeviceProperties(name='GeForce GT 730', major=3, minor=5, total_memory=2000MB, multi_processor_count=2)
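
For reference, here is a minimal sketch of how the compute capability reported above (3.5) could be compared against the architectures a PyTorch build was compiled for. `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` are real PyTorch calls, but `is_supported` is a hypothetical helper and `EXAMPLE_ARCH_LIST` is an illustrative assumption, not the list from my installation. The matching rule is simplified: an `sm_XY` binary is assumed to run on devices with the same major version and an equal or higher minor version, and a `compute_XY` (PTX) entry on any equal or newer device. The capability values used for the Quadro P4000 and GTX 1080 Ti (6.1) and the T4 (7.5) are NVIDIA's published figures.

```python
def is_supported(capability, arch_list):
    """Return True if a device with `capability` (major, minor) can run
    kernels from a build compiled for the targets in `arch_list`.

    Simplified rule (an assumption of this sketch):
      - "sm_XY" binaries run on devices with the same major and minor >= Y
      - "compute_XY" PTX can be JIT-compiled for any capability >= X.Y
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        if not num.isdigit():
            continue  # skip entries this sketch does not understand
        a_major, a_minor = divmod(int(num), 10)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


# Hypothetical arch list, for illustration only; a real build may differ.
EXAMPLE_ARCH_LIST = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]

if __name__ == "__main__":
    # Capabilities of the GPUs mentioned in this issue.
    for name, cap in [("GeForce GT 730", (3, 5)),
                      ("Quadro P4000", (6, 1)),
                      ("GTX 1080 Ti", (6, 1)),
                      ("Tesla T4", (7, 5))]:
        print(name, cap, is_supported(cap, EXAMPLE_ARCH_LIST))

    # Optional: check the local machine if PyTorch and a GPU are present.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        local_cap = torch.cuda.get_device_capability(0)
        print("local device:", local_cap,
              is_supported(local_cap, torch.cuda.get_arch_list()))
```

With the example list, the GT 730 fails the check because the only 3.x target is `sm_37`, which would be consistent with the "no kernel image is available" error. This check says nothing about whether a card's memory is large enough for training.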

Does this mean my GPU is too old and I need to replace it?
The other GPUs I can rent are an NVIDIA Quadro P4000 and an NVIDIA GTX 1080 Ti. Would either of them be sufficient for training SATURN?
Alternatively, if I buy Colab Pro, is a T4 enough?
Could you help me with these questions?

Best wishes!
Ruijie
