"OutOfMemoryError: CUDA out of memory." in GPU mode #372
I am getting the same error. Is there any solution to this issue? Thanks!
Are you referring to the Regression model or the Cell2location model?
The Regression model should not have any issues with this. You can check the
availability of GPU memory with the `nvidia-smi` command.
I am getting `OutOfMemoryError: CUDA out of memory. Tried to allocate 1.85 GiB. GPU` with the Cell2location model. It's actually very odd, because it works fine with the same object after I apply `median_abs_deviation` filtering on `log1p_total_counts` to each sample before concatenating; the difference is only about 1,700 spots between the two. Do you know why some (outlier) spots would cause this error? Thanks @vitkl for your help!
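One way to see why a modest number of extra spots can matter: the footprint of any dense spots-by-genes tensor grows linearly with the number of spots, so if the GPU is already nearly full, a few thousand additional spots can push the next allocation over the limit. A back-of-the-envelope sketch (all dimensions below are hypothetical, not taken from this dataset):

```python
# Hypothetical sizing sketch: GPU memory needed for a dense float32
# spots-by-genes matrix scales linearly with the number of spots.

def dense_matrix_gib(n_spots: int, n_genes: int, bytes_per_value: int = 4) -> float:
    """Size in GiB of a dense n_spots x n_genes matrix of float32 values."""
    return n_spots * n_genes * bytes_per_value / 1024**3

# Illustrative numbers only: 40,000 spots x 12,000 genes, float32.
before = dense_matrix_gib(40_000, 12_000)
after = dense_matrix_gib(40_000 + 1_700, 12_000)
print(f"{before:.2f} GiB -> {after:.2f} GiB (+{after - before:.2f} GiB)")
```

The extra spots themselves add only a proportional amount, but if the working set was already close to the card's capacity, that margin is enough to trigger the error.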
I got the same problem. I found a large chunk of GPU memory was used by `/usr/lib/rstudio-server/bin/rsession`; after ending this process with `kill -9 PID`, the memory was released. But after running `mod.train(max_epochs=30000, batch_size=None, train_size=1)`, a similar process popped up again and took up ~7000 MiB. I repeated this several times, which left me confused...
I met this problem while running the cell2location model:

```
GPU available: True (cuda), used: True
```
@avpatel18 You probably need to look into GPU memory settings rather than RAM settings on your cluster.
Thanks a lot. Here are the install steps:

```
export PYTHONNOUSERSITE="aaaaa"
conda create -y -n cell2location_cuda118_torch22 python=3.10
conda activate cell2location_cuda118_torch22
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
When running `mod.train(max_epochs=250)`:

```
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/aa/miniconda3/envs/cell2location/lib/python3.10/site-packages/lightning/pytorch/trainer/configuration_validator.py:72: You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/aa/miniconda3/envs/cell2location/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=23` in the `DataLoader` to improve performance.
Epoch 1/250:   0%|          | 0/250 [00:00<?, ?it/s]
OutOfMemoryError                          Traceback (most recent call last)
OutOfMemoryError: CUDA out of memory. Tried to allocate 98.00 MiB. GPU
```
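When the failed allocation is small (98 MiB here) while the card appears busy, fragmentation of the PyTorch CUDA caching allocator is one possible culprit. A commonly suggested mitigation is the allocator's documented `max_split_size_mb` option, set through the `PYTORCH_CUDA_ALLOC_CONF` environment variable before torch makes its first CUDA allocation. Whether it helps in this specific case is untested, and the chosen value below is an illustrative guess; reducing `batch_size` in `mod.train` is the other usual lever.

```python
import os

# Must be set before the first CUDA allocation (ideally before importing
# torch). max_split_size_mb is a documented option of PyTorch's CUDA
# caching allocator that limits block splitting, which can reduce
# fragmentation-driven "CUDA out of memory" failures.
# The value 128 is an illustrative guess, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# After this, import torch and train as before, possibly with a smaller
# batch, e.g. mod.train(max_epochs=250, batch_size=1024)
# (the batch_size value here is hypothetical).
```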