How do I set it up so I can run it with 24GB of VRAM? #15
You can refer to #10 (comment).
I am using the flags suggested in #10, but I am still getting the CUDA out-of-memory error on a 24GB GPU.
I traced it and found that the error occurs at the switch from phase 1 to phase 2 training, once the number of steps reaches the phase1_train_steps value. It hits the CUDA memory limit at line [...]. What else do you suggest to resolve this? Can you elaborate on your fix, or was it just setting the above flags? I have tried many different configurations for my accelerate environment but cannot get past this memory issue. Please let me know what else I can try. Thanks for the help!
in However, that causes an error with texture training here:
I'm able to run float32 on an 11GB VRAM GPU for texture training by adding these lines to guidance.py
I also needed to remove the unused prediction depth values from these lines from
I hope this helps.
I use these settings:
--gradient_checkpointing \
--mixed_precision fp16 \
--use_8bit_adam \
--set_grads_to_none
but the error is:
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB. GPU 0 has a total capacity of 23.69 GiB of which 313.69 MiB is free. Including non-PyTorch memory, this process has 23.38 GiB memory in use. Of the allocated memory 22.93 GiB is allocated by PyTorch, and 128.80 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Can you help me? Thank you!
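To judge whether the PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True hint in the error message is likely to help, it can be useful to compare the requested allocation with the memory that could be reclaimed in the best case. The following is a minimal sketch using only the standard library; `summarize_oom` and the dictionary keys are hypothetical names introduced for illustration, and the regular expressions assume the message wording shown in the error above.

```python
import re

# Excerpt of the error message quoted above (sizes copied verbatim).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 1.25 GiB. "
    "GPU 0 has a total capacity of 23.69 GiB of which 313.69 MiB is free. "
    "Of the allocated memory 22.93 GiB is allocated by PyTorch, "
    "and 128.80 MiB is reserved by PyTorch but unallocated."
)

# A size as printed in the message, e.g. "1.25 GiB" or "313.69 MiB".
_SIZE = r"([\d.]+) (MiB|GiB)"
_PATTERNS = {
    "requested": rf"Tried to allocate {_SIZE}",
    "free": rf"of which {_SIZE} is free",
    "allocated": rf"{_SIZE} is allocated by PyTorch",
    "reserved_unallocated": rf"{_SIZE} is reserved by PyTorch but unallocated",
}
_MIB_PER_UNIT = {"MiB": 1.0, "GiB": 1024.0}


def summarize_oom(message):
    """Hypothetical helper: extract the sizes from an OOM message, in MiB."""
    sizes = {}
    for name, pattern in _PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {name!r} in the message")
        value, unit = match.groups()
        sizes[name] = float(value) * _MIB_PER_UNIT[unit]
    # Best case: all free device memory plus memory PyTorch has cached but not used.
    sizes["best_case_available"] = sizes["free"] + sizes["reserved_unallocated"]
    # Positive shortfall means the request cannot fit even without fragmentation.
    sizes["shortfall"] = sizes["requested"] - sizes["best_case_available"]
    return sizes


if __name__ == "__main__":
    for name, mib in summarize_oom(OOM_MESSAGE).items():
        print(f"{name}: {mib:.2f} MiB")
```

For the message in this thread, the request is 1280 MiB, while at most about 442 MiB (313.69 MiB free plus 128.80 MiB reserved but unallocated) could be reclaimed, which leaves a shortfall of roughly 838 MiB. This suggests fragmentation is probably not the main problem here, so the expandable_segments setting alone may not be enough and peak memory usage itself likely needs to come down.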