
How do I set it up so I can run it with 24GB? #15

Open
jim-1ee opened this issue Nov 28, 2024 · 3 comments

Comments

@jim-1ee

jim-1ee commented Nov 28, 2024

I use these settings:

--gradient_checkpointing \
--mixed_precision fp16 \
--use_8bit_adam \
--set_grads_to_none \

but the error is:
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB. GPU 0 has a total capacity of 23.69 GiB of which 313.69 MiB is free. Including non-PyTorch memory, this process has 23.38 GiB memory in use. Of the allocated memory 22.93 GiB is allocated by PyTorch, and 128.80 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Can you help me? Thank you~
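
The traceback itself already suggests one knob worth trying. A minimal sketch (editorial, not from the thread) of applying that allocator setting, assuming it is placed at the very top of the training script before any CUDA tensor is created:

    import os

    # Apply the allocator option suggested by the OOM message before any CUDA
    # allocation happens, so the caching allocator picks it up.
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

    import torch  # import torch only after the environment variable is set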

@xdobetter
Contributor

You can refer to #10 (comment)

@marcusrdlee

I am using the flags suggested in #10

  # --gradient_checkpointing \
   --mixed_precision fp16 \
   --use_8bit_adam \
   --set_grads_to_none \

But I am still getting the CUDA out-of-memory error on a 24GB GPU.

return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB. GPU 0 has a total capacity of 23.58 GiB of which 199.38 MiB is free. Including non-PyTorch memory, this process has 22.31 GiB memory in use. Of the allocated memory 21.89 GiB is allocated by PyTorch, and 101.58 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I traced this and found that the issue occurs at the switch from phase1 to phase2 training, once the step count reaches the phase1_train_steps value. It hits the CUDA memory limit at this line:
self.accelerator.backward(loss)

What else do you suggest to resolve this? Can you elaborate on your fix or was it just setting the above flags? I have attempted many different configurations for my accelerate environment but cannot bypass this memory issue. Please let me know what else I can try. Thanks for the help!
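
For what it's worth, one generic way to shrink the peak memory at accelerator.backward() is gradient accumulation with a smaller per-step batch. This is only a sketch of the standard Accelerate pattern, not the PuzzleAvatar trainer; the tiny model, optimizer, and data below are placeholders:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset
    from accelerate import Accelerator

    # Accumulate gradients over 4 micro-batches so each backward pass holds
    # less activation memory while the effective batch size stays the same.
    accelerator = Accelerator(gradient_accumulation_steps=4, mixed_precision="fp16")

    model = nn.Linear(64, 1)                      # placeholder model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loader = DataLoader(
        TensorDataset(torch.randn(128, 64), torch.randn(128, 1)), batch_size=8
    )

    model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

    for x, y in loader:
        with accelerator.accumulate(model):
            loss = nn.functional.mse_loss(model(x), y)
            accelerator.backward(loss)
            optimizer.step()   # Accelerate only applies the step at accumulation boundaries
            optimizer.zero_grad(set_to_none=True)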

@atonalfreerider

In configs/default.yaml you need to set fp16: True, and then geometry training will run.

However, that causes an error with texture training here:

  File "/home/john/Desktop/3DPose/PuzzleAvatar/thirdparties/nvdiffrast/nvdiffrast/torch/ops.py", line 657, in forward
    out, work_buffer = _get_plugin().antialias_fwd(color, rast, pos, tri, topology_hash)
RuntimeError: antialias_fwd(): Inputs color, rast, pos must be float32 tensors
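
A possible alternative to switching the whole texture stage to float32, sketched here for reference and not taken from the repo, is to cast the inputs back to float32 only at the antialias call site; color, rast, pos, and tri are illustrative names for whatever the caller passes in:

    import nvdiffrast.torch as dr

    # antialias requires float32 color/rast/pos, so cast the half-precision
    # tensors back just for this op; gradients still flow through .float().
    out = dr.antialias(color.float(), rast.float(), pos.float(), tri)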

I'm able to run float32 texture training on an 11GB VRAM GPU by adding these lines to guidance.py:

            pipe = DiffusionPipeline.from_pretrained(
                self.base_model_key,
                torch_dtype=torch.float32,
                requires_safety_checker=False,
            ).to(self.device)
            # add memory offloading
            pipe.enable_model_cpu_offload()
            if not cfg.train.fp16:
                # fp32 requires extra low memory settings
                pipe.enable_vae_slicing()
                pipe.enable_vae_tiling()

I also needed to remove the unused predicted depth values from these lines in cores/lib/trainer.py:

preds_depth = preds_depth * preds_alpha + (1 - preds_alpha)

preds_depth_list = [
    torch.zeros_like(preds_depth).to(self.device)
    for _ in range(self.world_size)
]    # [[B, ...], [B, ...], ...]
dist.all_gather(preds_depth_list, preds_depth)
preds_depth = torch.cat(preds_depth_list, dim=0)

I hope this helps
