Finetuning LLM workspace template failed with OOM for LoRA/Llama70B #174

@sudhirn-anyscale

Description

Launched the fine-tuning job as follows, and it failed with a CUDA out-of-memory error for Llama-2-70B (full log attached):
ray_job_log_job_eqeqt513ex4xy1sgwgcjk8ag1i.log

$ python main.py job_compute_configs/aws.yaml training_configs/lora/llama-2-70b-4k-4xg5_48xlarge.yaml

Error

   result = forward_call(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 268, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.75 GiB (GPU 4; 21.99 GiB total capacity; 16.79 GiB already allocated; 907.38 MiB free; 20.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
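
Since reserved memory (20.71 GiB) is well above allocated memory (16.79 GiB), the allocator's own suggestion of setting max_split_size_mb to reduce fragmentation may be worth trying. A minimal sketch of the relaunch, assuming the same entry point; the value 128 is an illustrative guess, not a tested setting, and with Ray the variable may need to be passed through the job's runtime environment so it actually reaches the worker processes:

# Hypothetical mitigation: cap the CUDA caching allocator's split size, per the error message
$ PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python main.py job_compute_configs/aws.yaml training_configs/lora/llama-2-70b-4k-4xg5_48xlarge.yaml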
