For the moment, I can't run the 65B model with 4 GPUs and a total of 96 GB.
I investigated: the warning that bitsandbytes was compiled without GPU support ("8-bit optimizers and GPU quantization are unavailable") is a first lead ...
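A quick way to double-check that lead (a minimal sketch, run in the same venv as the log below; `python -m bitsandbytes` only applies if the installed release ships its diagnostic module):

% python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
% python -m bitsandbytes    # prints bitsandbytes' own CUDA setup report, when the module provides it

If torch sees CUDA but bitsandbytes still loads its CPU library, reinstalling bitsandbytes against the matching CUDA version would be the obvious next step. Full output of the failing run: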
[1] % torchrun --nproc_per_node 4 example.py --ckpt_dir ../../LLaMA/30B --tokenizer_path ../../LLaMA/tokenizer.model
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
/home/scampion/Code/llama/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/scampion/Code/llama/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/scampion/Code/llama/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/scampion/Code/llama/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Allocating transformer on host
Allocating transformer on host
Allocating transformer on host
Allocating transformer on host
Traceback (most recent call last):
File "/home/scampion/Code/llama-int8/example.py", line 129, in<module>
fire.Fire(main)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/scampion/Code/llama-int8/example.py", line 101, in main
generator = load(ckpt_dir, tokenizer_path, max_seq_len, max_batch_size, use_int8)
File "/home/scampion/Code/llama-int8/example.py", line 38, in load
model = Transformer(model_args)
File "/home/scampion/Code/llama-int8/llama/model.py", line 255, in __init__
self.layers.append(TransformerBlock(layer_id, params))
File "/home/scampion/Code/llama-int8/llama/model.py", line 206, in __init__
self.attention = Attention(args)
File "/home/scampion/Code/llama-int8/llama/model.py", line 132, in __init__
).cuda()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 208.00 MiB (GPU 0; 23.68 GiB total capacity; 5.08 GiB already allocated; 6.94 MiB free; 5.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/home/scampion/Code/llama-int8/example.py", line 129, in<module>
fire.Fire(main)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/scampion/Code/llama-int8/example.py", line 101, in main
generator = load(ckpt_dir, tokenizer_path, max_seq_len, max_batch_size, use_int8)
File "/home/scampion/Code/llama-int8/example.py", line 38, in load
model = Transformer(model_args)
File "/home/scampion/Code/llama-int8/llama/model.py", line 255, in __init__
self.layers.append(TransformerBlock(layer_id, params))
File "/home/scampion/Code/llama-int8/llama/model.py", line 206, in __init__
self.attention = Attention(args)
File "/home/scampion/Code/llama-int8/llama/model.py", line 129, in __init__
).cuda()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 208.00 MiB (GPU 0; 23.68 GiB total capacity; 5.28 GiB already allocated; 6.94 MiB free; 5.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/home/scampion/Code/llama-int8/example.py", line 129, in<module>
fire.Fire(main)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/scampion/Code/llama-int8/example.py", line 101, in main
generator = load(ckpt_dir, tokenizer_path, max_seq_len, max_batch_size, use_int8)
File "/home/scampion/Code/llama-int8/example.py", line 38, in load
model = Transformer(model_args)
File "/home/scampion/Code/llama-int8/llama/model.py", line 255, in __init__
self.layers.append(TransformerBlock(layer_id, params))
File "/home/scampion/Code/llama-int8/llama/model.py", line 206, in __init__
self.attention = Attention(args)
File "/home/scampion/Code/llama-int8/llama/model.py", line 129, in __init__
).cuda()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 208.00 MiB (GPU 0; 23.68 GiB total capacity; 5.28 GiB already allocated; 6.94 MiB free; 5.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/home/scampion/Code/llama-int8/example.py", line 129, in<module>
fire.Fire(main)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/scampion/Code/llama-int8/example.py", line 101, in main
generator = load(ckpt_dir, tokenizer_path, max_seq_len, max_batch_size, use_int8)
File "/home/scampion/Code/llama-int8/example.py", line 38, in load
model = Transformer(model_args)
File "/home/scampion/Code/llama-int8/llama/model.py", line 255, in __init__
self.layers.append(TransformerBlock(layer_id, params))
File "/home/scampion/Code/llama-int8/llama/model.py", line 206, in __init__
self.attention = Attention(args)
File "/home/scampion/Code/llama-int8/llama/model.py", line 129, in __init__
).cuda()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 208.00 MiB (GPU 0; 23.68 GiB total capacity; 5.28 GiB already allocated; 6.94 MiB free; 5.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 887816) of binary: /home/scampion/Code/llama/venv/bin/python
Traceback (most recent call last):
File "/home/scampion/Code/llama/venv/bin/torchrun", line 8, in<module>sys.exit(main())
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/scampion/Code/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example.py FAILED
------------------------------------------------------------
Failures:
[1]:
time: 2023-03-14_09:55:43
host : vector
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 887817)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time: 2023-03-14_09:55:43
host : vector
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 887818)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time: 2023-03-14_09:55:43
host : vector
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 887819)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time: 2023-03-14_09:55:43
host : vector
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 887816)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
(venv)
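The OOM message itself suggests trying max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal retry along those lines (hedged: 128 is an arbitrary illustrative value, and this only helps if fragmentation rather than raw capacity is the limit):

# same command as above, with the allocator hint from the error message
% PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 torchrun --nproc_per_node 4 example.py --ckpt_dir ../../LLaMA/30B --tokenizer_path ../../LLaMA/tokenizer.model

Note also that all four ranks report the OOM on GPU 0, with only ~7 MiB free even though the failing process itself has reserved only ~5 GiB of the 23.68 GiB card, which could mean every process is allocating on the same device rather than one device per rank.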