Description
How to reproduce
Using a p4d.24xlarge (8× A100 40GB GPUs):
from parallelformers import parallelize
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "facebook/opt-66b"
batch_size = [1]
batch = [["our story begins on"] * bs for bs in batch_size]
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
inputs = [tokenizer(seq, return_tensors="pt").input_ids for seq in batch]
parallelize(model, num_gpus=8, fp16=True)
for _ in range(100):
    model.generate(
        torch.cat(inputs, dim=0),
        do_sample=True,
        max_length=2048,
        num_return_sequences=1,
    )
It loads okay and begins performing inference. I can see all 8 GPUs at 90+% utilization in nvidia-smi for a while. Then eventually one GPU drops to 0% while the others jump to 100%, and the terminal shows:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/deepspeed/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/ubuntu/miniconda3/envs/deepspeed/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/ubuntu/miniconda3/envs/deepspeed/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 367, in reduce_storage
    df = multiprocessing.reduction.DupFd(fd)
  File "/home/ubuntu/miniconda3/envs/deepspeed/lib/python3.8/multiprocessing/reduction.py", line 198, in DupFd
    return resource_sharer.DupFd(fd)
  File "/home/ubuntu/miniconda3/envs/deepspeed/lib/python3.8/multiprocessing/resource_sharer.py", line 48, in __init__
    new_fd = os.dup(fd)
OSError: [Errno 9] Bad file descriptor
It then seems to hang forever from there. I realize this stack trace doesn't give enough information to trace the failure back into parallelformers, which is frustrating. Maybe it's actually a bug in PyTorch or multiprocessing?
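For anyone else who hits this: OSError: [Errno 9] Bad file descriptor raised from DupFd inside torch.multiprocessing is the usual symptom of PyTorch's default file_descriptor tensor-sharing strategy running out of (or racing on) open file descriptors. Below is a minimal sketch of two generic workarounds I intend to try, assuming that is actually the cause here (unverified, and not specific to parallelformers):

import resource
import torch.multiprocessing as mp

# Assumption: os.dup(fd) fails because the process hit its open-file limit,
# so raise the soft RLIMIT_NOFILE up to the hard limit.
_, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

# Assumption: the file_system sharing strategy avoids passing file
# descriptors between processes entirely, sidestepping DupFd.
mp.set_sharing_strategy("file_system")

I'm assuming these need to run at the top of the script, before parallelize(), so the worker processes pick them up. If either one turns out to matter, it would at least narrow this down to fd sharing rather than parallelformers itself.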
Environment
- OS : Ubuntu 20.04.4 LTS
- Python version : 3.8.13
- Transformers version : 4.24.0
- Whether to use Docker : No
- Misc. : N/A