Issues with torchrun --nproc_per_node num Command and Llama 3.1 Model Conflicts #322

Open
shenshaowei opened this issue Aug 20, 2024 · 1 comment

@shenshaowei

Issue 1: For example, the model downloaded from the Meta official website via the URL provided in download.sh comes as 8 separate checkpoint files. When I try to run it with example_chat_completion.py and the Llama folder provided on the official website, I find that it cannot run because I only have 2 GPUs available. Does this mean that the 70B model, which consists of 8 checkpoints, requires 8 GPUs and cannot run on a machine with only 2 GPUs? How should the code be modified to run with only two GPUs?

Issue 2: The model I downloaded from the Meta official website using the URL provided in download.sh appears to be different from the one on Hugging Face; it is the "original" version described by Hugging Face. According to Hugging Face: "This repository contains two versions of Meta-Llama-3.1-70B-Instruct, for use with transformers and with the original Llama codebase." Does that mean I need to download the Llama folder and use example_chat_completion.py to run it?

@mylesgoose

Issue 1: you probably downloaded all of the available models (8B, 8B-Instruct, 70B, 70B-Instruct, 405B and 405B-Instruct), so it is hard to tell exactly which one you are trying to run. Let's focus on the 8B-Instruct model. If you download it from Meta, it contains these files:

/home/myles/.llama/checkpoints/Meta-Llama3.1-8B-Instruct/checklist.chk
/home/myles/.llama/checkpoints/Meta-Llama3.1-8B-Instruct/consolidated.00.pth
/home/myles/.llama/checkpoints/Meta-Llama3.1-8B-Instruct/origparams.json
/home/myles/.llama/checkpoints/Meta-Llama3.1-8B-Instruct/params.json
/home/myles/.llama/checkpoints/Meta-Llama3.1-8B-Instruct/tokenizer.model

and to run it you would type:

torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir /home/myles/.llama/checkpoints/Meta-Llama3.1-8B-Instruct/ --tokenizer_path /home/myles/.llama/checkpoints/Meta-Llama3.1-8B-Instruct/tokenizer.model --max_seq_len 128 --max_batch_size 4

That runs on a single 24 GB GPU. The 70B models are much larger and will not fit on one 24 GB GPU, so you need more VRAM: perhaps eight 24 GB GPUs, or two 80 GB A100s, in which case the model is sharded over 8 GPUs or over 2 GPUs.

Issue 2: the two downloads are the same model in different formats. One is the Hugging Face transformers format and the other is Meta's original PyTorch format. See https://github.com/meta-llama/llama-models?tab=readme-ov-file#download
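A quick way to check how many GPUs the original Llama codebase expects for a given download is to count the consolidated shards in the checkpoint directory, since the reference code generally wants the torchrun world size to match that shard count. A minimal sketch, assuming the same /home/myles/.llama/checkpoints layout as the 8B listing above:

# Count the model-parallel shards in an original-format checkpoint directory.
# The directory name is taken from the 8B listing above; point it at whichever
# checkpoint download.sh actually produced on your machine.
ls /home/myles/.llama/checkpoints/Meta-Llama3.1-8B-Instruct/consolidated.*.pth | wc -l
# 8B ships as a single consolidated.00.pth (hence --nproc_per_node 1), while the
# original-format 70B download is split into eight shards, which is why it will
# not launch unchanged on a 2-GPU machine.

The script below wraps the same kind of torchrun launch, with PYTHONPATH pointed at the repository root: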

#!/bin/bash

NGPUS=8
PYTHONPATH=$(git rev-parse --show-toplevel) torchrun \
  --nproc_per_node=$NGPUS \
  models/scripts/example_chat_completion.py $CHECKPOINT_DIR \
  --model_parallel_size $NGPUS
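
For the 70B-Instruct download, a hypothetical invocation along the same lines could look like the following; the checkpoint path is an assumption based on the 8B layout shown earlier, and NGPUS=8 is chosen to match the eight consolidated.*.pth shards of the original-format 70B checkpoint:

# Hypothetical 70B-Instruct launch from a checkout of meta-llama/llama-models.
# The checkpoint directory name is assumed; adjust it to wherever download.sh
# placed your files. NGPUS must match the number of consolidated.*.pth shards,
# so this needs eight GPUs with enough combined VRAM for the 70B weights.
CHECKPOINT_DIR=/home/myles/.llama/checkpoints/Meta-Llama3.1-70B-Instruct/
NGPUS=8
PYTHONPATH=$(git rev-parse --show-toplevel) torchrun \
  --nproc_per_node=$NGPUS \
  models/scripts/example_chat_completion.py "$CHECKPOINT_DIR" \
  --model_parallel_size $NGPUS

On a machine with only two large GPUs, the practical options are a checkpoint resharded to model-parallel size 2 or the transformers-format weights from Hugging Face, which the transformers library can split across the available devices.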
