Chatbot Demo

Quick Start

Assuming you have 2 A100-80GB GPUs and have downloaded and divided the Dromedary/LLaMA checkpoints into 2 shards:

bash scripts/demo_dromedary_stream_2shards.sh

Or, assuming you have 8 V100-32GB GPUs and have downloaded and divided the Dromedary/LLaMA checkpoints into 8 shards:

bash scripts/demo_dromedary_stream_8shards.sh

Further Customization

Since Dromedary is a 65B-parameter model, its weights alone occupy about 130GB at 16-bit precision (2 bytes per parameter), so the GPUs must provide at least that much memory in total to hold the full model.
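The 130GB figure can be reproduced with a little arithmetic. The sketch below is only an estimate: it assumes a nominal 65e9 parameters stored in 16 bits each, counts weights only (no activations or KV cache), and uses hypothetical helper names that are not part of this repository.

```python
# Rough estimate of GPU memory needed for the model weights alone.
# Assumptions (not from the repository): 65e9 parameters, 2 bytes each,
# 1 GB = 1e9 bytes. Activations and the KV cache need extra headroom.

N_PARAMS = 65e9


def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Total memory for the weights, in GB."""
    return n_params * bytes_per_param / 1e9


def per_gpu_weight_gb(n_params: float, n_gpus: int, bytes_per_param: int = 2) -> float:
    """Weights held by each GPU when the model is split evenly into n_gpus shards."""
    return weight_memory_gb(n_params, bytes_per_param) / n_gpus


if __name__ == "__main__":
    print(f"total weights: {weight_memory_gb(N_PARAMS):.2f} GB")
    # The two Quick Start setups: (number of GPUs, memory per GPU in GB).
    for n_gpus, gpu_gb in [(2, 80), (8, 32)]:
        shard_gb = per_gpu_weight_gb(N_PARAMS, n_gpus)
        print(f"{n_gpus} x {gpu_gb}GB: {shard_gb:.2f} GB of weights per GPU, "
              f"{gpu_gb - shard_gb:.2f} GB left for everything else")
```

Under these assumptions both Quick Start setups fit the weights with room to spare, and the 8-GPU setup leaves slightly more headroom per GPU.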

Customized Model Sharding

When using model parallelism on MP = 1, 2, 4, or 8 GPUs, divide the model into MP shards with utils/convert_hf_weights_to_llama_ckpt.py:

python -u utils/convert_hf_weights_to_llama_ckpt.py \
    --base_model "/path/to/your/llama-65b-hf" \
    --lora_weights "/path/to/your/lora/weights" \
    --output_dir "/path/to/your/sharded_ckpt" \
    --total_ranks MP \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16

When using model parallelism on MP = 3, 6, 9, or 12 GPUs, use utils/convert_hf_weights_to_llama_ckpt_expanded.py to divide the original checkpoint into shards, and install our customized llama_dromedary package for inference:

python -u utils/convert_hf_weights_to_llama_ckpt_expanded.py \
    --base_model "/path/to/your/llama-65b-hf" \
    --lora_weights "/path/to/your/lora/weights" \
    --output_dir "/path/to/your/sharded_ckpt" \
    --total_ranks MP \
    --expanded_att_dim 9216 \
    --expanded_ffn_dim 23040 \
    --expanded_vocab_size 32256 \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16

For MP = 5 or 10 GPUs, here is the recommended expansion configuration for llama_dromedary:

python -u utils/convert_hf_weights_to_llama_ckpt_expanded.py \
    --base_model "/path/to/your/llama-65b-hf" \
    --lora_weights "/path/to/your/lora/weights" \
    --output_dir "/path/to/your/sharded_ckpt" \
    --total_ranks MP \
    --expanded_att_dim 8320 \
    --expanded_ffn_dim 22400 \
    --expanded_vocab_size 32000 \
    --lora_target_modules='[q_proj,k_proj,v_proj,o_proj]' \
    --lora_r=16
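One way to read the expansion settings above is that each sharded dimension has to divide evenly by MP. The sketch below checks that reading. It is an inference from the numbers, not something the scripts are documented to enforce. The original LLaMA-65B sizes (8192, 22016, 32000) are the published architecture values rather than values taken from this README, and the dictionary keys and function name are illustrative only.

```python
# Check which MP values split each dimension evenly.
# Assumption: a dimension can be sharded across MP ranks only if it is
# divisible by MP. This is an interpretation, not documented behaviour.

# Published LLaMA-65B sizes: attention width, feed-forward width, vocabulary.
LLAMA_65B = {"att_dim": 8192, "ffn_dim": 22016, "vocab_size": 32000}

# Expanded sizes copied from the two commands above, keyed by the MP values
# they are recommended for.
EXPANDED = {
    (3, 6, 9, 12): {"att_dim": 9216, "ffn_dim": 23040, "vocab_size": 32256},
    (5, 10): {"att_dim": 8320, "ffn_dim": 22400, "vocab_size": 32000},
}


def splits_evenly(dims: dict, mp: int) -> bool:
    """True if every dimension is an exact multiple of mp."""
    return all(size % mp == 0 for size in dims.values())


if __name__ == "__main__":
    for mp in (1, 2, 4, 8, 3, 5):
        print(f"original dims, MP={mp}: {splits_evenly(LLAMA_65B, mp)}")
    for mp_values, dims in EXPANDED.items():
        for mp in mp_values:
            print(f"expanded dims {dims}, MP={mp}: {splits_evenly(dims, mp)}")
```

Under this reading, the original sizes already split evenly for MP = 1, 2, 4, 8, which is why no expansion is needed there, and the vocabulary size stays at 32000 for MP = 5, 10 because it is already a multiple of both.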
