Your current environment
Does it support ep=4 with etp=4 (expert parallel size 4 together with expert tensor parallel size 4)?
```bash
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
    --trust-remote-code \
    --distributed-executor-backend=mp \
    -tp=16 \
    -dp=1 \
    --port 8006 \
    --max-num-seqs 24 \
    --max-model-len 32768 \
    --max-num-batched-tokens 32768 \
    --block-size 128 \
    --enable-expert-parallel \
    --compilation_config 0 \
    --gpu-memory-utilization 0.96 \
    --additional-config '{"expert_tensor_parallel_size":4, "ascend_scheduler_config":{}}' &> run.log &
```
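The intent behind the command is that the 16 devices (`-tp=16`, `-dp=1`) run attention with TP=16, while the MoE expert weights are split with `expert_tensor_parallel_size=4`, which would imply an expert parallel size of 16 / 4 = 4, i.e. ep=4 with etp=4. Below is a minimal sketch of that parallel-size arithmetic only; it is an illustration of the assumption in this question, not vllm-ascend's actual initialization code.

```python
# Sketch of the ep/etp arithmetic assumed by this question
# (illustrative only, not vllm-ascend's implementation).

def derive_expert_parallel_size(world_size: int, expert_tensor_parallel_size: int) -> int:
    """Assume ep * etp == world_size when --enable-expert-parallel is set."""
    if world_size % expert_tensor_parallel_size != 0:
        raise ValueError("world size must be divisible by expert_tensor_parallel_size")
    return world_size // expert_tensor_parallel_size

world_size = 16   # -tp=16, -dp=1
etp = 4           # "expert_tensor_parallel_size": 4 from --additional-config
ep = derive_expert_parallel_size(world_size, etp)
print(f"ep={ep}, etp={etp}")  # expected: ep=4, etp=4
```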
@wangxiyuan @Angazenn @Yikun
🐛 Describe the bug
```bash
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
    --trust-remote-code \
    --distributed-executor-backend=mp \
    -tp=16 \
    -dp=1 \
    --port 8006 \
    --max-num-seqs 24 \
    --max-model-len 32768 \
    --max-num-batched-tokens 32768 \
    --block-size 128 \
    --enable-expert-parallel \
    --compilation_config 0 \
    --gpu-memory-utilization 0.96 \
    --additional-config '{"expert_tensor_parallel_size":4, "ascend_scheduler_config":{}}' &> run.log &
```
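For reference, once the server does come up, a quick way to check that it is serving requests is to hit the OpenAI-compatible completions endpoint on the configured port. This is only a generic smoke-test sketch; the prompt text and `max_tokens` value are arbitrary placeholders.

```python
# Smoke test against the OpenAI-compatible server started above
# (port 8006 comes from --port; prompt and max_tokens are placeholders).
import requests

resp = requests.post(
    "http://localhost:8006/v1/completions",
    json={
        "model": "/mnt/deepseek/DeepSeek-R1-W8A8-VLLM",  # matches --model
        "prompt": "Hello",
        "max_tokens": 16,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```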