
High CPU usage slows down inference when deploying InternVL2.5 with vLLM #1227

Description

@Lcx2000

Environment:
vllm 0.9.1, torch 2.7.0+cu126, transformers 4.53.2
InternVL2.5 1B model deployed on a single NVIDIA L4 GPU
CPU: Intel(R) Xeon(R) Platinum 8358 @ 2.60 GHz (max 3.40 GHz)

Deployment command:
python -m vllm.entrypoints.openai.api_server \
    --served-model-name internvl2_5 \
    --model internvl2_5_1B \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.9 \
    --port 9084 \
    --trust-remote-code \
    --max_model_len 2432 \
    --max_num_seqs 2 \
    --max_num_batched_tokens 4864
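
For context, the latencies below were measured by sending single requests to this endpoint. A minimal client sketch for timing one request (assumptions: the server is reachable on localhost:9084 as configured above, the openai Python package is installed, and the image URL is a placeholder):

```python
# Minimal timing sketch against the server launched above.
# Assumptions: openai package installed, image URL is a placeholder,
# model name matches --served-model-name.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9084/v1", api_key="EMPTY")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="internvl2_5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/test.jpg"}},  # placeholder image
        ],
    }],
    max_tokens=128,
)
print(resp.choices[0].message.content)
print(f"latency: {(time.perf_counter() - start) * 1000:.0f} ms")  # single-request latency
```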

CPU usage is extremely high. When the server process is limited to 4 CPU cores, a single request takes about 1200 ms, roughly three times as long as with no CPU limit (about 400 ms), so the CPU is a severe performance bottleneck. How can this be resolved?
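
A minimal sketch of how the 4-core condition above could be reproduced, assuming Linux and that the limit is applied via CPU affinity on the running server process (the PID below is a placeholder):

```python
# Hedged sketch: pin an already-running vLLM api_server process to cores 0-3.
# Linux only; replace server_pid with the actual process ID.
import os

server_pid = 12345  # placeholder PID of the vllm api_server process
os.sched_setaffinity(server_pid, {0, 1, 2, 3})   # restrict to 4 cores
print(os.sched_getaffinity(server_pid))          # verify the new affinity mask
```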
