The issue is a KeyError raised when deploying a quantized DeepSeek-V2-Lite-Chat model with the vLLM framework. The error occurs during weight loading, where the key names in the model's named_parameters() dictionary (params_dict) do not match the key names in the quantized weight file. Specifically:
Key Mismatch:
The parameter key registered in self.named_parameters() is 'model.layers.21.mlp.experts.w2_qweight'.
The corresponding key in the quantized weight file is 'model.layers.21.mlp.experts.w2_weight'.
Error Cause:
When loading the weights, the code accesses param = params_dict[name], but the name taken from the weight file does not exist in params_dict, resulting in a KeyError. A sketch of a possible name-remapping workaround is shown below.
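For illustration only, here is a minimal sketch of the kind of defensive remapping that could be tried around the params_dict lookup in load_weights. The suffix table and helper name are hypothetical, based solely on the w2_qweight / w2_weight mismatch reported here, not on an actual vLLM or GPTQModel fix:

```python
# Hypothetical workaround sketch: map checkpoint tensor names that dropped the
# "q" prefix back to the parameter names vLLM registered, before indexing
# params_dict. Only the suffix seen in this report is listed; other quantized
# expert tensors (scales, zero points) may need the same treatment.
SUFFIX_REMAP = {
    "w2_weight": "w2_qweight",  # mismatch observed in the error below
}

def remap_checkpoint_name(name: str, params_dict: dict) -> str:
    """Return a key that exists in params_dict, remapping known suffixes."""
    if name in params_dict:
        return name
    for ckpt_suffix, param_suffix in SUFFIX_REMAP.items():
        if name.endswith(ckpt_suffix):
            candidate = name[: -len(ckpt_suffix)] + param_suffix
            if candidate in params_dict:
                return candidate
    return name  # unchanged; the caller will still raise KeyError if unmapped
```

With something like this in place, the failing lookup in deepseek_v2.py would become param = params_dict[remap_checkpoint_name(name, params_dict)]; whether the proper fix belongs in vLLM's loader or in the exported checkpoint names is discussed further down in this thread.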
vllm version:
vllm-main, commit hash debd6bb
or https://github.com/ZZBoom/vllm/commits/main/, commit hash fc7c714854f422a7e000bcc9fa31d4f61796a7b6
How can this issue be resolved?
Error stack trace:
File "/mnt/lwq/lwq/quant/vllm/vllm-main-debd6bb/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
driver_worker_output = run_method(self.driver_worker, sent_method,
File "/mnt/lwq/lwq/quant/vllm/vllm-main-debd6bb/vllm/utils.py", line 2238, in run_method
return func(*args, **kwargs)
File "/mnt/lwq/lwq/quant/vllm/vllm-main-debd6bb/vllm/worker/worker.py", line 183, in load_model
self.model_runner.load_model()
File "/mnt/lwq/lwq/quant/vllm/vllm-main-debd6bb/vllm/worker/model_runner.py", line 1113, in load_model
self.model = get_model(vllm_config=self.vllm_config)
File "/mnt/lwq/lwq/quant/vllm/vllm-main-debd6bb/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
return loader.load_model(vllm_config=vllm_config)
File "/mnt/lwq/lwq/quant/vllm/vllm-main-debd6bb/vllm/model_executor/model_loader/loader.py", line 426, in load_model
loaded_weights = model.load_weights(
File "/mnt/lwq/lwq/quant/vllm/vllm-main-debd6bb/vllm/model_executor/models/deepseek_v2.py", line 790, in load_weights
param = params_dict[name]
KeyError: 'model.layers.10.mlp.experts.w2_weight
My dynamic quantization settings use the default configuration from the GPTQModel homepage:
```python
dynamic = {
    # `.*\.` matches the layers_node prefix
    # layer index starts at 0

    # positive match: layer 19, gate module
    r"+:.*\.18\..*gate.*": {"bits": 4, "group_size": 32},
    # positive match: layer 20, gate module (prefix defaults to positive if missing)
    r".*\.19\..*gate.*": {"bits": 8, "group_size": 64},
    # negative match: skip layer 21, gate module
    r"-:.*\.20\..*gate.*": {},
    # negative match: skip all down modules for all layers
    r"-:.*down.*": {},
}
```
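For context, here is a minimal sketch of how a dynamic dict like this is typically passed to GPTQModel, following the quantization flow shown in the GPTQModel README; the model path, calibration texts, and output directory below are placeholders, not taken from this report:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Placeholders: adjust to your environment.
model_id = "deepseek-ai/DeepSeek-V2-Lite-Chat"
quant_path = "DeepSeek-V2-Lite-Chat-gptq-4bit"
calibration_dataset = [
    "Example calibration text one.",
    "Example calibration text two.",
]

# Global 4-bit settings; the `dynamic` dict defined above overrides or skips
# specific modules per layer.
quant_config = QuantizeConfig(bits=4, group_size=128, dynamic=dynamic)

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset)  # runs GPTQ with the per-module dynamic rules
model.save(quant_path)               # writes quantized weights and quantize_config.json
```

In practice the calibration set should be a few hundred representative samples rather than the two toy strings above.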
@liweiqing1997 Can you upload the quantized model so we can skip the slow quant stage and directly run it?
I'm very sorry, but our model involves private data, so it isn't convenient for us to share it.
Do you have the resources to quantize a small MoE model, such as DeepSeek-V2-Lite-Chat? If that's not convenient for you, I'll look for other solutions. Thank you very much.
@liweiqing1997 Totally understand. We will try to quantize one and fix this by next week. Based on your stack trace, the bug is most likely that vLLM changes the model parameter names.
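As a side note (not part of the original exchange), one quick way to see exactly which checkpoint names diverge from what vLLM registers is to list the tensor names in the quantized safetensors shards and compare them with named_parameters(); the checkpoint path below is a placeholder:

```python
import glob
from safetensors import safe_open

# Placeholder: directory containing the GPTQ-quantized checkpoint shards.
ckpt_dir = "/path/to/DeepSeek-V2-Lite-Chat-gptq"

# Collect every tensor name stored in the checkpoint.
ckpt_names = set()
for shard in glob.glob(f"{ckpt_dir}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        ckpt_names.update(f.keys())

# Show what the checkpoint actually calls the expert tensors of one layer.
print(sorted(n for n in ckpt_names if "layers.10.mlp.experts" in n)[:20])

# Inside vLLM, the counterpart set is {n for n, _ in model.named_parameters()};
# comparing the two sets shows which names load_weights fails to find.
```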
The config after quantization is:
Besides, loading the quantized model with GPTQModel.load works fine.
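For reference, a minimal sketch of what such a load test typically looks like, following the inference example in the GPTQModel README; the checkpoint path and prompt are placeholders:

```python
from gptqmodel import GPTQModel

# Placeholder: path to the locally quantized checkpoint.
quant_path = "DeepSeek-V2-Lite-Chat-gptq-4bit"

model = GPTQModel.load(quant_path)

# generate() returns token ids; decode them with the bundled tokenizer.
result = model.generate("Hello, my name is")[0]
print(model.tokenizer.decode(result))
```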
GPU Info
H20