VLLM serve VLM quantized Gemma3 KeyError: 'vision_model.encoder.layers.0.self_attn.qkv_proj.weight' #1546

Open
@giangntapero

Description

Describe the bug

I quantized Gemma3 following the example at https://github.com/vllm-project/llm-compressor/blob/main/examples/multimodal_vision/gemma3_example.py

I then served the quantized model with vLLM:

```
vllm serve /gemma-3-4b-it-W4A16-G128 --served-model-name google/gemma-3-4b-it --max-model-len 8192 --limit-mm-per-prompt image=1 --gpu-memory-utilization 0.9
```

Errors

```
Process EngineCore_0:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 519, in run_engine_core
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 390, in __init__
    super().__init__(vllm_config, executor_class, log_stats,
  File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 76, in __init__
    self.model_executor = executor_class(vllm_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 53, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/uniproc_executor.py", line 48, in _init_executor
    self.collective_rpc("load_model")
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 2671, in run_method
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_worker.py", line 180, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1601, in load_model
    self.model = model_loader.load_model(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 41, in load_model
    self.load_weights(model, model_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 269, in load_weights
    loaded_weights = model.load_weights(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/gemma3_mm.py", line 700, in load_weights
    return loader.load_weights(weights)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 278, in load_weights
    autoloaded_weights = set(self._load_module("", self.module, weights))
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 236, in _load_module
    yield from self._load_module(prefix,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 209, in _load_module
    loaded_params = module_load_weights(weights)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/siglip.py", line 514, in load_weights
    param = params_dict[name]
KeyError: 'vision_model.encoder.layers.0.self_attn.qkv_proj.weight'
```
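For context, here is a minimal pure-Python sketch (illustrative names only, not vLLM's actual implementation) of the loading pattern that produces this kind of error: the loader resolves each checkpoint key against the dict of parameters the model actually registers, and a key the model never declares falls through to a direct lookup that raises `KeyError`. The assumption sketched here is that the quantized export saved an already-fused `qkv_proj` weight for the vision tower, while the loading module expects different parameter names:

```python
# Hypothetical, simplified sketch of parameter-dict weight loading.
# Names are illustrative; this is not vLLM's real loader code.

# Parameters the (sketched) vision encoder registers: unfused q/k/v.
params_dict = {
    "encoder.layers.0.self_attn.q_proj.weight": "q-param",
    "encoder.layers.0.self_attn.k_proj.weight": "k-param",
    "encoder.layers.0.self_attn.v_proj.weight": "v-param",
}

def load_weight(ckpt_name):
    # Direct lookup: any checkpoint key the model does not declare
    # raises KeyError, as in the traceback above.
    return params_dict[ckpt_name]

# Checkpoint keys that match registered parameters load fine:
assert load_weight("encoder.layers.0.self_attn.q_proj.weight") == "q-param"

# A checkpoint that stored the *fused* projection name has no matching
# registered parameter, so the lookup fails:
try:
    load_weight("encoder.layers.0.self_attn.qkv_proj.weight")
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints the missing checkpoint key
```

This suggests a naming mismatch between what the llm-compressor export writes for the vision tower and what vLLM's SigLIP weight loader expects, rather than a corrupted checkpoint.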

Labels: bug