
Qwen 2.5 VL inference on NPU is not producing any output. #1305

Open

Description

@nvsreerag

I have been trying to run inference of the Qwen 2.5 VL 7B model on an NPU, but it is not producing any output.

Activity

SearchSavior commented on Jun 9, 2025

How did you convert the model and does it compile?

nvsreerag (author) commented on Jun 10, 2025

I converted the model using the command below and also attempted symmetric quantization. However, after updating to the latest packages, it now throws an error.

Command:
optimum-cli export openvino --model Qwen/Qwen2.5-VL-7B-Instruct Qwen2.5-VL-7B-Instruct/FP16 --weight-format fp16

Error:

RuntimeError: Exception from src\inference\src\cpp\core.cpp:112:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\plugins\intel_npu\src\plugin\src\plugin.cpp:492:
Exception from src\plugins\intel_npu\src\compiler_adapter\src\ze_graph_ext_wrappers.cpp:314:
L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004 - generic error code for invalid arguments . [NPU_VCL] Compiler returned msg:
Upper bounds were not specified, got the default value - '9223372036854775807'
SearchSavior commented on Jun 11, 2025

OK, so I don't have an NPU to test with.

However, there are two places to get the code you need.

https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai/inference-with-genai-on-npu.html

details how to convert the model and what performance optimizations to apply.

To convert, I think you should try:

optimum-cli export openvino -m "model" --task image-text-to-text --weight-format int4 --ratio 1 --sym --group-size -1 "converted-model"

To run inference on vision models, the source lives here:

https://github.com/openvinotoolkit/openvino.genai/blob/675ed6c185f1d6e2145461ad5382dad45ecc5eef/src/python/openvino_genai/py_openvino_genai.pyi#L2343

There are other classes in that file that define the Python API; using them together is a bit harder, but there are example notebooks in the openvino_notebooks repo.
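The pieces above can be wired together roughly like this. This is a minimal sketch, not a verified fix: the VLMPipeline class and the MAX_PROMPT_LEN / MIN_RESPONSE_LEN pipeline properties come from the GenAI-on-NPU docs linked above, while the model directory, image path, and length limits are placeholders you'd adjust for your setup.

```python
# Hedged sketch of Qwen2.5-VL inference on NPU with openvino_genai.
# The NPU plugin compiles static shapes, so prompt/response lengths are
# bounded up front -- this is also why the "Upper bounds were not
# specified" compiler error earlier in the thread points at dynamic
# dimensions. The exact values below are placeholders.
PIPELINE_CONFIG = {
    "MAX_PROMPT_LEN": 1536,
    "MIN_RESPONSE_LEN": 256,
}


def describe_image(model_dir: str, image_path: str, prompt: str):
    """Load a converted model and run one generate() call on the NPU."""
    import numpy as np
    import openvino as ov
    import openvino_genai as ov_genai
    from PIL import Image

    # Images are passed to generate() as ov.Tensor objects.
    image = ov.Tensor(np.array(Image.open(image_path).convert("RGB")))
    pipe = ov_genai.VLMPipeline(model_dir, "NPU", **PIPELINE_CONFIG)
    return pipe.generate(prompt, images=[image], max_new_tokens=128)
```

Usage would look like `describe_image("Qwen2.5-VL-7B-Instruct-int4-sym", "cat.jpg", "Describe the image.")` with the int4-sym model produced by the export command above; if the NPU compiler still rejects the graph, double-check the property names against your installed openvino_genai version.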

Also: I convert a lot of models on HF, and along the way I launched Echo9Zulu/Optimum-CLI-Tool_tool. Making the command-building process more visual helps me, especially since converting different models can be a research-intensive process.



          Qwen 2.5 VL inference on NPU is not producing any output. · Issue #1305 · huggingface/optimum-intel