Vision modules support for Qwen2.5-VL-7B-Instruct? #1790

kbiscoding · 2025-09-23T16:19:57Z

Hello,

Does the ONNX team have any plans to support the vision components of multimodal models such as Qwen2.5-VL-7B-Instruct?

The Hugging Face (.safetensors) to ONNX (.onnx) conversion is not available for the vision components. The file https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py does not contain support for the modules Qwen2_5_VisionRotaryEmbedding, Qwen2_5_VisionPatchEmbed, Qwen2_5_VLVisionBlock, Qwen2_5_VLVisionAttention, and Qwen2_5_VLMLP, which are essential for Qwen2.5-VL-7B-Instruct (and other models in the same family).
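For anyone who wants to re-check this against a newer copy of builder.py, here is a minimal sketch of how I looked for the class names. The helper `missing_modules` is my own illustration and not part of onnxruntime-genai. It is a plain substring check, so a hit only means the name is mentioned somewhere in the file, not that the module is actually supported.

```python
# Vision module class names listed in this post.
VISION_MODULES = [
    "Qwen2_5_VisionRotaryEmbedding",
    "Qwen2_5_VisionPatchEmbed",
    "Qwen2_5_VLVisionBlock",
    "Qwen2_5_VLVisionAttention",
    "Qwen2_5_VLMLP",
]


def missing_modules(source_text, names=VISION_MODULES):
    """Return the names that never occur in source_text (hypothetical helper)."""
    # Plain substring test: presence means "mentioned", not "supported".
    return [name for name in names if name not in source_text]
```

To use it, read a local copy of builder.py into a string and pass that string to `missing_modules`; any name it returns does not occur anywhere in the file.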

The language component of Qwen2.5-VL-7B-Instruct also needs some updates, but I was able to put those together myself, for example correct handling of the final layer norm, lm_head, and rotary embeddings. My question is about the vision component, which is currently completely unsupported.

Replies: 0 comments

Select a reply

Uh oh!

Vision modules support for Qwen2.5-VL-7B-Instruct? #1790

Uh oh!

Uh oh!

kbiscoding Sep 23, 2025

Replies: 0 comments

kbiscoding
Sep 23, 2025