Vision modules support for Qwen2.5-VL-7B-Instruct? #1790
kbiscoding started this conversation in Model support
Replies: 0 comments
Hello,
Does the ONNX team plan to support the vision components of multimodal models such as Qwen2.5-VL-7B-Instruct?
The Hugging Face (.safetensors) to ONNX (.onnx) conversion does not cover the vision components. The file https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/builder.py does not contain support for the modules Qwen2_5_VisionRotaryEmbedding, Qwen2_5_VisionPatchEmbed, Qwen2_5_VLVisionBlock, Qwen2_5_VLVisionAttention, and Qwen2_5_VLMLP, which are essential for Qwen2.5-VL-7B-Instruct (and other models in the same family).
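For anyone who wants to reproduce the check, here is a minimal sketch of how I looked for these names. It assumes you have saved a local copy of builder.py and read it into a string; `find_unreferenced` is a hypothetical helper of my own, not part of onnxruntime-genai. A plain name search is only a heuristic, since the builder could in principle handle a module under a different name.

```python
import re

# Vision module class names quoted from the post above.
VISION_MODULES = [
    "Qwen2_5_VisionRotaryEmbedding",
    "Qwen2_5_VisionPatchEmbed",
    "Qwen2_5_VLVisionBlock",
    "Qwen2_5_VLVisionAttention",
    "Qwen2_5_VLMLP",
]


def find_unreferenced(source_text, names):
    """Return the names that never appear as whole identifiers in source_text.

    Hypothetical helper: it only searches the text, it does not import
    or run anything from the builder.
    """
    missing = []
    for name in names:
        # \b stops a name from matching inside a longer identifier.
        if not re.search(r"\b" + re.escape(name) + r"\b", source_text):
            missing.append(name)
    return missing
```

To use it, read the local copy of builder.py with `open(path, encoding="utf-8").read()` and pass the text together with `VISION_MODULES`; every name returned is one the file never mentions.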
The language component of Qwen2.5-VL-7B-Instruct also needs some updates, but I was able to put those together myself: for example, correct handling of the final layer norm, lm_head, and rotary embeddings. I am asking about the vision component, which is currently entirely unsupported.