Description
I am attempting to export the Qwen3-VL-Embedding-2B model to OpenVINO; this failure blocks deployment of the new multimodal embedding capabilities.
I have encountered two blockers:
Optimum-CLI limitation: the feature-extraction task is not yet supported for the qwen3_vl architecture.
OpenVINO conversion crash: manual conversion via ov.convert_model fails with RuntimeError: unordered_map::at. The traceback points to tracing of torch.vmap operations inside transformers.masking_utils (specifically _vmap_for_bhqkv), even when attn_implementation="eager" is explicitly set.
To Reproduce
Environment:
openvino==2025.3
torch==2.5.1+cpu
transformers (Qwen3-VL branch/latest)
optimum-intel (latest source)
Minimal Reproduction Script:
```python
import torch
import openvino as ov
from transformers import AutoModel, AutoProcessor
from PIL import Image

model_id = "Qwen/Qwen3-VL-Embedding-2B"

# Load with eager attention to attempt disabling FlashAttn/SDPA optimization
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    attn_implementation="eager",
    device_map="cpu"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Prepare multimodal dummy input
dummy_image = Image.new('RGB', (28, 28), color='black')
dummy_text = "<|image_pad|>Describe this image."
inputs = processor(text=[dummy_text], images=[dummy_image], return_tensors="pt")

# Wrapper to align with OpenVINO input expectations
class Wrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask, pixel_values, image_grid_thw):
        return self.model(
            input_ids=input_ids, attention_mask=attention_mask,
            pixel_values=pixel_values, image_grid_thw=image_grid_thw,
            output_hidden_states=True
        ).last_hidden_state

# Crash happens here
ov_model = ov.convert_model(
    Wrapper(model),
    example_input=(inputs.input_ids, inputs.attention_mask, inputs.pixel_values, inputs.image_grid_thw)
)
```
Relevant Traceback
The error occurs deep within the PyTorch frontend when handling the vectorized masking logic:
```text
Traceback (most recent call last):
  ...
  File ".../transformers/masking_utils.py", line 392, in sdpa_mask_recent_torch
    causal_mask = _vmap_for_bhqkv(mask_function)(batch_arange, head_arange, cache_position, kv_arange)
  ...
  File ".../torch/_functorch/vmap.py", line 484, in _flat_vmap
    batched_outputs = func(*batched_inputs, **kwargs)
  ...
  File ".../openvino/frontend/pytorch/ts_decoder.py", line 84, in __init__
    raise RuntimeError(
RuntimeError: Couldn't get TorchScript module by tracing.
Exception:
unordered_map::at
```
Request
Please add support for task="feature-extraction" for qwen3_vl in Optimum Intel.
Fix the OpenVINO PyTorch frontend to handle (or correctly bypass) the torch.vmap / functorch constructs used in the new Transformers masking implementation, or provide a workaround that strictly disables these paths during tracing.
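If patching internals is off the table, another avenue might be to build the 4D additive attention mask eagerly outside the traced graph and pass it to the model, on the assumption that recent transformers masking code short-circuits mask creation when a ready-made 4D mask is supplied. Whether qwen3_vl actually honors a precomputed 4D mask here is an assumption I have not confirmed; the mask construction itself would look roughly like:

```python
import torch

def build_4d_causal_mask(attention_mask_2d, dtype=torch.float32):
    """Expand a [batch, seq] padding mask into a [batch, 1, seq, seq]
    additive causal mask (0 where attending is allowed, large negative
    where it is masked). Overflow clamping is omitted for brevity."""
    bsz, seq_len = attention_mask_2d.shape
    min_val = torch.finfo(dtype).min
    # Upper-triangular part (future positions) gets the large negative value.
    causal = torch.triu(
        torch.full((seq_len, seq_len), min_val, dtype=dtype), diagonal=1
    )
    # Padding positions (0 in the 2-D mask) also get the large negative value.
    padding = (1.0 - attention_mask_2d[:, None, None, :].to(dtype)) * min_val
    return causal[None, None, :, :] + padding
```

The resulting tensor would then be passed as `attention_mask` in the wrapper's forward call instead of the 2-D mask, keeping all mask logic out of the traced region.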