I'm currently testing with Chrome v137.0.7151.69 and macOS 13.0, using [email protected]. I can't seem to run the new v3 model with the WebGPU ONNX EP without a huge performance degradation compared to the older models.
Code:
https://gist.github.com/mattdesl/30bc5de23eb6edfd7362d91d43170922
(change the "provider" and "model" variables in main.js)
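For context, the sessions are created roughly along these lines (a minimal sketch, not the gist's exact code; `createSession`, `modelPath`, and `provider` are placeholder names, and the import path may vary by onnxruntime-web version):

```js
import * as ort from "onnxruntime-web/webgpu";

// provider is "webgpu" or "wasm" (CPU); modelPath points at one of the
// encoder .onnx files listed below.
async function createSession(modelPath, provider) {
  return ort.InferenceSession.create(modelPath, {
    executionProviders: [provider],
  });
}
```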
I'm testing three models:
```js
{
  // The "new" v3 model
  v3: {
    image_encoder: "uform3-image-text-english-small/image_encoder.onnx",
    text_encoder: "uform3-image-text-english-small/text_encoder.onnx",
  },
  // The "old" models ...
  fp16: {
    text_encoder: "uform-vl-english-small-gpu-fp16/text_encoder.onnx",
    image_encoder: "uform-vl-english-small-gpu-fp16/image_encoder.onnx",
  },
  fp32: {
    text_encoder: "uform-vl-english-small-cpu-fp32/text_encoder.onnx",
    image_encoder: "uform-vl-english-small-cpu-fp32/image_encoder.onnx",
  },
}
```
Using the WebGPU backend, testing only image encoding / inference time:

- v3: ~7000 ms
- fp16: ~800 ms
- fp32: ~750 ms
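The timings above were collected roughly like this (a hedged sketch; `session` and `feeds` stand in for the gist's actual session object and input tensors):

```js
// Time a single inference call; feeds maps input names to ort.Tensor values.
const start = performance.now();
const results = await session.run(feeds);
console.log(`inference: ${(performance.now() - start).toFixed(0)} ms`);
```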
The v3 model also seems to produce inaccurate/incorrect cosine similarity values in WebGPU mode.
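For reference, the similarity between the image and text embeddings is computed in the standard way; a minimal sketch, assuming both embeddings are plain `Float32Array`s of equal length:

```js
// Standard cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```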
Using the CPU backend:

- v3: ~6500 ms
- fp16: N/A
- fp32: ~7000 ms
I'm hoping it's just something I've done wrong that is causing the v3 model on WebGPU to both infer incorrectly and perform so slowly?