FP16 input support #363

Open
skill-diver opened this issue Dec 26, 2024 · 3 comments
Comments

@skill-diver

Hi Optimum team,

Do you know how to enable fp16 input when using your quantized model? I find that qconv2d doesn't support fp16 input.

@dacorvo
Collaborator

dacorvo commented Jan 2, 2025

You should load the model with dtype=torch.float16, then quantize it. Do you have a specific code snippet that reproduces the error you are facing?
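
A minimal sketch of that flow (nothing here comes from the thread: torchvision's resnet18 stands in for the user's model and a CUDA device is assumed), using optimum-quanto's quantize/freeze API with int8 weights:

```python
import torch
from torchvision.models import resnet18
from optimum.quanto import quantize, freeze, qint8

# Assumption: fp16 inference is done on a GPU; CPU fp16 conv support is limited.
device = "cuda"

# Cast the model to fp16 *before* quantizing, so the quantized Conv2d/Linear
# modules expect fp16 inputs and fp16 scales.
model = resnet18().to(device=device, dtype=torch.float16).eval()

quantize(model, weights=qint8)  # swap nn.Conv2d / nn.Linear for quantized modules
freeze(model)                   # materialize the quantized weights

# Inference with an fp16 input now matches the layers' expected dtype.
x = torch.randn(1, 3, 224, 224, device=device, dtype=torch.float16)
with torch.inference_mode():
    y = model(x)
```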

@skill-diver
Author

Hi, thanks for answering. I get this error when I use mixed precision to run inference on the quantized model (attached screenshot: IMG_20250102_044328.jpg).

@LukeLIN-web

dtype=torch.float16

It seems that your weights are still fp32. Do you have a specific code snippet that reproduces the error you are facing?
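
A quick way to check whether any weights are still fp32 before re-quantizing (a hypothetical helper, not from the thread):

```python
import torch

def report_param_dtypes(model: torch.nn.Module) -> None:
    # Collect the distinct parameter dtypes; any torch.float32 entry means
    # some weights were never cast to fp16.
    dtypes = {p.dtype for p in model.parameters()}
    print("parameter dtypes:", dtypes)

# If torch.float32 shows up, cast the model before quantizing, e.g.:
# model = model.to(torch.float16)
```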
