FP16 input support #363

Open
skill-diver opened this issue Dec 26, 2024 · 3 comments
Comments

@skill-diver

Hi Optimum team,

Do you know how to enable fp16 input when using your quantized model? I find that qconv2d doesn't support fp16 input.

@dacorvo
Collaborator

dacorvo commented Jan 2, 2025

You should load the model with dtype=torch.float16, then quantize it. Do you have a specific code snippet that reproduces the error you are facing?
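
A minimal sketch of that flow (nothing here comes from the thread: torchvision's resnet18 stands in for the user's model and a CUDA device is assumed), using optimum-quanto's quantize/freeze API with int8 weights:

```python
import torch
from torchvision.models import resnet18
from optimum.quanto import quantize, freeze, qint8

# Assumption: fp16 inference is done on a GPU; CPU fp16 conv support is limited.
device = "cuda"

# Cast the model to fp16 *before* quantizing, so the quantized Conv2d/Linear
# modules expect fp16 inputs and fp16 scales.
model = resnet18().to(device=device, dtype=torch.float16).eval()

quantize(model, weights=qint8)  # swap nn.Conv2d / nn.Linear for quantized modules
freeze(model)                   # materialize the quantized weights

# Inference with an fp16 input now matches the layers' expected dtype.
x = torch.randn(1, 3, 224, 224, device=device, dtype=torch.float16)
with torch.inference_mode():
    y = model(x)
```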

@skill-diver
Author

Hi, thanks for answering. I get this error when I use mixed precision to run inference on the quantized model (attached screenshot: IMG_20250102_044328.jpg).

@LukeLIN-web

dtype=torch.float16

It seems that your weights are still fp32. Do you have a specific code snippet that reproduces the error you are facing?
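
A quick way to check whether any weights are still fp32 before re-quantizing (a hypothetical helper, not from the thread):

```python
import torch

def report_param_dtypes(model: torch.nn.Module) -> None:
    # Collect the distinct parameter dtypes; any torch.float32 entry means
    # some weights were never cast to fp16.
    dtypes = {p.dtype for p in model.parameters()}
    print("parameter dtypes:", dtypes)

# If torch.float32 shows up, cast the model before quantizing, e.g.:
# model = model.to(torch.float16)
```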
