Fix optimum.quanto quantization call in cache_utils #34606
base: main
```diff
@@ -813,7 +813,8 @@ def _quantize(self, tensor, axis):
         if is_optimum_quanto_available():
             from optimum.quanto import quantize_weight

-            qtensor = quantize_weight(tensor, self.qtype, axis, self.q_group_size)
+            scale, zeropoint = self.optimizer(tensor, self.qtype, axis, self.q_group_size)
+            qtensor = quantize_weight(tensor, self.qtype, axis, scale, zeropoint, self.q_group_size)
             return qtensor
         elif is_quanto_available():
             logger.warning_once(
```
Can you have a look @zucchini-nlp, as you did the change in this PR? Looking at the optimum-quanto source code, `quantize_weight` does require `scale` to be passed.
Now when I look, I think it is related to the version of optimum-quanto. I see that up to v0.2.4 there was no scale:
https://github.com/huggingface/optimum-quanto/blob/832f7f5c3926c91fe4f923aaaf037a780ac3e6c3/optimum/quanto/tensor/qweight.py#L32-L39
But after v0.2.4 we have to pass the scale in before the group size:
https://github.com/huggingface/optimum-quanto/blob/f3c400e9b5b28b499f87c30325f8628d50417eef/optimum/quanto/tensor/weights/quantization.py#L27-L36
@SunMarc if that is correct, we probably need to check the version as well. It seems like a lot of checks, but since the old `quanto` should be removed in the next v4.47 release, it could be a workaround. WDYT?
Sounds good to me! Also, what was the issue with the prior implementation? Was it just to simplify the code a bit?
You mean the prev PR I merged? It only made the code compatible with optimum-quanto v0.2.4, but I forgot there could be other versions. As for why optimum-quanto is changing its code, I have no idea, but it would be nice if they didn't change it drastically anymore 😅
I guess you'll have more info about future maintenance plans in quanto :)
@w3rew thanks for opening the PR, can you please update it with the suggested changes?
Sure! Will look into it shortly.
cc @dacorvo for visibility
@zucchini-nlp The `optimum-quanto` 0.2.5 release was synchronized with the switch from `quanto` to `optimum-quanto` in `transformers` at the beginning of October, and the code in `cache_utils.py` was correct. It is your pull request to align with 0.2.4 (a version that was never supported by `transformers`) that was actually incorrect.
@zucchini-nlp So, just to clarify: is the version check necessary? If the code only ever supported 0.2.5, the additional check seems redundant, all the more so since 0.2.4 was never supported.
@w3rew Hmm, IIRC the first time I triggered the tests for the quantized cache, `pip` installed the `0.2.4` version by default, so that was the reason for the modification. In that case, as you mention, we can leave only the post-0.2.5 logic, but it would be nice to add a version check near the import check for optimum-quanto, with an explicit error message saying that optimum-quanto v0.2.5 or higher is required.
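The explicit check suggested here could look something like the following sketch. The function name and the exact error message are illustrative assumptions, not the actual `transformers` implementation, and only the numeric release segment of the installed version is compared.

```python
from importlib.metadata import PackageNotFoundError, version

def require_optimum_quanto(min_version=(0, 2, 5)):
    # Raise an explicit error unless optimum-quanto >= min_version is installed.
    # Hypothetical helper; not part of transformers.
    try:
        installed = version("optimum-quanto")
    except PackageNotFoundError:
        raise ImportError("The quantized cache requires optimum-quanto, which is not installed.")
    numeric = tuple(int(part) for part in installed.split(".")[:3])
    if numeric < min_version:
        wanted = ".".join(str(p) for p in min_version)
        raise ImportError(
            f"The quantized cache requires optimum-quanto >= {wanted}, "
            f"but version {installed} is installed. "
            "Please upgrade, e.g. `pip install -U optimum-quanto`."
        )
```

Calling such a helper next to the existing availability check would fail fast with a clear message instead of a confusing `TypeError` from a mismatched `quantize_weight` call.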