Fix optimum.quanto quantization call in cache_utils #34606

w3rew · 2024-11-04T21:05:55Z

What does this PR do?

Fixes call to optimum.quanto.quantized_weight in QuantoQuantizedCache, which currently lacks scale and shift parameters and thus fails. This was introduced by cac4a48 when migrating to optimum.quanto I think.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@SunMarc

SunMarc · 2024-11-05T10:25:46Z

src/transformers/cache_utils.py

@@ -813,7 +813,8 @@ def _quantize(self, tensor, axis):
        if is_optimum_quanto_available():
            from optimum.quanto import quantize_weight

-            qtensor = quantize_weight(tensor, self.qtype, axis, self.q_group_size)


Can you have a look @zucchini-nlp as you did the change in this PR. Looking at optimum-quanto source code, quantize_weight do require to pass scale.

Now when I look I think it is related to the version of optimum quanto. I see pre v0.24 had no scale

https://github.com/huggingface/optimum-quanto/blob/832f7f5c3926c91fe4f923aaaf037a780ac3e6c3/optimum/quanto/tensor/qweight.py#L32-L39

But after v0.24 we have to pass scale in before group size

https://github.com/huggingface/optimum-quanto/blob/f3c400e9b5b28b499f87c30325f8628d50417eef/optimum/quanto/tensor/weights/quantization.py#L27-L36

@SunMarc if that is correct, prob we need to check the version also. Seems like a lot of checks but since the old quanto should be removed in next v4.47 release, could be a workaround. WDYT?

Sounds good to me ! Also what was the issue with the prior implementation ? Was it to just simplify a bit the code ?

You mean the prev PR I merged? It only made the code compatible with optimum-quanto v0.24 but I forgot there could be older versions. For the why optimum-quanto is changing its code, i have no idea. But would be nice if they wouldnt change it drastically anymore 😅

i guess you'll have more info about future maintaining plans in quanto :)

@w3rew thanks for opening the PR, can you please update with suggested changes?

Sure! Will look into it shortly.

cc @dacorvo for visibility

@zucchini-nlp optimum-quanto 0.2.5 release was synchronized with the switch from quanto to optimum-quanto in transformers at the beginning of october, and the code in cache_utils.py was correct.
It is your pull-request to align with 0.2.4 (a version that was never supported by transformers) that was actually incorrect.

@zucchini-nlp So, just to clarify: is the version check necessary? If the code used to support only 0.2.5, the additional check seems to be redundant. More so if 0.2.4 was never supported.

@w3rew hmm, iirc the first time I triggered tests for quant cache pip installed me the 0.2.4 version by default so that was the reason for modification. In case as you mention, we can leave only the post-0.2.5 logic but would be nice to add a version check near the import-chech for optimum quanto with explicit error message that we need the optimum quanto with v0.2.5 or higher

HuggingFaceDocBuilderDev · 2024-11-05T13:12:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ruidazeng · 2024-11-06T14:35:00Z

src/transformers/cache_utils.py

@@ -813,7 +813,8 @@ def _quantize(self, tensor, axis):
        if is_optimum_quanto_available():
            from optimum.quanto import quantize_weight

-            qtensor = quantize_weight(tensor, self.qtype, axis, self.q_group_size)
+            scale, zeropoint = self.optimizer(tensor, self.qtype, axis, self.q_group_size)
+            qtensor = quantize_weight(tensor, self.qtype, axis, scale, zeropoint, self.q_group_size)


Why is adding scale to quantize_weight necessary? What does this fix?

optimum-quanto expects scale and shift to be passed to quantize_weight, and the current implementation does not, so quantization fails

SunMarc · 2024-11-15T14:34:27Z

I've updated the PR here so we fix it quickly ! I've added you as co-author @w3rew !

Fix quantization call in cache_utils

930f2d7

SunMarc requested a review from zucchini-nlp November 5, 2024 10:23

SunMarc reviewed Nov 5, 2024

View reviewed changes

ruidazeng reviewed Nov 6, 2024

View reviewed changes

SunMarc mentioned this pull request Nov 15, 2024

Fix cache_utils for optimum.quanto kvcache quantization #34750

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix optimum.quanto quantization call in cache_utils #34606

Fix optimum.quanto quantization call in cache_utils #34606

Uh oh!

w3rew commented Nov 4, 2024

Uh oh!

SunMarc Nov 5, 2024

Uh oh!

zucchini-nlp Nov 5, 2024

Uh oh!

SunMarc Nov 5, 2024

Uh oh!

zucchini-nlp Nov 5, 2024

Uh oh!

zucchini-nlp Nov 5, 2024

Uh oh!

w3rew Nov 5, 2024

Uh oh!

SunMarc Nov 5, 2024

Uh oh!

dacorvo Nov 5, 2024 •

edited

Loading

Uh oh!

w3rew Nov 8, 2024

Uh oh!

zucchini-nlp Nov 15, 2024

Uh oh!

HuggingFaceDocBuilderDev commented Nov 5, 2024

Uh oh!

ruidazeng Nov 6, 2024

Uh oh!

w3rew Nov 8, 2024

Uh oh!

SunMarc commented Nov 15, 2024

Uh oh!

Uh oh!

Fix optimum.quanto quantization call in cache_utils #34606

Are you sure you want to change the base?

Fix optimum.quanto quantization call in cache_utils #34606

Uh oh!

Conversation

w3rew commented Nov 4, 2024

What does this PR do?

Before submitting

Who can review?

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dacorvo Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

HuggingFaceDocBuilderDev commented Nov 5, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SunMarc commented Nov 15, 2024

Uh oh!

Uh oh!

dacorvo Nov 5, 2024 •

edited

Loading