Describe the bug
I am trying to use NF4 real quantization and hit an error because the number of scales is not divisible by the block size. Padding the scales so that they can be block-quantized would mitigate this issue. This can be achieved by calling the `reduce_block_padding` function in the `double_quantization` function.
Suggested change
in the following line of code
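To make the padding idea concrete, here is a minimal pure-Python sketch. It is not ModelOpt's code and does not reproduce `reduce_block_padding` or `double_quantization`. The helper names (`pad_to_block_multiple`, `block_quantize_scales`, `block_dequantize_scales`), the int8 absmax scheme, and the zero pad value are all assumptions chosen for illustration.

```python
def pad_to_block_multiple(scales, block_size, pad_value=0.0):
    """Pad `scales` so its length is a multiple of `block_size`.

    Hypothetical helper: returns the padded list and the number of
    padding elements that were appended.
    """
    pad_count = (-len(scales)) % block_size
    return list(scales) + [pad_value] * pad_count, pad_count


def block_quantize_scales(scales, block_size):
    """Illustrative second-level quantization of the scales.

    Assumes int8 absmax quantization per block; the real library may
    use a different scheme.
    """
    padded, pad_count = pad_to_block_multiple(scales, block_size)
    q_blocks, block_scales = [], []
    for start in range(0, len(padded), block_size):
        block = padded[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Avoid division by zero for an all-zero block.
        step = amax / 127.0 if amax > 0 else 1.0
        q_blocks.append([round(v / step) for v in block])
        block_scales.append(step)
    return q_blocks, block_scales, pad_count


def block_dequantize_scales(q_blocks, block_scales, pad_count):
    """Dequantize and strip the padding added before quantization."""
    flat = [q * s for block, s in zip(q_blocks, block_scales) for q in block]
    return flat[:len(flat) - pad_count] if pad_count else flat


# 5 scales with block_size=4: not divisible, so 3 padding values are added.
scales = [0.5, 1.0, 0.25, 2.0, 0.75]
q_blocks, block_scales, pad_count = block_quantize_scales(scales, block_size=4)
restored = block_dequantize_scales(q_blocks, block_scales, pad_count)
```

Padding with zeros leaves each block's absolute maximum unchanged, and recording the pad count lets the dequantization step drop the extra entries so the restored scales have their original length.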
Version
nvidia_modelopt == 0.27.1