
Conversation

EmreAdabag

This fixes two bugs that cause unexpected behavior when the hidden dim isn't evenly divisible by the quantization group size, as in Stories42M, which has hidden dim 1376 and group size 64.

  1. Matmul uses the wrong scaling factors when performing matmul(_, _, _, hidden_dim, _), i.e. whenever the inner dimension is hidden_dim (see the matmul sketch below).
  2. When quantizing vectors of length hidden_dim, the trailing hidden_dim % group_size elements are never quantized (see the quantize sketch right below).
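To make the second bug concrete, here is a minimal sketch of a group quantizer that covers the ragged tail group, written in the style of runq.c. The `GS` constant, the `QuantizedTensor` layout, and the zero-scale guard are assumptions for illustration, not the exact patch:

```c
#include <math.h>
#include <stdint.h>

#define GS 64  /* quantization group size (assumed, matching the PR's example) */

typedef struct {
    int8_t* q;  /* quantized values */
    float* s;   /* one scale per group */
} QuantizedTensor;

/* Quantize n floats in groups of GS. When n % GS != 0 the final group is
 * shorter (e.g. n = 1376, GS = 64 -> 21 full groups plus a 32-element tail).
 * A loop that runs only n / GS full groups leaves that tail unquantized. */
void quantize(QuantizedTensor *qx, float* x, int n) {
    const float Q_MAX = 127.0f;
    int num_groups = (n + GS - 1) / GS;  /* round up so the tail gets a group */
    for (int group = 0; group < num_groups; group++) {
        int start = group * GS;
        int end = start + GS < n ? start + GS : n;  /* clamp the tail group */

        /* find the max absolute value in this group */
        float wmax = 0.0f;
        for (int i = start; i < end; i++) {
            float v = fabsf(x[i]);
            if (v > wmax) wmax = v;
        }

        /* scale so the max maps to 127, then round and store */
        float scale = wmax / Q_MAX;
        qx->s[group] = scale;
        for (int i = start; i < end; i++) {
            float quant_value = scale != 0.0f ? x[i] / scale : 0.0f;
            qx->q[i] = (int8_t) roundf(quant_value);
        }
    }
}
```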

This fix enables inference to be run with quantized models exported by export.py regardless of hidden_dim % group_size. It has been tested with Stories42M and validated against a Python implementation of quantized inference. The shorter final group adds a negligible performance cost to the matrix multiplication; otherwise the performance of quantized inference is unchanged.
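For the first bug, a companion sketch (reusing `GS` and `QuantizedTensor` from the sketch above) shows why the scale indexing goes wrong: when n % GS != 0, i*n is not a multiple of GS, so the original `(i*n + j) / GS` division lands on the wrong group's scale. Indexing scales per row avoids this, and clamping the group length lets the shorter tail group contribute. Again, an illustrative sketch under those assumptions, not the exact patch:

```c
/* W (d,n) @ x (n,) -> xout (d,), int8 weights/activations with per-group scales.
 * Each row of W owns ceil(n / GS) scales, so row i's scales start at
 * i * groups_per_row rather than at (i * n) / GS. */
void matmul(float* xout, QuantizedTensor *x, QuantizedTensor *w, int n, int d) {
    int groups_per_row = (n + GS - 1) / GS;
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        int in = i * n;  /* start of row i in the flattened weight array */
        for (int j = 0; j < n; j += GS) {
            int len = j + GS < n ? GS : n - j;  /* the tail group is shorter */
            int32_t ival = 0;
            for (int k = 0; k < len; k++) {
                ival += ((int32_t) x->q[j + k]) * ((int32_t) w->q[in + j + k]);
            }
            /* per-row scale index for W; plain group index for x */
            val += ((float) ival) * w->s[i * groups_per_row + j / GS] * x->s[j / GS];
        }
        xout[i] = val;
    }
}
```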

Alternatively/additionally, just ensure that hidden_dim % group_size == 0 in export.py.

Both bugs cause unexpected behavior when the model's hidden dim isn't evenly divisible by the group size, like Stories42M, which has hidden dim 1376 and group size 64.
EmreAdabag (Author)

Fixing this in export.py: #533
