I'm attempting to quantize SDPA similarly to TRT's approach—by adding Q-DQ on the Q, K, V, and softmax-output tensors. I've observed that MOQ implements this, whereas MTQ does not. Additionally, I would like to perform QAT after adding quantizers via MTQ. Does MTQ offer fused kernels to mitigate the training slowdown? This matters because SDPA normally runs through efficient fused implementations, and inserting a Q-DQ at an intermediate output (the softmax) forces an unfused path.
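To make the Q-DQ placement concrete, here is a minimal PyTorch sketch of what the quantized SDPA would compute. This is an illustration, not ModelOpt's actual implementation: `torch.fake_quantize_per_tensor_affine` stands in for the inserted quantizers, and the fixed scales are placeholders where real calibration/QAT would learn per-quantizer values. Note that quantizing the softmax output requires writing out the attention math by hand, which is exactly why the fused `F.scaled_dot_product_attention` kernel can no longer be used.

```python
import math
import torch

def quantized_sdpa(q, k, v, scale=0.1):
    """Fake-quantized SDPA sketch: Q-DQ on Q, K, V and on the softmax
    output. `scale` is a placeholder; real calibration would assign
    each quantizer its own scale."""
    def fq(x, s):
        # Q-DQ (quantize then dequantize) to simulated int8
        return torch.fake_quantize_per_tensor_affine(x, s, 0, -128, 127)

    q, k, v = fq(q, scale), fq(k, scale), fq(v, scale)
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    attn = torch.softmax(scores, dim=-1)
    # Q-DQ at the intermediate softmax output: this is the insertion
    # point that breaks the fused SDPA kernel and slows down training.
    attn = fq(attn, 1.0 / 127)
    return attn @ v
```

During QAT, each `fq` call would be replaced by a trainable quantizer module; without a fused kernel that folds the Q-DQ into the attention computation, every step pays for the unfused matmul-softmax-matmul sequence.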