I'm attempting to quantize SDPA similarly to TRT's approach—by adding Q-DQ on the Q, K, V, and softmax-output tensors. I've observed that MOQ implements this, whereas MTQ does not. Additionally, I would like to perform QAT after adding quantizers via MTQ. Does MTQ offer fused kernels to mitigate the training slowdown? This matters because SDPA normally runs through efficient fused implementations, and inserting a Q-DQ at an intermediate output (the softmax) forces an unfused path.
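To make the Q-DQ placement concrete, here is a minimal PyTorch sketch of what the quantized SDPA would compute. This is an illustration, not ModelOpt's actual implementation: `torch.fake_quantize_per_tensor_affine` stands in for the inserted quantizers, and the fixed scales are placeholders where real calibration/QAT would learn per-quantizer values. Note that quantizing the softmax output requires writing out the attention math by hand, which is exactly why the fused `F.scaled_dot_product_attention` kernel can no longer be used.

```python
import math
import torch

def quantized_sdpa(q, k, v, scale=0.1):
    """Fake-quantized SDPA sketch: Q-DQ on Q, K, V and on the softmax
    output. `scale` is a placeholder; real calibration would assign
    each quantizer its own scale."""
    def fq(x, s):
        # Q-DQ (quantize then dequantize) to simulated int8
        return torch.fake_quantize_per_tensor_affine(x, s, 0, -128, 127)

    q, k, v = fq(q, scale), fq(k, scale), fq(v, scale)
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    attn = torch.softmax(scores, dim=-1)
    # Q-DQ at the intermediate softmax output: this is the insertion
    # point that breaks the fused SDPA kernel and slows down training.
    attn = fq(attn, 1.0 / 127)
    return attn @ v
```

During QAT, each `fq` call would be replaced by a trainable quantizer module; without a fused kernel that folds the Q-DQ into the attention computation, every step pays for the unfused matmul-softmax-matmul sequence.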