In the forward method of the MXLinear class, mx_mm.apply is called, and inside it MXTensor.to_mx is also invoked. The following line performs the MXFP8 quantization:
scale_e8m0_biased, data_lp = to_mx(data_hp, elem_dtype, block_size, scaling_mode, is_swizzled_scales)
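To clarify what I mean by "simulated low-precision", below is a rough sketch of the kind of emulation I understand to_mx to be doing. This is my own illustration, not torchao's actual code: the function name, block_size=32, the fp8 e4m3 element dtype, and the FLOOR-style scale computation are assumptions on my part.

import torch

def emulated_mx_quantize(data_hp: torch.Tensor, block_size: int = 32):
    """Illustrative emulation of MXFP8 (e4m3) quantization using plain torch ops.

    No dedicated CUDA quantization kernel is involved anywhere below; the
    low-precision cast is done by ordinary elementwise kernels.
    """
    assert data_hp.numel() % block_size == 0
    blocks = data_hp.reshape(-1, block_size)

    # Shared per-block scale: a power of two derived from the block amax,
    # shifted by emax=8 so the largest element fits the fp8 e4m3 range.
    amax = blocks.abs().amax(dim=1, keepdim=True)
    scale_exp = torch.floor(torch.log2(amax.clamp(min=1e-38))) - 8
    scale_exp = scale_exp.clamp(-127, 127)
    scale = torch.exp2(scale_exp)

    # "Quantization" is just a divide plus a dtype cast, i.e. low precision
    # is simulated on top of the high-precision tensor.
    data_lp = (blocks / scale).to(torch.float8_e4m3fn).reshape(data_hp.shape)

    # E8M0 biased exponent of the shared scale, stored as uint8.
    scale_e8m0_biased = (scale_exp.reshape(-1) + 127).to(torch.uint8)
    return scale_e8m0_biased, data_lp

scales, data_fp8 = emulated_mx_quantize(torch.randn(128, 64))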
When examining the implementation of to_mx, I noticed that it does not call any CUDA low-precision operators; instead, it uses a simulated (emulated) low-precision implementation built from ordinary PyTorch ops. What is the reason for this? And where are the CUDA MXFP8 low-precision operators actually called? (A minimal sketch of how I would check which kernels launch is included at the end.) Thank you.
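For reference, this is roughly how I would verify which CUDA kernels are actually launched during a forward pass. The MXLinear construction is omitted because I am not sure of the exact configuration API; only the torch.profiler usage itself is standard.

import torch
from torch.profiler import profile, ProfilerActivity

# model = ...  # a module whose Linear layers were swapped to MXLinear
#              # (construction omitted; the torchao config API may differ by version)
model = torch.nn.Linear(4096, 4096, bias=False).cuda().bfloat16()  # placeholder stand-in
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x)

# If a dedicated MXFP8 cast / scaled-matmul kernel existed on this path,
# it should appear in this table; with an emulated path I would expect
# only ordinary elementwise and gemm kernels.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))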