In the forward method of the MXLinear class, mx_mm.apply is called, and inside it MXTensor.to_mx is also invoked. The following line performs the MXFP8 quantization:
scale_e8m0_biased, data_lp = to_mx(data_hp, elem_dtype, block_size, scaling_mode, is_swizzled_scales)
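To clarify what I mean by "simulated low-precision", below is a rough sketch of the kind of emulation I understand to_mx to be doing. This is my own illustration, not torchao's actual code: the function name, block_size=32, the fp8 e4m3 element dtype, and the FLOOR-style scale computation are assumptions on my part.

import torch

def emulated_mx_quantize(data_hp: torch.Tensor, block_size: int = 32):
    """Illustrative emulation of MXFP8 (e4m3) quantization using plain torch ops.

    No dedicated CUDA quantization kernel is involved anywhere below; the
    low-precision cast is done by ordinary elementwise kernels.
    """
    assert data_hp.numel() % block_size == 0
    blocks = data_hp.reshape(-1, block_size)

    # Shared per-block scale: a power of two derived from the block amax,
    # shifted by emax=8 so the largest element fits the fp8 e4m3 range.
    amax = blocks.abs().amax(dim=1, keepdim=True)
    scale_exp = torch.floor(torch.log2(amax.clamp(min=1e-38))) - 8
    scale_exp = scale_exp.clamp(-127, 127)
    scale = torch.exp2(scale_exp)

    # "Quantization" is just a divide plus a dtype cast, i.e. low precision
    # is simulated on top of the high-precision tensor.
    data_lp = (blocks / scale).to(torch.float8_e4m3fn).reshape(data_hp.shape)

    # E8M0 biased exponent of the shared scale, stored as uint8.
    scale_e8m0_biased = (scale_exp.reshape(-1) + 127).to(torch.uint8)
    return scale_e8m0_biased, data_lp

scales, data_fp8 = emulated_mx_quantize(torch.randn(128, 64))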
When examining the implementation of to_mx, I noticed that it does not call any CUDA low-precision operators; instead, it uses a simulated (emulated) low-precision implementation built from ordinary PyTorch ops. What is the reason for this? And where are the CUDA MXFP8 low-precision operators actually called? (A minimal sketch of how I would check which kernels launch is included at the end.) Thank you.
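For reference, this is roughly how I would verify which CUDA kernels are actually launched during a forward pass. The MXLinear construction is omitted because I am not sure of the exact configuration API; only the torch.profiler usage itself is standard.

import torch
from torch.profiler import profile, ProfilerActivity

# model = ...  # a module whose Linear layers were swapped to MXLinear
#              # (construction omitted; the torchao config API may differ by version)
model = torch.nn.Linear(4096, 4096, bias=False).cuda().bfloat16()  # placeholder stand-in
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x)

# If a dedicated MXFP8 cast / scaled-matmul kernel existed on this path,
# it should appear in this table; with an emulated path I would expect
# only ordinary elementwise and gemm kernels.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))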