[MXLinear] Where is the operator call that implements MXFP8 on NVIDIA GPUs? #3543

@LucaHW

Description

In the forward method of the MXLinear class, mx_mm.apply is called, and MXTensor.to_mx is invoked as well. The following call performs the MXFP8 quantization:
scale_e8m0_biased, data_lp = to_mx(data_hp, elem_dtype, block_size, scaling_mode, is_swizzled_scales)

Looking at the implementation of to_mx, I noticed that it does not call any CUDA low-precision operators; instead, it uses a simulated (emulated) low-precision implementation. What is the reason for this, and where are the CUDA MXFP8 low-precision operators actually called? Thank you.
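
To make "simulated low precision" concrete, here is a minimal pure-Python sketch of block-wise MX quantization with a power-of-two (E8M0) scale per block and emulated float8 e4m3 rounding. It is not the torchao code: the names to_mx_sketch, round_to_e4m3 and from_mx_sketch are hypothetical, the scale is computed with simple floor rounding, and the "low-precision" elements are ordinary Python floats restricted to e4m3-representable values rather than real 8-bit storage.

```python
import math

F8E4M3_MAX = 448.0   # largest finite float8 e4m3fn value
F8E4M3_EMAX = 8      # exponent of the largest power of two <= 448
E8M0_BIAS = 127      # bias of the 8-bit exponent-only scale format


def round_to_e4m3(x: float) -> float:
    """Emulate a saturating cast of x to float8 e4m3fn, returned as a float."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), F8E4M3_MAX)
    # frexp gives a = m * 2**ex with 0.5 <= m < 1, so floor(log2(a)) = ex - 1.
    # Clamp to the minimum normal exponent (-6) so subnormals share one step.
    e = max(math.frexp(a)[1] - 1, -6)
    step = 2.0 ** (e - 3)  # 3 mantissa bits -> 8 steps per binade
    return sign * min(round(a / step) * step, F8E4M3_MAX)


def to_mx_sketch(data_hp, block_size=32):
    """Return (biased E8M0 scales, emulated fp8 elements) for a flat list."""
    assert len(data_hp) % block_size == 0
    scales, data_lp = [], []
    for start in range(0, len(data_hp), block_size):
        block = data_hp[start:start + block_size]
        amax = max(abs(v) for v in block)
        if amax == 0.0:
            exp = -E8M0_BIAS
        else:
            # Shared exponent: floor(log2(amax)) minus the element format's emax.
            exp = math.frexp(amax)[1] - 1 - F8E4M3_EMAX
        exp = max(-E8M0_BIAS, min(exp, E8M0_BIAS))
        scales.append(exp + E8M0_BIAS)
        scale = 2.0 ** exp
        data_lp.extend(round_to_e4m3(v / scale) for v in block)
    return scales, data_lp


def from_mx_sketch(scales, data_lp, block_size=32):
    """Dequantize: multiply each element by its block's power-of-two scale."""
    return [v * 2.0 ** (scales[i // block_size] - E8M0_BIAS)
            for i, v in enumerate(data_lp)]
```

Everything above runs on the CPU with ordinary arithmetic, which is what I mean by "simulated": the scale and element values follow the MX layout, but no hardware FP8 instruction is involved.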
