[Issue]: Incorrect MXFP4 mantissa rounding #974

@hann-wang

Problem Description

Issue with normal values

We have an FP32 tensor defined as follows:

torch.tensor([
    0.3463, -0.0499, 0.3794, -2.3764, 0.3960, 0.5989, -0.6250, -0.4239,
    -0.6192, 0.2761, -0.3746, 0.1462, 0.5216, -0.5194, -0.8497, -0.8237,
    0.5145, 0.3508, 0.0298, -0.3462, 0.3917, 0.1096, 1.4494, 1.7009,
    -0.5473, -2.0851, 1.8099, 0.3587, 1.4622, -2.3143, 0.3412, -0.8931
])

With a UE8M0 scale of 126 (i.e. a scale factor of 2^(126-127) = 0.5), the packed FP4 generated by _mxfp4_quant_op is

[129, 226,  34, 171,  26,  25, 162, 187,  18, 144,   2,  85, 234,  22, 229, 193]

However, torchao produces a different result, which matches the MI355 instruction v_cvt_scalef32_pk_fp4_f32:

[129, 226,  34, 170,  26,  25, 162, 187,  18, 144,   2,  85, 234,  22, 229, 193]

Specifically, the value -0.6250 / 0.5 = -1.25 is an exact tie between the FP4 values -1 and -1.5; round-to-nearest-even should produce -1 (even mantissa), but aiter rounds it to -1.5.
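For reference, the expected round-to-nearest-even behaviour over the FP4 E2M1 value set can be sketched in plain Python. This is a hypothetical illustration, not aiter's or torchao's actual code; the names quantize_fp4_rne and FP4_VALUES are made up here. Ties resolve to the code with an even low bit, which coincides with an even mantissa bit:

```python
# E2M1 magnitudes in code order 0..7; the code LSB is the mantissa bit,
# so tie-to-even on the code index equals tie-to-even on the mantissa.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4_rne(x: float) -> float:
    """Round x to the nearest FP4 E2M1 value, ties to even (sketch)."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 6.0)  # clamp to the max representable magnitude
    for i in range(7):
        lo, hi = FP4_VALUES[i], FP4_VALUES[i + 1]
        if mag <= hi:
            mid = (lo + hi) / 2
            if mag < mid:
                return sign * lo
            if mag > mid:
                return sign * hi
            # Exact tie: choose the neighbour with an even mantissa bit.
            return sign * (lo if i % 2 == 0 else hi)
    return sign * 6.0

print(quantize_fp4_rne(-0.6250 / 0.5))  # tie at -1.25 -> -1.0, not -1.5
```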

Issue with denormal values

Denormal tie cases are not rounded to nearest-even in the current aiter implementation.

For example:

FP32 numbers: 0x3F000000 and 0x3F000003
ue8m0 scale: 128

torchao and v_cvt_scalef32_pk_fp4_f32 round 0x3F000000 to 0 and 0x3F000003 to 0.5,
while the current aiter rounds both to 0.5.

Operating System

Ubuntu 22.04

CPU

AMD EPYC 73F3 16-Core Processor

GPU

AMD Instinct MI355X

ROCm Version

ROCm 7.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

Additional Information

Possible fix:

https://github.com/AMD-AGI/torchtitan-fa-fp8/blob/han/mxfp4_refactor/torchtitan/experiments/kernels/blockwise_fp4/mxfp_quantization.py#L50-L121
