Example 09 - gemm_one_shot_all_reduce Validation Error #287

@mawad-amd

Summary

Numerical validation fails in the one-shot all-reduce GEMM example (examples/09_gemm_one_shot_all_reduce).
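As background, a one-shot all-reduce GEMM is commonly structured so that each rank multiplies its own slice of the K dimension, then every rank reads all partial products and sums them in a single step. The pure-Python sketch below simulates that on two ranks with tiny integer matrices. The K-sharding layout and the helper names (matmul, one_shot_all_reduce_gemm) are assumptions for illustration; this is not the Iris kernel.

```python
def matmul(a, b):
    """Naive dense matmul: a is M x K, b is K x N (lists of lists)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)] for i in range(m)]


def one_shot_all_reduce_gemm(a, b, world_size):
    """Simulate a K-sharded GEMM followed by a one-shot all-reduce.

    Assumption (hypothetical layout): rank r owns a contiguous K slice,
    i.e. columns of A and rows of B in [r * K/world_size, (r+1) * K/world_size).
    """
    k = len(b)
    assert k % world_size == 0, "K must divide evenly across ranks"
    k_per_rank = k // world_size
    partials = []
    for rank in range(world_size):
        lo, hi = rank * k_per_rank, (rank + 1) * k_per_rank
        a_local = [row[lo:hi] for row in a]  # M x K/world_size
        b_local = b[lo:hi]                   # K/world_size x N
        partials.append(matmul(a_local, b_local))
    # One-shot all-reduce: each rank reads every rank's partial and sums them
    # locally, so all ranks should end up holding the same full C.
    m, n = len(a), len(b[0])
    return [
        [[sum(p[i][j] for p in partials) for j in range(n)] for i in range(m)]
        for _ in range(world_size)
    ]


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 1], [2, -1]]
results = one_shot_all_reduce_gemm(a, b, world_size=2)
print(results[0])  # [[12, 1], [28, 5]]
```

In this model every rank computes the same sum, so a correct implementation should produce the same C on every rank; that is what makes a failure on rank 1 alone notable.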

Command to Reproduce

python3 examples/09_gemm_one_shot_all_reduce/benchmark.py --num_stages 1 --validate --datatype fp32
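With --validate, each rank's C is compared against an expected result. The sketch below shows one way such a check can produce a report like the one in this issue (a maximum absolute difference plus the first mismatching index). The function name validate and the atol/rtol defaults are assumptions, not what benchmark.py actually uses; the tolerance test follows the torch.allclose rule |got - want| <= atol + rtol * |want|. The values -33.74 and -94.16 are borrowed from the report only as sample numbers in a toy 2x2 matrix.

```python
def validate(c, expected, atol=1e-1, rtol=1e-3):
    """Compare two M x N matrices elementwise, torch.allclose-style.

    Returns (passed, max_abs_diff, first_bad), where first_bad is
    (row, col, got, want) for the first element outside tolerance, else None.
    The atol/rtol defaults are placeholders, not benchmark.py's values.
    """
    max_abs_diff = 0.0
    first_bad = None
    for i, (row_c, row_e) in enumerate(zip(c, expected)):
        for j, (got, want) in enumerate(zip(row_c, row_e)):
            diff = abs(got - want)
            max_abs_diff = max(max_abs_diff, diff)
            # Written as "not <=" so that a NaN difference also counts as a mismatch.
            if first_bad is None and not diff <= atol + rtol * abs(want):
                first_bad = (i, j, got, want)
    return first_bad is None, max_abs_diff, first_bad


c = [[1.0, 2.0], [3.0, -33.74]]
expected = [[1.0, 2.0], [3.0, -94.16]]
passed, max_diff, first_bad = validate(c, expected)
print(passed, first_bad)  # False (1, 1, -33.74, -94.16)
```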

Observed Behavior

  • Validation fails on rank 1 only (rank 0 passes)
  • Large numerical discrepancies in output tensor C
  • Max absolute difference: 328.2
  • Example mismatch: C=-33.74 vs expected=-94.16 at index (99, 3394)

Configuration

  • world_size=2, M=8192, N=4608, K=36864
  • BLK_M=256, BLK_N=64, BLK_K=64
  • datatype=fp32, num_stages=1
  • Registers: 168, Spills: 0
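For debugging, it can help to translate this configuration into grid dimensions and to locate the failing element's tile. The sketch below does that arithmetic in plain Python. It assumes C is tiled into BLK_M x BLK_N tiles and, for the per-rank K loop, that K is split evenly across ranks (an assumption about this example, not confirmed by the issue). It does not depend on any particular tile launch order or swizzle. The helper names grid_info and locate are hypothetical.

```python
def grid_info(m, n, k, blk_m, blk_n, blk_k, world_size):
    """Tile-grid arithmetic for a 2D-tiled GEMM (assumes even divisibility)."""
    assert m % blk_m == 0 and n % blk_n == 0, "M/N must be multiples of the tile size"
    k_per_rank = k // world_size  # assumption: K is sharded evenly across ranks
    assert k_per_rank % blk_k == 0, "per-rank K must be a multiple of BLK_K"
    return {
        "tiles_m": m // blk_m,
        "tiles_n": n // blk_n,
        "total_tiles": (m // blk_m) * (n // blk_n),
        "k_per_rank": k_per_rank,
        "k_iters_per_tile": k_per_rank // blk_k,
    }


def locate(row, col, blk_m, blk_n):
    """Return (tile_row, tile_col, row_in_tile, col_in_tile) for element C[row, col]."""
    return row // blk_m, col // blk_n, row % blk_m, col % blk_n


info = grid_info(8192, 4608, 36864, 256, 64, 64, world_size=2)
print(info)
# {'tiles_m': 32, 'tiles_n': 72, 'total_tiles': 2304, 'k_per_rank': 18432, 'k_iters_per_tile': 288}
print(locate(99, 3394, 256, 64))
# (0, 53, 99, 2)
```

Under these assumptions, the reported mismatch at (99, 3394) sits in tile row 0, tile column 53, at offset (99, 2) within the tile, which narrows down which tile's computation to inspect. If K were not sharded, each tile would instead run 36864 / 64 = 576 K iterations.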

Labels

  • examples: Examples showcasing Iris APIs and usage
  • iris: Iris project issue