Something about atomic GEMM #1760

Open
@caichenghuan

Description

I have a question about atomic GEMM. While reading the implementation of CommOverlapP2PBase::atomic_gemm_overlap_rs, I ran into something I don't understand. Take the case of two ranks:

  • Rank 0 needs to first compute chunk1 and send it to Rank 1, where it will be reduced with the chunk1 computed by Rank 1 itself.

  • Rank 1 needs to first compute chunk0 and send it to Rank 0, where it will be reduced with the chunk0 computed by Rank 0 itself.

However, in the current implementation (https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp#L981-L1000), both ranks start their P2P communication from chunk0. Wouldn't this cause a problem? Or am I misunderstanding something?
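
To make the expected ordering in the two bullets concrete, here is a minimal Python sketch of that schedule. It is not TransformerEngine code and does not describe what atomic_gemm_overlap_rs actually does. The function name `expected_rs_schedule` is hypothetical. So is the rule it implements, which generalizes the two-rank example by assuming that at step i rank r sends chunk (r + i) % N to rank (r + i) % N.

```python
def expected_rs_schedule(num_ranks):
    """Return {rank: [(step, chunk, dest_rank), ...]} for the expected ordering.

    Assumption (illustration only): in a direct P2P reduce-scatter, the chunk
    index equals the rank that finally owns it, so at step i rank r sends
    chunk (r + i) % num_ranks to rank (r + i) % num_ranks.
    """
    schedule = {}
    for rank in range(num_ranks):
        steps = []
        for step in range(1, num_ranks):
            dest = (rank + step) % num_ranks
            chunk = dest  # the chunk is reduced on the rank that owns it
            steps.append((step, chunk, dest))
        schedule[rank] = steps
    return schedule


if __name__ == "__main__":
    # Two-rank case from the question.
    for rank, steps in expected_rs_schedule(2).items():
        for step, chunk, dest in steps:
            print(f"rank {rank}: step {step}: send chunk{chunk} to rank {dest}")
```

With two ranks this prints that rank 0 sends chunk1 to rank 1 and rank 1 sends chunk0 to rank 0, so under this assumption the first chunk communicated differs per rank.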

Looking forward to your reply. Thank you!
