Conversation

@Sergei-Lebedev
Contributor

What

Lazily initialize TL NCCL and TL CUDA on first CUDA collective.

Why?

Both TL NCCL and TL CUDA require the CUDA device to be set before team creation. In MPI workloads this is not always possible, because MPI_Init creates the UCC team, and setting the device requires knowing the rank and local rank, which are only available after initialization.

@swx-jenkins3

Can one of the admins verify this patch?
