Feat/nccl #766

Use-AIrs · 2025-07-04T09:43:57Z

I started with an NcclDevice behind an nccl feature flag. The feature also adds some CudaDevice implementations that make it easier to set up groups.

The CudaDevice needs to be passed through the client, so with the feature on it is now exposed through .info().

Next comes NcclOp with NcclOpBuilder. NcclOpBuilder first translates the input handle(s) to get the pointer and size. It can then be executed: it builds a cudarc::nccl::sys::ncclComm_t, starts the NCCL group, executes Futures on the pointers step by step, and then ends the NCCL group. At least that's the plan for now.
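To make the planned execution order concrete, here is a minimal sketch of the builder flow in plain Rust. It makes no real NCCL or cudarc calls: Handle, Step, Event, and the all_reduce, broadcast, and execute methods are hypothetical stand-ins, and the event log exists only to show the ordering (create communicator, start group, run each step, end group).

```rust
// Hypothetical sketch: none of these types are the PR's or cudarc's real API.

/// Stand-in for a buffer handle already resolved to a raw pointer and a size.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    ptr: u64,
    len: usize,
}

/// One collective operation recorded by the builder.
#[derive(Debug, PartialEq)]
enum Step {
    AllReduce { ptr: u64, len: usize },
    Broadcast { ptr: u64, len: usize, root: usize },
}

/// Events emitted during execution, used only to make the order visible.
#[derive(Debug, PartialEq)]
enum Event {
    CommCreated,
    GroupStart,
    Ran(Step),
    GroupEnd,
}

#[derive(Default)]
struct NcclOpBuilder {
    steps: Vec<Step>,
}

impl NcclOpBuilder {
    /// Records an all-reduce on the buffer behind `h`.
    fn all_reduce(mut self, h: Handle) -> Self {
        self.steps.push(Step::AllReduce { ptr: h.ptr, len: h.len });
        self
    }

    /// Records a broadcast of the buffer behind `h` from rank `root`.
    fn broadcast(mut self, h: Handle, root: usize) -> Self {
        self.steps.push(Step::Broadcast { ptr: h.ptr, len: h.len, root });
        self
    }

    /// Mimics the planned order: communicator, group start, steps, group end.
    fn execute(self) -> Vec<Event> {
        let mut log = vec![Event::CommCreated, Event::GroupStart];
        log.extend(self.steps.into_iter().map(Event::Ran));
        log.push(Event::GroupEnd);
        log
    }
}

fn main() {
    let a = Handle { ptr: 0x1000, len: 256 };
    let b = Handle { ptr: 0x2000, len: 128 };
    let log = NcclOpBuilder::default().all_reduce(a).broadcast(b, 0).execute();
    for event in &log {
        println!("{event:?}");
    }
}
```

In a real implementation the GroupStart and GroupEnd events would presumably correspond to NCCL's group calls, and each step to a collective issued on the resolved pointer; that mapping is an assumption here, not something the PR confirms.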

Use-AIrs · 2025-07-04T11:28:44Z

No Future here, that would be an overcomplicated abstraction ^^

nathanielsimard · 2025-07-04T16:01:24Z

crates/cubecl-cuda/src/compute/nccl.rs

+        devices
+    }
+
+    pub fn linked_groups(split: Vec<f64>) -> Vec<Self> {


What is a split? Maybe add some docs to each public function to help users.

In this context you pass one f64 for each device group you want to build. Each group gets devices assigned according to the proportion of its f64 relative to the sum of the Vec...
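As an illustration of the proportional split described above, here is a small standalone sketch that works on device indices rather than real devices. The function name split_devices, the device_count parameter, and the cumulative-rounding rule are assumptions; the PR does not state how fractional shares are rounded.

```rust
/// Splits device indices `0..device_count` into contiguous groups whose sizes
/// follow the proportions in `split`.
/// Assumption: group boundaries are placed by rounding the cumulative share.
fn split_devices(device_count: usize, split: &[f64]) -> Vec<Vec<usize>> {
    let total: f64 = split.iter().sum();
    // An empty, zero, negative, or NaN total has no meaningful proportions.
    if split.is_empty() || !(total > 0.0) {
        return Vec::new();
    }

    let mut groups = Vec::with_capacity(split.len());
    let mut cumulative = 0.0;
    let mut start = 0usize;
    for weight in split {
        cumulative += weight;
        // Boundary of this group as a rounded fraction of all devices.
        let end = ((cumulative / total) * device_count as f64).round() as usize;
        // Keep boundaries monotonic and within range.
        let end = end.clamp(start, device_count);
        groups.push((start..end).collect());
        start = end;
    }
    groups
}

fn main() {
    // 8 devices split 3:1 gives one group of 6 and one group of 2.
    let groups = split_devices(8, &[3.0, 1.0]);
    for (i, group) in groups.iter().enumerate() {
        println!("group {i}: {group:?}");
    }
}
```

With this rule every device lands in exactly one group whenever the weights are non-negative, but a group with a very small weight can end up empty; a real implementation might want to reject or rebalance that case.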

I'm asking this often and early on because I want to get a feeling for whether going deeper into the code will get backlash; sorry for not pointing that out.
Since you did not comment on the use of CudaRuntime::Info, I assume that using existing code is allowed as long as it's behind the feature flag ...

Use-AIrs added 4 commits June 29, 2025 13:21

CudaDevice new and small ctx addition

efe69c0

NcclDevice Added and some Device fns with feature nccl

8c6f094

fmt

b2fc414

Merge branch 'tracel-ai:main' into Feat/nccl

2001df1

nathanielsimard reviewed Jul 4, 2025


Use-AIrs Jul 5, 2025
