Feature request: "collective_broadcast" for CPU PJRT #33502

@janpfeifer

Description

Just to help make a case for it:

The collective_broadcast op is important for distributed (SPMD) training: it synchronizes variables across replicas and prevents numerical drift. Accelerators can (presumably?) do this much faster than moving the data through the host.

The "fake" multi-device CPU PJRT is very useful for developing and testing distributed models without spending expensive credits (and clearing the extra hurdles) to get multi-device hardware during development, which in some cases takes more time than the training itself.

Without collective_broadcast implemented on CPU, though, one has to maintain two versions: one for testing/development and one actually used in training, which adds maintenance burden, etc.
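To make the use case concrete, here is a minimal sketch of the synchronization pattern on the "fake" multi-device CPU backend. Note this is an assumption-laden illustration, not the StableHLO collective_broadcast op itself: it emulates broadcasting replica 0's parameters with a masked psum under jax.pmap.

```python
import functools
import os

# Assumption: this XLA flag makes the CPU backend expose 4 "fake" devices;
# it must be set before importing jax.
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=4"

import jax
import jax.numpy as jnp

# Simulated per-replica drift: replica 0 holds 1.0, the others differ slightly.
params = 1.0 + jnp.arange(4, dtype=jnp.float32) * 0.001

# Emulate a broadcast from replica 0 with a masked all-reduce:
# zero out every replica's value except replica 0's, then psum
# so that every replica ends up holding replica 0's value.
@functools.partial(jax.pmap, axis_name="i")
def sync_from_replica_zero(p):
    mask = (jax.lax.axis_index("i") == 0).astype(p.dtype)
    return jax.lax.psum(p * mask, axis_name="i")

synced = sync_from_replica_zero(params)
print(synced)  # every replica now holds replica 0's value, 1.0
```

With collective_broadcast available on CPU, the same program that runs on accelerators could run unchanged here, instead of resorting to workarounds like the masked psum above.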

Many thanks!

Metadata

Labels

enhancement (New feature or request)
