Hi, I'm developing a PJRT plugin for a custom accelerator and want to enable availability-aware, cost-driven partitioning of an XLA/HLO module across GPU, CPU, and the accelerator:
- If only CPU and the accelerator are available, run using those two backends.
- If a GPU is present and in use, automatically identify HLO subgraphs that are better offloaded to the accelerator, and compile/run them there.
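To make the desired policy concrete, here is a minimal sketch of the availability-driven fallback logic, assuming the orchestrator can enumerate which device kinds are present (e.g., via its PJRT clients). The device-kind strings (`"my_accel"`) and the function itself are hypothetical, not an XLA/PJRT API:

```python
def choose_plan(available_kinds):
    """Pick (backends_to_use, offload_enabled) from the device kinds present.

    available_kinds: set of device-kind strings, e.g. {"cpu", "gpu", "my_accel"}.
    "my_accel" is a placeholder name for the custom accelerator.
    """
    has_gpu = "gpu" in available_kinds
    has_accel = "my_accel" in available_kinds
    if has_gpu and has_accel:
        # GPU runs the bulk of the module; profitable HLO subgraphs
        # are offloaded to the accelerator.
        return (["gpu", "my_accel"], True)
    if has_accel:
        # No GPU: run on CPU + accelerator only.
        return (["cpu", "my_accel"], False)
    # Accelerator absent: plain CPU execution, nothing to offload.
    return (["cpu"], False)
```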
Questions:
- Does XLA currently support multi-backend HLO partitioning/placement (i.e., splitting one HLO module across different backend types)?
- Can a PJRT plugin expose device costs/constraints, or otherwise influence partitioning during HLO-level compilation?
- If not, what's the recommended approach: implement an XLA pass that consumes cost info, or build an orchestration layer that partitions the model and invokes multiple PJRT clients/executables? Which option is more realistic today?
I can prototype either an XLA/HLO pass or an external orchestrator; I'm looking for pointers, existing examples, or caveats.
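For the orchestrator option, the core of what I have in mind is a greedy, cost-driven assignment of ops to backends. Below is a self-contained sketch of that idea; the op names, cost numbers, and flat transfer penalty are all illustrative assumptions, not anything taken from XLA or PJRT:

```python
# Assumed flat cost for moving a value across a backend boundary.
TRANSFER_COST = 5.0

def partition(ops, costs, producers):
    """Greedily place each op on the backend with the lowest total cost.

    ops: op names in topological order.
    costs: {op: {backend: compute_cost}} -- per-backend compute cost.
    producers: {op: [input op names]} -- dataflow edges.
    Returns {op: backend}.
    """
    placement = {}
    for op in ops:
        best_backend, best_total = None, float("inf")
        for backend, compute in costs[op].items():
            # Penalize every input that was produced on a different backend.
            transfers = sum(
                TRANSFER_COST
                for src in producers.get(op, [])
                if placement[src] != backend
            )
            total = compute + transfers
            if total < best_total:
                best_backend, best_total = backend, total
        placement[op] = best_backend
    return placement
```

With a transfer penalty in the model, a downstream op can stay on the accelerator even when another backend has a lower raw compute cost, which is exactly the subgraph-clustering effect I'm after:

```python
costs = {"conv": {"gpu": 1.0, "accel": 0.5},
         "matmul": {"gpu": 0.4, "accel": 2.0}}
producers = {"matmul": ["conv"]}
partition(["conv", "matmul"], costs, producers)
# -> {"conv": "accel", "matmul": "accel"}
```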
Thanks.