Cost-based partitioning across CPU + GPU + custom Accelerator using PJRT #32677

@milinbhade1214

Description

Hi, I'm developing a PJRT plugin for a custom accelerator and want to enable availability-aware, cost-driven partitioning of an XLA/HLO module across GPU, CPU, and the accelerator:

If only CPU + accelerator are available, run using those.

If GPU is present and used, automatically identify HLO subgraphs that are better offloaded to the accelerator and compile/run them there.
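The availability-driven fallback described above could be sketched in an orchestration layer roughly like this. This is a hypothetical helper, not a PJRT API; the backend names are illustrative:

```python
# Hypothetical availability-aware backend selection for an orchestration
# layer sitting above PJRT. Backend names are illustrative placeholders.

def select_backends(available):
    """Return the usable backends in preference order: offload to GPU and
    the accelerator when present, otherwise fall back to CPU + accelerator
    (or whatever subset is available)."""
    preferred = [b for b in ("gpu", "accel", "cpu") if b in available]
    if not preferred:
        raise RuntimeError("no usable backend available")
    return preferred

# Only CPU + accelerator present: run using those.
print(select_backends({"cpu", "accel"}))        # ['accel', 'cpu']
# GPU also present: it joins the candidate set.
print(select_backends({"gpu", "accel", "cpu"})) # ['gpu', 'accel', 'cpu']
```

In a real orchestrator, `available` would come from enumerating devices across the loaded PJRT clients rather than a hard-coded set.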

Questions:

  1. Does XLA currently support multi-backend HLO partitioning/placement (i.e., splitting one HLO module across different backend types)?

  2. Can a PJRT plugin expose device cost/constraints or otherwise influence partitioning during HLO-level compilation?

  3. If not, what's the recommended approach: implement an XLA pass that consumes cost info, or build an orchestration layer that partitions the model and invokes multiple PJRT clients/executables? Which option is more realistic today?

I can prototype either an XLA/HLO pass or an external orchestrator — looking for pointers, existing examples, or caveats.
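For concreteness, the external-orchestrator option (question 3) could look something like the following greedy sketch. Everything here is hypothetical: `Subgraph` and the cost tables are stand-ins for whatever a real implementation would derive from an XLA cost model or plugin-supplied estimates, and each chosen backend would map to its own PJRT client/executable:

```python
# Minimal sketch of cost-based placement in an external orchestrator.
# All names and numbers are illustrative assumptions, not XLA/PJRT APIs.

from dataclasses import dataclass

@dataclass
class Subgraph:
    name: str
    costs: dict  # backend name -> estimated execution cost

def partition(subgraphs, available_backends):
    """Greedily place each subgraph on its cheapest available backend."""
    placement = {}
    for sg in subgraphs:
        candidates = {b: c for b, c in sg.costs.items()
                      if b in available_backends}
        placement[sg.name] = min(candidates, key=candidates.get)
    return placement

module = [
    Subgraph("conv_block", {"gpu": 1.0, "accel": 0.4, "cpu": 9.0}),
    Subgraph("embedding",  {"gpu": 0.8, "accel": 2.5, "cpu": 3.0}),
]

# GPU present: the conv block is still offloaded to the accelerator,
# because the accelerator's estimated cost beats the GPU's.
print(partition(module, {"gpu", "accel", "cpu"}))
# GPU absent: everything falls back to CPU + accelerator.
print(partition(module, {"cpu", "accel"}))
```

The interesting part in practice is not the greedy loop but where the costs come from and how cross-backend transfer cost is charged, which is exactly where a plugin-exposed cost model would plug in.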

Thanks.

Labels: question (Further information is requested)
