fix: Enforce single ML policy constraint with CEL validation for Torch, MPI, and JAX by Krishna-kg732 · Pull Request #3225 · kubeflow/trainer

Krishna-kg732 · 2026-02-19T04:37:52Z

What this PR solves

This PR fixes a validation issue where multiple ML runtime policies (Torch, MPI, JAX) could be configured simultaneously in a TrainingRuntime, leading to conflicting runtime configurations.

The previous validation rule, `!(has(self.torch) && has(self.mpi))`, only prevented Torch and MPI from being set together. It did not account for:

JAX runtime policy
Scenarios where all three policies could be partially configured
Future extensibility for additional runtime policies

This allowed invalid configurations where users could set multiple incompatible runtime policies.

Solution

Added comprehensive CEL validation: updated the validation rule to `[has(self.torch), has(self.mpi), has(self.jax)].filter(x, x).size() <= 1`, which:
- Creates a list of boolean values, one per policy field
- Filters for true values (policies that are set)
- Ensures at most one policy is configured

Updated PlainML plugin: modified the EnforceMLPolicy function to check for a JAX policy alongside Torch and MPI, so that PlainML applies only when no other runtime policy is active.
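
For illustration, the difference between the old and new rules can be sketched in plain Go. This is not the code in the PR: the `policies` struct and the `oldRule`/`newRule` helpers are hypothetical names that only mirror the boolean logic of the two CEL expressions, with each field standing in for the corresponding `has(self.…)` check.

```go
package main

import "fmt"

// policies records which runtime policy fields are set.
// Hypothetical helper type; each field stands in for has(self.<field>).
type policies struct {
	torch, mpi, jax bool
}

// oldRule mirrors the previous CEL rule: !(has(self.torch) && has(self.mpi)).
func oldRule(p policies) bool {
	return !(p.torch && p.mpi)
}

// newRule mirrors the updated CEL rule:
// [has(self.torch), has(self.mpi), has(self.jax)].filter(x, x).size() <= 1.
func newRule(p policies) bool {
	set := 0
	for _, present := range []bool{p.torch, p.mpi, p.jax} {
		if present {
			set++
		}
	}
	return set <= 1
}

func main() {
	cases := []policies{
		{},                       // nothing set
		{jax: true},              // a single policy
		{torch: true, mpi: true}, // rejected by both rules
		{torch: true, jax: true}, // accepted by the old rule only
		{mpi: true, jax: true},   // accepted by the old rule only
	}
	for _, c := range cases {
		fmt.Printf("%+v old=%t new=%t\n", c, oldRule(c), newRule(c))
	}
}
```

Under these assumptions, the Torch+JAX and MPI+JAX combinations are the ones that slipped through the pairwise check and are rejected by the counting rule.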

Testing

Validation occurs at the CRD level via CEL expressions
Runtime enforcement in PlainML plugin ensures correct fallback behavior

Mentioned in PR #3200.

github-actions · 2026-02-19T04:38:00Z

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team

Join the community:

Slack: Join our #kubeflow-trainer Slack channel.
Meetings: Attend the Kubeflow AutoML and Training Working Group bi-weekly meetings.

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

Copilot

Pull request overview

This PR tightens TrainingRuntime ML policy validation to prevent configuring multiple incompatible runtime policies (Torch/MPI/JAX) at the same time, and aligns PlainML fallback behavior with that constraint.

Changes:

Updated CRD CEL validation to enforce “at most one of torch/mpi/jax is set”.
Updated PlainML’s EnforceMLPolicy to no-op when a JAX policy is configured (matching existing Torch/MPI behavior).
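
A minimal sketch of the fallback guard described above, assuming the policy is modelled as a struct with one optional pointer per runtime. The type names and the `plainMLApplies` helper are hypothetical; the actual EnforceMLPolicy signature and surrounding plugin code are not shown in this conversation.

```go
package main

import "fmt"

// Placeholder policy types; the real API types carry framework-specific fields.
type TorchPolicy struct{}
type MPIPolicy struct{}
type JAXPolicy struct{}

// MLPolicy is a hypothetical stand-in with one optional pointer per runtime.
type MLPolicy struct {
	Torch *TorchPolicy
	MPI   *MPIPolicy
	JAX   *JAXPolicy
}

// plainMLApplies reports whether the PlainML fallback should act:
// only when no explicit runtime policy (Torch, MPI, or JAX) is set.
func plainMLApplies(p MLPolicy) bool {
	return p.Torch == nil && p.MPI == nil && p.JAX == nil
}

func main() {
	fmt.Println(plainMLApplies(MLPolicy{}))                  // true
	fmt.Println(plainMLApplies(MLPolicy{JAX: &JAXPolicy{}})) // false
}
```

Before this change, the equivalent guard would have checked only the Torch and MPI fields, so a JAX-only policy would still have fallen through to PlainML.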

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `pkg/runtime/framework/plugins/plainml/plainml.go` | Extends PlainML’s fallback guard to treat JAX as an explicitly selected runtime policy (so PlainML won’t apply). |
| `pkg/apis/trainer/v1alpha1/trainingruntime_types.go` | Replaces pairwise Torch/MPI exclusion with a single CEL rule that limits the number of configured ML policies to 1 across Torch/MPI/JAX. |
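
As a rough sketch of how a CEL rule like this is typically attached to a Go API type, the kubebuilder `XValidation` marker can sit on the struct that holds the policy fields. The struct name, field layout, and message text below are assumptions for illustration, not the contents of `trainingruntime_types.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Placeholder policy types; the real API types carry framework-specific fields.
type TorchPolicy struct{}
type MPIPolicy struct{}
type JAXPolicy struct{}

// MLPolicySketch is a hypothetical stand-in for the API type that holds the
// runtime policies. controller-gen turns the marker below into an
// x-kubernetes-validations entry in the generated CRD schema.
// +kubebuilder:validation:XValidation:rule="[has(self.torch), has(self.mpi), has(self.jax)].filter(x, x).size() <= 1",message="only one of torch, mpi, or jax can be configured"
type MLPolicySketch struct {
	Torch *TorchPolicy `json:"torch,omitempty"`
	MPI   *MPIPolicy   `json:"mpi,omitempty"`
	JAX   *JAXPolicy   `json:"jax,omitempty"`
}

// encode returns the JSON form of the policy, or "" on error.
func encode(p MLPolicySketch) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Unset pointers are omitted, so has(self.<field>) is false for them.
	fmt.Println(encode(MLPolicySketch{JAX: &JAXPolicy{}}))
}
```

The marker is only a comment to the Go compiler, so Go code can still construct an object with several policies set; the rule is evaluated by the API server when the resource is created or updated.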

andreyvelich

Thank you @Krishna-kg732!
/lgtm
/approve

astefanutti

Thanks @Krishna-kg732!

/lgtm

astefanutti · 2026-02-23T08:44:53Z

/retest

andreyvelich · 2026-02-23T15:05:52Z

@Krishna-kg732 Please rebase your PR.

andreyvelich · 2026-02-23T20:05:42Z

One more rebase is needed @Krishna-kg732.

andreyvelich

Thank you for this @Krishna-kg732!
/lgtm
/approve

google-oss-prow · 2026-02-24T18:29:47Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [andreyvelich]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…h, MPI, and JAX (kubeflow#3225)

* fix: enforce single ML policy constraint with CEL validation
  Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
* added plainML fallback test case
  Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
* added autogenerated files
  Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
* added integration tests
  Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
* bumped the version in charts to fix ci
  Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
* added autogenerated file
  Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
* chore: bump version
  Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>

---------
Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>

Copilot AI review requested due to automatic review settings February 19, 2026 04:37

google-oss-prow bot requested review from akshaychitneni and jinchihe February 19, 2026 04:37

google-oss-prow bot added the size/XS label Feb 19, 2026

Copilot started reviewing on behalf of Krishna-kg732 February 19, 2026 04:38

Copilot AI reviewed Feb 19, 2026

pkg/runtime/framework/plugins/plainml/plainml.go (resolved)

pkg/apis/trainer/v1alpha1/trainingruntime_types.go (resolved)

google-oss-prow bot added size/S and removed size/XS labels Feb 19, 2026

Krishna-kg732 changed the title ~~fix(JAX): Enforce single ML policy constraint with CEL validation for Torch, MPI, and JAX~~ fix: Enforce single ML policy constraint with CEL validation for Torch, MPI, and JAX Feb 19, 2026

andreyvelich reviewed Feb 19, 2026

pkg/apis/trainer/v1alpha1/trainingruntime_types.go (resolved)

google-oss-prow bot added size/M and removed size/S labels Feb 19, 2026

Krishna-kg732 force-pushed the fix/jax-validation branch from aa6bd0c to c22b56c February 19, 2026 15:59

andreyvelich reviewed Feb 21, 2026

pkg/apis/trainer/v1alpha1/trainingruntime_types.go (resolved)

andreyvelich reviewed Feb 22, 2026
google-oss-prow bot assigned andreyvelich Feb 22, 2026

google-oss-prow bot added lgtm approved labels Feb 22, 2026

akshaychitneni approved these changes Feb 23, 2026

google-oss-prow bot assigned akshaychitneni Feb 23, 2026

google-oss-prow bot removed the lgtm label Feb 23, 2026

Krishna-kg732 force-pushed the fix/jax-validation branch from ae0df7a to 3ff5704 February 23, 2026 08:25

astefanutti reviewed Feb 23, 2026

pkg/apis/trainer/v1alpha1/trainingruntime_types.go (resolved)

google-oss-prow bot assigned astefanutti Feb 23, 2026

google-oss-prow bot added the lgtm label Feb 23, 2026

google-oss-prow bot removed the lgtm label Feb 23, 2026

Krishna-kg732 force-pushed the fix/jax-validation branch from f61cb55 to 19c1a86 February 23, 2026 08:52

Krishna-kg732 force-pushed the fix/jax-validation branch from 19c1a86 to af7a30e February 23, 2026 18:12

andreyvelich reviewed Feb 23, 2026

charts/kubeflow-trainer/Chart.yaml (outdated, resolved)

Krishna-kg732 added 7 commits February 24, 2026 09:11

fix: enforce single ML policy constraint with CEL validation

2a54416

Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>

added plainML fallback test case

7222583

Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>

added autogenerated files

e270fb2

Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>

added integration tests

caae387

Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>

bumped the version in charts to fix ci

a92fa81

Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>

added autogenerated file

995816f

Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>

chore: bump version

6c0fda0

Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>

Krishna-kg732 force-pushed the fix/jax-validation branch from af7a30e to 6c0fda0 February 24, 2026 03:42

andreyvelich reviewed Feb 24, 2026

google-oss-prow bot added the lgtm label Feb 24, 2026

google-oss-prow bot merged commit 57b83c4 into kubeflow:master Feb 24, 2026
29 checks passed

google-oss-prow bot added this to the v2.2 milestone Feb 24, 2026

google-oss-prow[bot] merged 7 commits into kubeflow:master from Krishna-kg732:fix/jax-validation
