KEP-2170: Add TrainJob conditions #2322

Open
wants to merge 1 commit into
base: master

tenzen-y
Member

@tenzen-y tenzen-y commented Nov 7, 2024

What this PR does / why we need it:
I implemented the TrainJob condition mechanism based on https://github.com/kubeflow/training-operator/tree/master/docs/proposals/2170-kubeflow-training-v2#state-transition

However, the current implementation depends on the JobSet status.conditions rather than status.terminalState. The terminalState field was introduced in JobSet v0.6, and that JobSet version depends on the K8s libs 1.30 (see #2299).

So, after we upgrade the K8s libs to 1.30, we can revisit the JobSet status.terminalState.
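
To illustrate the approach described above, here is a minimal sketch of deriving a TrainJob terminal condition from JobSet-style status.conditions. It is not the code in this PR. The `Condition` struct is a simplified stand-in for `metav1.Condition`, and the constant names, condition type strings, and reasons are assumptions chosen for illustration.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
// Assumption: only the fields needed for this sketch are kept.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// Hypothetical condition type names, used only for illustration.
const (
	jobSetCompleted  = "Completed"
	jobSetFailed     = "Failed"
	trainJobComplete = "Complete"
	trainJobFailed   = "Failed"
)

// terminalCondition scans JobSet-style conditions and returns the TrainJob
// terminal condition to set, or nil if no terminal condition is true yet.
func terminalCondition(jobSetConds []Condition) *Condition {
	for _, c := range jobSetConds {
		if c.Status != "True" {
			continue // only conditions that currently hold are relevant
		}
		switch c.Type {
		case jobSetCompleted:
			return &Condition{Type: trainJobComplete, Status: "True", Reason: c.Reason, Message: c.Message}
		case jobSetFailed:
			return &Condition{Type: trainJobFailed, Status: "True", Reason: c.Reason, Message: c.Message}
		}
	}
	return nil
}

func main() {
	// Illustrative input; the reasons and messages are made up for the example.
	conds := []Condition{
		{Type: "Suspended", Status: "False", Reason: "ResumeJobs"},
		{Type: jobSetCompleted, Status: "True", Reason: "AllJobsCompleted", Message: "jobset completed successfully"},
	}
	if c := terminalCondition(conds); c != nil {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```

Once status.terminalState is available, a lookup like this could presumably be replaced by reading that single field, which is the revisit mentioned above.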

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Part-of: #2207
Relates to #2170

Checklist:

  • Docs included if any changes are user facing


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from tenzen-y. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Nov 7, 2024
@tenzen-y tenzen-y force-pushed the v2-add-reconcile-condition branch 2 times, most recently from 8c638b4 to fd61ff2 Compare November 7, 2024 20:45
@tenzen-y tenzen-y marked this pull request as ready for review November 7, 2024 20:46
@@ -51,7 +51,7 @@ type Framework struct {
}

func (f *Framework) Init() *rest.Config {
	log.SetLogger(zap.New(zap.WriteTo(ginkgo.GinkgoWriter), zap.UseDevMode(true)))
	ctrl.SetLogger(zap.New(zap.WriteTo(ginkgo.GinkgoWriter), zap.Level(zapcore.Level(-5)), zap.UseDevMode(true)))
Member Author

Surfacing the debug logs.
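
For context on the `zapcore.Level(-5)` option: assuming the usual zapr convention that a logr message at verbosity V(n) is emitted at zap level -n, a minimum level of -5 lets V(0) through V(5) through and drops V(6) and above. The helper below is a hypothetical, stdlib-only illustration of that arithmetic, not controller-runtime code.

```go
package main

import "fmt"

// vEnabled reports whether a logr V(verbosity) message passes a zap-style
// minimum level. Assumption: logr verbosity n corresponds to zap level -n,
// and a message is emitted when its level is >= the configured minimum.
func vEnabled(minZapLevel, verbosity int) bool {
	return -verbosity >= minZapLevel
}

func main() {
	const minLevel = -5 // mirrors zap.Level(zapcore.Level(-5)) in the diff
	for _, v := range []int{0, 1, 5, 6} {
		fmt.Printf("V(%d) enabled: %t\n", v, vEnabled(minLevel, v))
	}
}
```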

@coveralls

coveralls commented Nov 7, 2024

Pull Request Test Coverage Report for Build 11732628418

Details

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 11663764609: 0.0%
Covered Lines: 77
Relevant Lines: 77

💛 - Coveralls

@tenzen-y tenzen-y force-pushed the v2-add-reconcile-condition branch 2 times, most recently from 4bb08b7 to 12514ab Compare November 7, 2024 20:52
@tenzen-y tenzen-y changed the title WIP: KEP-2170: Add TrainJob conditions KEP-2170: Add TrainJob conditions Nov 7, 2024
@tenzen-y
Member Author

tenzen-y commented Nov 7, 2024

/hold for review

@tenzen-y
Member Author

tenzen-y commented Nov 7, 2024

/assign @kubeflow/wg-training-leads

2 participants