Consider supporting SuccessPolicy and FailurePolicy #99

terrytangyuan · 2020-06-17T14:36:58Z

We recently added SuccessPolicy to tf-operator (kubeflow/training-operator#1165) and are considering adding FailurePolicy to handle failure cases (kubeflow/training-operator#1170). Once these mature, and if we see a common pattern in other operators, we should consider moving them to kubeflow/common.

cc @gaocegege @Jeffwan @johnugeorge @ChanYiLin @pingsutw

issue-label-bot · 2020-06-17T14:37:06Z

Issue-Label Bot is automatically applying the labels:

Label	Probability
kind/feature	0.77
area/operator	0.85

kf-label-bot-dev · 2020-06-17T14:37:09Z

Issue-Label Bot is automatically applying the labels:

Label	Probability
feature	0.77

Jeffwan · 2020-06-22T16:59:54Z

Having success and failure policies would be great: it would make it easier for different frameworks to handle errors, and it would help make the reconciler logic extensible.

zw0610 · 2020-08-11T07:19:57Z

As fault-tolerant and elastic distributed training spreads to more frameworks, a universal definition of failure and success for a distributed training job would help developers clarify the logic for handling pods that have failed or recently joined.
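To make the idea of a shared definition concrete, here is a minimal sketch of how a framework-agnostic evaluation might look. Everything in it is an assumption for illustration only: the policy values, the `WorkerCounts` struct, and the `Evaluate` function are hypothetical and are not the tf-operator or kubeflow/common API.

```go
package main

import "fmt"

// SuccessPolicy and FailurePolicy are hypothetical policy types; the
// values below are illustrative and not taken from any Kubeflow API.
type SuccessPolicy string
type FailurePolicy string

const (
	SuccessWhenChiefSucceeds     SuccessPolicy = "Chief"
	SuccessWhenAllWorkersSucceed SuccessPolicy = "AllWorkers"

	FailWhenAnyWorkerFails FailurePolicy = "AnyWorker"
	FailWhenAllWorkersFail FailurePolicy = "AllWorkers"
)

// JobState is the job-level outcome derived from pod-level counts.
type JobState string

const (
	Running   JobState = "Running"
	Succeeded JobState = "Succeeded"
	Failed    JobState = "Failed"
)

// WorkerCounts is a hypothetical summary of pod phases for one job.
type WorkerCounts struct {
	Total          int  // worker pods currently expected
	Succeeded      int  // worker pods that exited successfully
	Failed         int  // worker pods that failed
	ChiefSucceeded bool // whether the chief (if any) exited successfully
}

// Evaluate maps pod counts to a job state under the given policies.
// Failure is checked first so a failed job is never reported as succeeded.
func Evaluate(sp SuccessPolicy, fp FailurePolicy, c WorkerCounts) JobState {
	switch fp {
	case FailWhenAnyWorkerFails:
		if c.Failed > 0 {
			return Failed
		}
	case FailWhenAllWorkersFail:
		if c.Total > 0 && c.Failed == c.Total {
			return Failed
		}
	}
	switch sp {
	case SuccessWhenChiefSucceeds:
		if c.ChiefSucceeded {
			return Succeeded
		}
	case SuccessWhenAllWorkersSucceed:
		if c.Total > 0 && c.Succeeded == c.Total {
			return Succeeded
		}
	}
	return Running
}

func main() {
	c := WorkerCounts{Total: 4, Succeeded: 3, Failed: 1}
	// A strict job fails as soon as one worker fails.
	fmt.Println(Evaluate(SuccessWhenAllWorkersSucceed, FailWhenAnyWorkerFails, c))
	// An elastic job tolerates the lost worker and keeps running.
	fmt.Println(Evaluate(SuccessWhenAllWorkersSucceed, FailWhenAllWorkersFail, c))
}
```

With the same pod counts, the strict failure policy yields `Failed` while the tolerant one yields `Running`, which is the kind of distinction an elastic framework would need the shared definition to express.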

issue-label-bot bot added area/operator kind/feature labels Jun 17, 2020

kf-label-bot-dev bot added the feature label Jun 17, 2020

jlewi removed the feature label Jun 17, 2020
